Statistical Methods for Complex Disease Analysis

Angela Paige Presson
Ph.D., 2007
Advisor: Kenneth Lange

Now that many large-effect disease genes have been characterized, geneticists are challenged by the small-effect variants underlying complex diseases. While traditional linkage and association approaches have demonstrated some success, many studies have failed due to 1) poorly defined phenotypes, 2) inadequate marker spacing, and/or 3) too few samples. Here we present methodology for facilitating and improving complex disease analysis. Chapter 1 describes statistical methods and a software implementation, MicroMerge, for merging microsatellite marker genotype data. A Bayesian model finds the optimal alignment between data sets, for each marker separately, by matching allele frequencies while preserving their base-pair size order. Chapter 2 describes MicroMerge extensions that improve the yield of accurately aligned alleles and produce a merged file format that can be analyzed by most statistical genetics software.
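The idea of aligning alleles by frequency while preserving size order can be illustrated with a small sketch. This is not the Bayesian model of Chapter 1; it swaps in a simple dynamic-programming cost minimization, and the function name `align_alleles`, the `gap_weight` parameter, and the cost function are all assumptions made for illustration.

```python
def align_alleles(freq_a, freq_b, gap_weight=1.0):
    """Order-preserving alignment of two allele-frequency lists.

    freq_a, freq_b: allele frequencies for one marker in two data sets,
    each listed in increasing base-pair size order.
    Returns (total_cost, pairs), where pairs holds (index_a, index_b) matches.
    Hypothetical cost: |difference in frequency| for a match, and
    gap_weight * frequency for an allele left unmatched.
    """
    n, m = len(freq_a), len(freq_b)
    # cost[i][j]: minimum cost of aligning the first i alleles of A
    # with the first j alleles of B.
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap_weight * freq_a[i - 1]
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap_weight * freq_b[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + abs(freq_a[i - 1] - freq_b[j - 1])
            skip_a = cost[i - 1][j] + gap_weight * freq_a[i - 1]
            skip_b = cost[i][j - 1] + gap_weight * freq_b[j - 1]
            cost[i][j] = min(match, skip_a, skip_b)

    # Trace back from the end to recover which alleles were matched.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        match = cost[i - 1][j - 1] + abs(freq_a[i - 1] - freq_b[j - 1])
        skip_a = cost[i - 1][j] + gap_weight * freq_a[i - 1]
        if cost[i][j] == match:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif cost[i][j] == skip_a:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs
```

Because matches are only ever made in increasing index order, the alignment cannot cross, which mirrors the size-order constraint described above. A rare allele seen in only one data set is left unmatched at a small cost rather than forcing the remaining alleles out of register.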

Traditional genetic analysis methods were designed to localize a single gene and are less effective at simultaneously identifying multiple interacting genes. Systems biology studies interactions among various biological components such as DNA variants, mRNA transcripts, protein and phenotype data. These relationships are quantified by correlation and other network measures. In Chapter 3 we describe a systems biology analysis of microarray, genetic marker and phenotype data for identifying a chronic fatigue-related genetic pathway. We clustered genes with similar expression patterns and related these clusters, or 'modules', to chronic fatigue severity. The module most strongly associated with chronic fatigue severity was then correlated with the genetic marker data to identify genes that were related to a candidate locus for chronic fatigue syndrome. Our novel gene screening strategy identified 18 genes, of which FOXN1, PRDX3 and SUCLA2 were particularly promising candidates due to their known involvement in the neurological and immune systems.
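The step of relating modules to a clinical trait can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis of Chapter 3: each module is summarized by the mean of its standardized gene expression profiles (a simple stand-in for a module eigengene), and modules are ranked by the absolute Pearson correlation of that summary with the trait. The function names and the input layout are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("cannot standardize a constant expression profile")
    return [(v - mean) / sd for v in values]


def module_summary(expr_by_gene, module_genes):
    """Per-sample mean of the standardized profiles of a module's genes."""
    profiles = [standardize(expr_by_gene[g]) for g in module_genes]
    n_samples = len(profiles[0])
    return [sum(p[s] for p in profiles) / len(profiles) for s in range(n_samples)]


def rank_modules(expr_by_gene, modules, trait):
    """Return (module, correlation) pairs, strongest |correlation| first.

    expr_by_gene: dict mapping gene name to expression values per sample.
    modules: dict mapping module name to a list of gene names.
    trait: trait value per sample (for example, a severity score).
    """
    scores = [
        (name, pearson(module_summary(expr_by_gene, genes), trait))
        for name, genes in modules.items()
    ]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

The top-ranked module would then be the one carried forward for comparison against marker data; the sign of the correlation indicates whether higher module expression accompanies higher or lower trait values.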

The final chapter (Chapter 4) describes future directions for the MicroMerge and network analysis projects. We present two possible MicroMerge extensions and a Bayesian model for improving weighted gene co-expression network design by integrating gene expression data with published gene interaction data.