Statistical Strategies in eQTL Studies

Wei Sun
Ph.D., 2007
Advisor: Ker-Chau Li

Gene expression quantitative trait loci (eQTL) studies have recently become a popular research area [RK06, PGH06]. In this dissertation, we explore several statistical strategies to extend our knowledge of the genetic basis of gene expression. One interesting finding in eQTL studies is that multiple gene expression traits can be linked to a small segment of DNA sequence, called an eQTL hot spot. However, no systematic effort has been reported to identify the molecular mechanisms behind eQTL hot spots. We consider the scenario in which the propagation of a genetic perturbation is mediated by transcription factor activity, and develop a statistical procedure to detect the eQTL modules (eQTL hot spots together with their linked genes) that are compatible with this scenario. We report an application of our method to a yeast eQTL data set.

Most eQTL studies carry out only 1D-trait mapping, i.e., mapping the eQTL for gene expression traits one at a time. A major shortcoming of 1D-trait mapping is that it completely ignores trait-trait interactions. To overcome this limitation, we study the expression of pairs of genes and treat the variation in their co-expression pattern as a two-dimensional quantitative trait (2D-trait).
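The 2D-trait idea can be illustrated with a minimal sketch on simulated data. This is not the procedure developed in the dissertation; it only assumes a biallelic marker coded 0/1 and compares the Pearson correlation of two expression traits between the genotype groups using a Fisher z statistic. All names (`coexpression_by_genotype`, `fisher_z_difference`) and all simulation settings are hypothetical.

```python
import numpy as np

def coexpression_by_genotype(x, y, genotype):
    """Pearson correlation of x and y within each genotype group."""
    return {int(g): float(np.corrcoef(x[genotype == g], y[genotype == g])[0, 1])
            for g in np.unique(genotype)}

def fisher_z_difference(r0, n0, r1, n1):
    """z statistic for the difference between two independent correlations."""
    z0, z1 = np.arctanh(r0), np.arctanh(r1)
    return float((z0 - z1) / np.sqrt(1.0 / (n0 - 3) + 1.0 / (n1 - 3)))

# Simulated data (assumption): the sign of the co-expression of two genes
# depends on the allele at a single marker.
rng = np.random.default_rng(0)
n = 200
genotype = rng.integers(0, 2, size=n)        # marker allele, coded 0 or 1
x = rng.normal(size=n)                       # expression of gene 1
sign = np.where(genotype == 0, 1.0, -1.0)    # allele flips the relationship
y = sign * x + 0.5 * rng.normal(size=n)      # expression of gene 2

r = coexpression_by_genotype(x, y, genotype)
n0, n1 = int(np.sum(genotype == 0)), int(np.sum(genotype == 1))
z = fisher_z_difference(r[0], n0, r[1], n1)
r_marginal = float(np.corrcoef(x, y)[0, 1])  # correlation ignoring genotype

print("within-genotype correlations:", r)
print("Fisher z for the difference:", z)
print("marginal correlation:", r_marginal)
```

In this simulation neither gene's mean expression depends on the marker, so mapping each gene as a 1D-trait would find no linkage, while the co-expression pattern differs sharply between the two alleles.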

We develop a statistical method to find gene pairs whose co-expression patterns, including both signs and strengths, are mediated by genetic variations, and to map these 2D-traits to the corresponding genetic loci. We report several applications that combine 1D-trait mapping with 2D-trait mapping, including the contribution of genetic variations to perturbations in the regulatory mechanisms of yeast metabolic pathways.

The linkage of a complex trait (e.g., body fat) or a gene expression trait to a genetic locus may depend on cellular conditions. If the samples in a study correspond to different cellular conditions that affect the directions and/or strengths of linkage signals, the overall linkage signal may be weakened or even cancelled out. We use genome-wide gene expression profiles as surrogate indicators of cellular conditions and employ a statistical method to identify dynamic linkages of complex traits or gene expression traits, conditioning on gene expression profiles. We report applications to a mouse eQTL data set.
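The cancellation of linkage signals across cellular conditions can be shown with a small simulation. This is only an illustrative sketch, not the method used in the dissertation: it assumes a 0/1 marker, a single hypothetical surrogate gene whose expression sign stands in for the cellular condition, and a simple difference in group means as the linkage signal. The function name `linkage_signal` and all parameter values are assumptions.

```python
import numpy as np

def linkage_signal(trait, genotype):
    """Difference in mean trait value between the two genotype groups."""
    return float(trait[genotype == 1].mean() - trait[genotype == 0].mean())

# Simulated data (assumption): the genotype effect on the trait has
# opposite directions in two cellular conditions.
rng = np.random.default_rng(1)
n = 400
genotype = rng.integers(0, 2, size=n)          # marker allele, coded 0 or 1
condition = rng.normal(size=n)                 # expression of a surrogate gene
effect = np.where(condition > 0, 1.0, -1.0)    # direction depends on condition
trait = effect * genotype + 0.3 * rng.normal(size=n)

high = condition > 0
overall = linkage_signal(trait, genotype)
in_high = linkage_signal(trait[high], genotype[high])
in_low = linkage_signal(trait[~high], genotype[~high])

print("overall linkage signal:", overall)
print("signal when surrogate expression is high:", in_high)
print("signal when surrogate expression is low:", in_low)
```

Pooling all samples mixes the two opposite effects, so the overall signal is close to zero, whereas conditioning on the surrogate expression level recovers a clear signal in each group.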