Study of the Transcription Regulation in Saccharomyces cerevisiae

Tianwei Yu
Ph.D., 2005
Advisor: Ker-Chau Li

We develop statistical methods to study the eukaryotic transcription regulation system, using Saccharomyces cerevisiae as the model organism. We study the system at two levels: the biological process level and the gene level.
To shed light on how the various parts of the interlocked biological processes are coordinated at the transcription level, the between-unit expression relationships need to be studied directly. We approach this issue by constructing an index of communication function that conveys the global pattern of coexpression between the genes of one process and the genes of the entire genome. Processes with similar signatures are then identified and projected onto a process-to-process association graph. This top-down method allows detailed gene-level analysis of linked processes as a follow-up. Using the cell-cycle gene expression profiles for Saccharomyces cerevisiae, we report well-organized networks of biological processes that would be difficult to find otherwise. Using another dataset, we report a sharply different network structure featuring cellular responses under environmental stress.
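The process-level idea can be illustrated with a minimal Python sketch. The abstract does not define the index of communication function, so the sketch substitutes an assumed stand-in: each process's genome-wide signature is the mean Pearson correlation of every gene with the process's member genes, and two processes are linked when their signatures correlate above a threshold. The function names, the toy profiles, and the 0.8 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def process_signature(expr, members):
    """For every gene in the genome, its mean correlation with the process's genes.

    Assumed stand-in for the index of communication function; a gene is not
    correlated with itself.
    """
    sig = []
    for gene, profile in expr.items():
        others = [m for m in members if m != gene]
        if not others:
            sig.append(0.0)
            continue
        sig.append(sum(pearson(profile, expr[m]) for m in others) / len(others))
    return sig


def process_graph(expr, processes, threshold=0.8):
    """Edges between processes whose genome-wide signatures are similar."""
    sigs = {name: process_signature(expr, genes) for name, genes in processes.items()}
    return [(a, b) for a, b in combinations(sorted(sigs), 2)
            if pearson(sigs[a], sigs[b]) >= threshold]


# Toy expression profiles (hypothetical values, four conditions per gene).
expr = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 2, 3, 5],
    "g4": [4, 3, 2, 1], "g5": [8, 6, 4, 2], "g6": [5, 3, 2, 1],
}
processes = {"A": ["g1", "g2"], "B": ["g2", "g3"], "C": ["g4", "g5"]}
edges = process_graph(expr, processes)
```

In this toy example, processes A and B have nearly identical signatures and are linked, while C shows the opposite coexpression pattern and stays unconnected to both.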
At the gene level, the transcription of each gene is controlled by transcription factors (TFs). TF-gene relationships constitute a complex network. Microarray gene expression data and cross-linking chromatin immunoprecipitation (ChIP) data contain voluminous information that can help identify transcription regulatory networks at the full-genome scale. Such high-throughput data are noisy, however. In contrast, the biomedical literature contains scattered reports of evidenced TF-target gene binding relationships that have been elucidated in great detail at the molecular level. How to build a network model that incorporates this valuable source of information is a challenging question. Here we present a modified factor analysis approach to address this issue. The algorithm starts with a high-confidence set of evidenced TF-gene linkages. Using high-throughput data, it iterates between a network configuration estimation step and a connection strength estimation step. We apply our method to obtain two comprehensive regulatory networks for Saccharomyces cerevisiae, one under the normal growth condition and the other under the environmental stress condition.
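The alternating structure of the algorithm can be sketched in Python as follows. This is not the modified factor analysis itself, whose details the abstract does not give. As a crude, assumed proxy, the activity of each TF is taken to be the mean expression profile of its current targets, connection strength is the correlation of a gene with that activity, and candidate links (for example, ChIP-supported pairs) are kept when the strength passes a threshold. All names, the toy data, and the 0.9 threshold are hypothetical, and only positive (activating) links are considered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mean_profile(profiles):
    """Element-wise mean of equal-length expression profiles."""
    return [sum(vals) / len(vals) for vals in zip(*profiles)]


def refine_network(expr, seed_links, candidate_links, threshold=0.9, max_iter=20):
    """Alternate strength estimation and configuration update until the links are stable.

    expr maps gene -> profile; seed_links and candidate_links are sets of (tf, gene).
    Returns the final link set and the strengths from the last strength step.
    """
    links = set(seed_links)
    strengths = {}
    for _ in range(max_iter):
        # Strength step: proxy TF activity from current targets, then score every
        # seed or candidate link of a TF that has at least one current target.
        activity = {tf: mean_profile([expr[g] for t, g in links if t == tf])
                    for tf in {t for t, _ in links}}
        pool = links | set(candidate_links)
        strengths = {(tf, g): pearson(expr[g], activity[tf])
                     for tf, g in pool if tf in activity}
        # Configuration step: seeds are always kept; other links need strong support.
        new_links = set(seed_links) | {l for l, s in strengths.items() if s >= threshold}
        if new_links == links:
            break
        links = new_links
    return links, strengths


# Toy data (hypothetical): TF1 has two literature-supported targets.
expr = {
    "g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 2, 3, 5],
    "g4": [4, 3, 2, 1], "g5": [4, 1, 3, 2],
}
seed_links = {("TF1", "g1"), ("TF1", "g2")}
candidate_links = {("TF1", "g3"), ("TF1", "g4"), ("TF2", "g5")}
links, strengths = refine_network(expr, seed_links, candidate_links)
```

In the toy run, g3 is added to TF1's targets because it tracks the seed targets closely, g4 is rejected because it is anticorrelated, and the TF2 candidate is never scored because TF2 has no seed target from which to estimate an activity.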