Detecting Combinatorial DNA-Binding Patterns by the K-function and its Generalizations

Maria Cha
Ph.D., 2015
Advisor: Qing Zhou
Recent developments in ChIP-Seq technology have generated binding data for many transcription factors (TFs) across various cell types and cellular conditions. These data open great opportunities for studying combinatorial binding patterns among a set of TFs active in a particular cellular condition, which is key to understanding how TFs interact in gene regulation. As a first step toward identifying combinatorial binding patterns, we develop statistical methods to detect clustering and ordering patterns among the binding sites (BSs) of a pair of TFs. Testing procedures based on Ripley’s K-function and its generalizations are developed to identify binding patterns from large collections of BSs in ChIP-Seq data. We have applied our methods to ChIP-Seq data for 91 pairs of TFs in mouse embryonic stem cells (ESCs) and 210 pairs in human ESCs. Our methods detected clustering patterns in the binding of most TF pairs, consistent with findings in the literature, and identified significant ordering preferences, relative to the direction of target-gene transcription, among the BSs of seven TFs in mouse ESCs. More interestingly, our results demonstrate that the identified clustering and ordering patterns between TFs are associated with the expression of their target genes. Lastly, we use multiple linear regression to study how the binding of multiple TFs and histone modifications (HMs) affects the clustering patterns of TF pairs in mouse and human ESCs. The results suggest that the roles of the TFs and HMs found significant in the regression models are consistent with the clustering patterns of different TF pairs. These findings provide new insights into co-regulation among TFs and HMs.
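To make the idea of a K-function test concrete, the Python sketch below estimates a bivariate (cross) K-function for two sets of binding-site positions and attaches a Monte Carlo p-value. It rests on several simplifying assumptions that are not taken from the dissertation. Each binding site is reduced to a single coordinate on one segment of length `length`. No edge correction is applied. The null distribution comes from randomly circularly shifting one set of sites, which keeps that set's internal spacing while breaking its alignment with the other. The names `cross_k` and `shift_test` and the shift-based null are hypothetical illustrations. The dissertation's actual testing procedures, and its generalizations for ordering, may differ.

```python
import random
from bisect import bisect_left, bisect_right


def cross_k(a, b, r, length):
    """Naive bivariate K-function estimate on a 1-D segment [0, length).

    Counts pairs (x in a, y in b) with |x - y| <= r and scales by
    length / (n_a * n_b). No edge correction is applied (a simplifying
    assumption). Under independent uniform placement the expected value
    is roughly 2 * r when r is small relative to length; larger values
    suggest that sites of a and b cluster together.
    """
    b_sorted = sorted(b)
    pairs = 0
    for x in a:
        # Number of b-sites falling in the window [x - r, x + r].
        pairs += bisect_right(b_sorted, x + r) - bisect_left(b_sorted, x - r)
    return length * pairs / (len(a) * len(b))


def shift_test(a, b, r, length, n_perm=999, seed=0):
    """Monte Carlo p-value for clustering of a with b at distance r.

    The null is generated by circularly shifting every site in b by one
    random offset (hypothetical null, chosen for illustration only).
    Returns (observed K, one-sided p-value).
    """
    rng = random.Random(seed)
    observed = cross_k(a, b, r, length)
    exceed = 0
    for _ in range(n_perm):
        offset = rng.uniform(0, length)
        shifted = [(y + offset) % length for y in b]
        if cross_k(a, shifted, r, length) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the observed data count as one draw.
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    length = 1_000_000
    tf1 = [rng.uniform(0, length) for _ in range(200)]
    # Synthetic TF2 sites placed near TF1 sites to mimic co-binding.
    tf2 = [min(length - 1, max(0.0, x + rng.gauss(0, 50))) for x in tf1]
    k, p = shift_test(tf1, tf2, r=200, length=length, n_perm=199)
    print(f"K_12(200) = {k:.1f} (about {2 * 200} under independence), p = {p:.3f}")
```

The estimate is compared with its value under independence (about 2r in one dimension) rather than read on its own, and the random-shift null is one common way to respect each TF's own spatial structure. In practice the positions would come from ChIP-Seq peak calls on each chromosome, and the comparison would be repeated over a range of r.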