Genomewide Motif Identification Using a Dictionary Model
Chiara Sabatti, Kenneth Lange
This paper surveys and extends models and algorithms for identifying binding sites in non-coding regions of DNA. These sites control the transcription of genes into messenger RNA, which is subsequently translated into proteins. We summarize the underlying biology, review three existing models for binding site identification, and present a unified model that borrows from these models and integrates their main features. We then describe maximum likelihood and maximum a posteriori algorithms for fitting the unified model to data. We conclude with a prospectus for future data analysis and theoretical research.
2002-09-01