Gene Selection Methods for Single-cell Sequencing Data Analysis

Kexin Li
PhD, 2023
Li, Jingyi
Since their advent around 15 years ago, single-cell RNA sequencing (scRNA-seq) technologies have become a powerful tool for characterizing cell-to-cell heterogeneity within cell populations across various biological systems, and they have revolutionized transcriptomic studies. A typical scRNA-seq dataset contains thousands to tens of thousands of genes; however, a subset of genes is usually sufficient to represent the underlying biological variation among cells that is relevant to researchers’ various interests. Selecting such a subset is useful for three reasons: (1) it highlights and enhances biological signals, (2) it improves the interpretability of analysis results, and (3) it reduces the number of genes, saving computational or human resources. Hence, a number of gene selection methods have been developed for various tasks, for instance, informative gene selection for cell clustering and post-clustering identification of differentially expressed genes for cell type annotation. However, existing efforts have not fully solved the problem: among the genes selected by existing methods, many are irrelevant, redundant, or insignificant. Gene selection that offers both biologically meaningful interpretation and statistical rigor for specific single-cell analysis tasks remains challenging. This dissertation aims to address these challenges in two projects.