Chromatin accessibility, a fundamental property of DNA that plays a critical role in gene regulation and cell identity, refers to the degree to which nuclear macromolecules can access and interact with DNA. With rapid advances in single-cell chromatin accessibility sequencing (scCAS) technologies, cell type annotation of scCAS data has become increasingly important, because these data can capture the chromatin regulatory landscape that controls gene transcription in each cell type. However, existing automatic annotation methods still have significant limitations, including low annotation accuracy, failure to incorporate information from reference data, and inability to identify novel cell types.
Recently, Quantitative Biology published a study entitled "Accurate cell type annotation for single-cell chromatin accessibility data via contrastive learning and reference guidance", which introduces RAINBOW, a reference-guided automatic annotation method that is built on a contrastive learning framework and can effectively identify novel cell types. In extensive experiments on multiple scCAS datasets, RAINBOW outperforms state-of-the-art methods in annotating both known and novel cell types.
RAINBOW builds its automatic cell type annotation model on a contrastive learning framework (Figure 1), which learns features shared by cells of the same type while separating dissimilar cells, thereby accentuating the differences between cell types. RAINBOW also incorporates information from external reference data as prior knowledge. To identify novel cell types, it clusters the unlabeled data in an unsupervised manner and selects the clusters with higher average entropy. In comprehensive benchmarking experiments, RAINBOW outperforms current leading methods in cell type annotation, and it holds promise for uncovering new biological processes and cellular functions.
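The novel-cell-type step described above can be illustrated with a small sketch. The highlight does not say how the entropy is computed or how "higher" is decided, so this example rests on stated assumptions: each unlabeled cell has a predicted probability vector over the known reference cell types, the cluster labels come from any unsupervised clustering method, and a cluster is flagged as novel when its mean per-cell entropy exceeds a fixed threshold. The function names `shannon_entropy` and `flag_novel_clusters` and the threshold value are hypothetical and are not RAINBOW's actual API or selection rule.

```python
import math
from collections import defaultdict


def shannon_entropy(probs):
    """Shannon entropy (natural log) of one cell's predicted type distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def flag_novel_clusters(cell_probs, cluster_labels, threshold):
    """Return (sorted novel cluster ids, mean entropy per cluster).

    cell_probs: per-cell probability vectors over the known reference types
    cluster_labels: cluster id for each cell, from any unsupervised clustering
    threshold: hypothetical cutoff on mean entropy (an assumption, not RAINBOW's rule)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for probs, cluster in zip(cell_probs, cluster_labels):
        totals[cluster] += shannon_entropy(probs)
        counts[cluster] += 1
    mean_entropy = {c: totals[c] / counts[c] for c in totals}
    # Clusters whose cells are, on average, uncertain about every known type
    # are treated as candidate novel cell types.
    novel = sorted(c for c, h in mean_entropy.items() if h > threshold)
    return novel, mean_entropy


# Toy example with three known reference types (made-up numbers for illustration).
probs = [
    [0.95, 0.03, 0.02],  # confidently assigned cells: low entropy
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],  # cells that match no known type well: high entropy
    [0.40, 0.30, 0.30],
]
clusters = [0, 0, 1, 1]
novel, mean_h = flag_novel_clusters(probs, clusters, threshold=0.5 * math.log(3))
print(novel)  # [1]
```

Here the threshold is half of the maximum possible entropy for three types, `log(3)`. In practice, a data-driven cutoff, or ranking clusters by mean entropy, may be more robust than a fixed value.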