New Database Boosts Single-Cell Chromatin Research

Higher Education Press

Single-cell analyses have emerged as powerful tools for studying cellular heterogeneity and gene regulation. Single-cell chromatin accessibility sequencing (scCAS) is a key technology that enables the analysis of chromatin accessibility at the resolution of individual cells. However, the use of scCAS data faces three main challenges: (1) publicly available data generated from diverse species, tissues, and experimental conditions have not been systematically collected; (2) scCAS data with cell type, tissue, and other labels can be used to train machine learning methods for single-cell tasks such as cell type annotation, but these critically important annotated datasets have not been systematically collected either; (3) the diversity of data formats across studies complicates efforts toward format standardization.

To address these problems, a research team led by Shengquan Chen published their new research on 15 November 2025 in Frontiers of Computer Science, a journal co-published by Higher Education Press and Springer Nature.

The team developed scCASdb, a user-friendly and well-annotated scCAS database that standardizes datasets in the h5ad format. By systematically collecting 80 well-annotated datasets from diverse species, tissues, and experimental conditions, scCASdb enables single-cell analyses that were previously hindered by the lack of comprehensive collections. Moreover, the adoption of the h5ad format ensures efficient data access and compatibility with both Python-based tools such as Scanpy and machine learning models.

All data in the database are stored in the h5ad format, which efficiently manages large-scale single-cell data and can be used directly with Python-based machine learning methods, enabling researchers to develop computational tools for single-cell analysis.

Each dataset in scCASdb contains three key components: (1) a cell-by-peak matrix, which records chromatin accessibility across genomic regions for each single cell; (2) cell type labels for the cells in the cell-by-peak matrix, when available, which help researchers identify and classify cell populations and support the analysis of cellular heterogeneity; (3) metadata, such as species, genome, organs, diseases, sequencing technologies, and batch labels, which help researchers select and use datasets for diverse single-cell tasks.

Future work can focus on increasing the number of datasets and incorporating additional features to facilitate user access to the data.
