The hidden Markov model (HMM), a statistical model widely applied in machine learning, has proven effective in addressing various problems in bioinformatics. Once primarily regarded as a mathematical framework for modeling stochastic processes, HMMs have become indispensable tools for solving a wide range of biological sequence problems, from gene prediction to protein structure analysis.
In a recent review published in Genes & Diseases, researchers from Harbin Medical University systematically introduce the theoretical foundations of HMMs, including the three canonical problems (evaluation, decoding, and learning) and the algorithms most commonly used to address them, such as the Viterbi and Baum-Welch algorithms.
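To make the evaluation problem concrete, the following is a minimal Python sketch of the forward algorithm, which computes the probability of an observed sequence under a given HMM. The two-state model, its state names, and all probabilities are illustrative assumptions, not values taken from the review. A brute-force sum over every hidden path is included only as a cross-check for short sequences.

```python
from itertools import product

# Toy two-state HMM over DNA bases. Every number here is an
# illustrative assumption, not a parameter from the review.
STATES = ("AT_rich", "GC_rich")
START = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
EMIT = {
    "AT_rich": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "GC_rich": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def forward_likelihood(obs):
    """Return P(obs | model) for a non-empty sequence via the forward algorithm."""
    # alpha[s] = probability of the prefix seen so far, ending in state s.
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        alpha = {
            s: EMIT[s][symbol] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def brute_force_likelihood(obs):
    """Sum the joint probability over every hidden path (exponential cost)."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = START[path[0]] * EMIT[path[0]][obs[0]]
        for prev, cur, symbol in zip(path, path[1:], obs[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][symbol]
        total += p
    return total


print(forward_likelihood("GCGCAT"))
```

The forward recursion costs time proportional to the sequence length times the square of the number of states, whereas the brute-force sum grows exponentially with sequence length. For long sequences, practical implementations typically work in log space or rescale at each step to avoid numerical underflow; this sketch omits that for brevity.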
This review emphasizes the wide-ranging applications of HMMs in bioinformatics, with a focus on five major domains:
(i) Transmembrane protein prediction: Tools like HMMTOP apply HMM-based approaches to resolve protein topology, providing critical insights for drug discovery and structural biology.
(ii) Gene finding: Programs such as GENSCAN and AUGUSTUS utilize generalized HMMs to predict exon–intron boundaries, facilitating accurate genome annotation across species.
(iii) Multiple sequence alignment: Profile HMMs underpin widely used resources such as Pfam and HMMER, enabling homology detection, protein family classification, and functional annotation.
(iv) CpG island prediction: HMMs offer statistically grounded methods to identify CpG-rich regions involved in epigenetic regulation and disease.
(v) Copy number variation (CNV) detection: Algorithms including PennCNV and QuantiSNP rely on HMMs to detect CNVs with high sensitivity, providing insights into genetic diversity and disease susceptibility.
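As an illustration of how decoding applies to a task like CpG island prediction, the sketch below runs the Viterbi algorithm on a deliberately simplified two-state model ("background" versus "island"). This is an assumption-laden toy: the state names and all probabilities are hypothetical, and published CpG island HMMs generally use richer state spaces that capture dinucleotide context. It is not the method of any tool named above.

```python
import math

# Simplified two-state model; all values are hypothetical.
STATES = ("background", "island")
START = {"background": 0.9, "island": 0.1}
TRANS = {
    "background": {"background": 0.95, "island": 0.05},
    "island": {"background": 0.10, "island": 0.90},
}
EMIT = {
    "background": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "island": {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10},
}

# Work in log space so long sequences do not underflow.
LOG_START = {s: math.log(p) for s, p in START.items()}
LOG_TRANS = {r: {s: math.log(p) for s, p in row.items()} for r, row in TRANS.items()}
LOG_EMIT = {s: {b: math.log(p) for b, p in row.items()} for s, row in EMIT.items()}


def path_log_prob(seq, path):
    """Joint log-probability of a sequence and one hidden-state path."""
    lp = LOG_START[path[0]] + LOG_EMIT[path[0]][seq[0]]
    for prev, cur, base in zip(path, path[1:], seq[1:]):
        lp += LOG_TRANS[prev][cur] + LOG_EMIT[cur][base]
    return lp


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    score = {s: LOG_START[s] + LOG_EMIT[s][seq[0]] for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            best_prev = max(STATES, key=lambda r: score[r] + LOG_TRANS[r][s])
            pointers[s] = best_prev
            new_score[s] = score[best_prev] + LOG_TRANS[best_prev][s] + LOG_EMIT[s][base]
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


sequence = "ATATCGCGCGCGATAT"
for base, state in zip(sequence, viterbi(sequence)):
    print(base, state)
```

Each base is labeled with the state on the single most probable path, so contiguous runs of "island" labels would mark candidate CpG-rich regions under these assumed parameters. In practice the parameters would be estimated from annotated data or learned with an algorithm such as Baum-Welch rather than set by hand.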
Furthermore, the review critically discusses the strengths and limitations of HMMs, noting their versatility, statistical rigor, and interpretability, while acknowledging challenges such as computational demands and assumptions of linearity. The authors highlight that integrating HMMs with next-generation sequencing, multi-omics, and advanced machine learning approaches will be essential for extending their relevance in modern computational biology.
By consolidating both theoretical insights and practical applications, this review positions HMMs as a cornerstone of bioinformatics research. As biological data continue to expand in scope and complexity, HMMs are expected to remain central to advancing genome annotation, functional genomics, and precision medicine.
Reference
Title of Original Paper: The hidden Markov model and its applications in bioinformatics analysis
Journal: Genes & Diseases
Genes & Diseases is a journal for molecular and translational medicine. The journal primarily focuses on publishing investigations on the molecular bases and experimental therapeutics of human diseases. Publication formats include full-length research articles, review articles, short communications, correspondence, perspectives, commentaries, views on news, and research watch.
DOI: https://doi.org/10.1016/j.gendis.2025.101729