New Database Maps Human Genes With ACTG Rules

Kyushu University

Fukuoka, Japan—Whether you turn red when drinking alcohol, dislike certain smells, or metabolize drugs differently from others, the explanation often lies in your DNA, or more precisely, your gene types.

People share the same genes but not the exact same gene types. These types are unique combinations of multiple DNA sequence differences that together shape our biological traits. Researchers have long investigated these genetic variations, but conventional short-read sequencing reads only 150-300 bases at a time, providing isolated "dots" of information. Advances in long-read sequencing, which can read tens of thousands of bases at once, are now connecting these dots into "lines," showing how variations work together as functional gene types.

Yet without a standard naming system, researchers remain stuck describing each variant in fragmented and redundant ways.

"It is like explaining a cup by only listing the shape of its handle, its color, or other separate features. It creates barriers to cross-study comparison and slows translation into healthcare," says Professor Masao Nagasaki of Kyushu University's Medical Institute of Bioregulation. "For example, fields like transplant matching or drug metabolism have their own naming schemes, but none are widely adopted."

To address this gap, Nagasaki's team introduced the ACTG hierarchical nomenclature and built a global database called the Joint Open Genome and Omics Platform 1.0 (JoGo 1.0). The project spanned nearly five years, including data acquisition, with about two and a half years devoted to constructing the database itself. The work was published on November 29 in Nucleic Acids Research and was selected as one of the journal's Breakthrough Articles.

Inspired by the four fundamental DNA bases, the ACTG naming system organizes human gene types into four progressively expanding levels: A for the amino acid sequence, C for the coding sequence, T for the transcript level covering untranslated regions, and G for the complete gene body including introns.
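To make the structure of these names concrete, here is a minimal Python sketch that parses and rebuilds ACTG-style names. The grammar is an assumption inferred only from the two names quoted in this article, ALDH2:a1c1t1g1 and ALDH2:a2: a gene symbol, a colon, then numbered levels in fixed A, C, T, G order, with finer levels allowed to be left off. The names `ActgName`, `parse_actg` and `ACTG_PATTERN` are hypothetical and are not part of JoGo.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar, inferred only from the article's examples
# (ALDH2:a1c1t1g1 and ALDH2:a2): gene symbol, colon, then level indices
# in fixed A -> C -> T -> G order; finer levels may be omitted.
ACTG_PATTERN = re.compile(
    r"(?P<gene>[A-Za-z0-9-]+):"
    r"a(?P<a>\d+)(?:c(?P<c>\d+)(?:t(?P<t>\d+)(?:g(?P<g>\d+))?)?)?"
)


@dataclass(frozen=True)
class ActgName:
    gene: str
    a: int                   # A: amino acid sequence
    c: Optional[int] = None  # C: coding sequence
    t: Optional[int] = None  # T: transcript, adding untranslated regions
    g: Optional[int] = None  # G: complete gene body, adding introns

    def __str__(self) -> str:
        # Rebuild the name, stopping at the first level that was left off.
        text = f"{self.gene}:a{self.a}"
        for prefix, value in (("c", self.c), ("t", self.t), ("g", self.g)):
            if value is None:
                break
            text += f"{prefix}{value}"
        return text


def parse_actg(name: str) -> ActgName:
    """Split an ACTG-style name into its gene symbol and level indices."""
    match = ACTG_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not an ACTG-style name: {name!r}")
    levels = {k: int(match.group(k)) for k in "actg" if match.group(k) is not None}
    return ActgName(gene=match.group("gene"), **levels)


full = parse_actg("ALDH2:a1c1t1g1")
print(full.gene, full.a, full.g)  # ALDH2 1 1
print(parse_actg("ALDH2:a2"))     # ALDH2:a2
```

Treating the levels as an ordered prefix (C only after A, T only after C) reflects the "progressively expanding" design described above: a name can stop at the protein level, as ALDH2:a2 does, without specifying finer detail.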

"One key feature is that we rank gene types based on global frequency," Nagasaki explains.

For example, the most common variant of the gene aldehyde dehydrogenase 2 (ALDH2), which encodes the key enzyme that breaks down acetaldehyde, is designated ALDH2:a1c1t1g1. A variant with reduced enzymatic activity, often found in East Asian populations and responsible for the facial flushing people experience when consuming alcohol, is designated ALDH2:a2. This variant represents a change at the amino acid level. The numbering reflects global frequency: a lower number signals a more common variant, while a higher number indicates a rarer one, which may be associated with a higher risk of certain diseases.
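The frequency rule can be illustrated with a toy ranking: count how often each distinct sequence appears, give index 1 to the most frequent, and, assuming the numbering is hierarchical, rank coding sequences separately within each amino-acid group. This is a hypothetical sketch, not JoGo's actual procedure; the tie-breaking rule, the restriction to two levels, the function names and the three-codon sequences are all invented for illustration.

```python
from collections import Counter


def rank_by_frequency(items):
    """Give each distinct item a 1-based index, 1 = most frequent.

    Ties are broken by sorting the items themselves, a hypothetical rule
    chosen only so the output is deterministic.
    """
    counts = Counter(items)
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: index for index, item in enumerate(ordered, start=1)}


def name_a_and_c_levels(gene, haplotypes):
    """Assign toy 'GENE:aNcM' names to (protein, coding_sequence) pairs.

    A indices rank distinct protein sequences across all haplotypes; C indices
    are ranked separately within each A group (an assumed hierarchical scheme).
    """
    a_rank = rank_by_frequency(protein for protein, _ in haplotypes)
    names = {}
    for protein, a_index in a_rank.items():
        cds_in_group = [cds for p, cds in haplotypes if p == protein]
        for cds, c_index in rank_by_frequency(cds_in_group).items():
            names[(protein, cds)] = f"{gene}:a{a_index}c{c_index}"
    return names


# Invented three-codon haplotypes, one entry per observed chromosome copy.
toy = (
    [("MSA", "ATGTCTGCA")] * 5    # most common protein and coding sequence
    + [("MSA", "ATGTCAGCA")] * 2  # synonymous change: TCT and TCA both code for serine
    + [("MKA", "ATGAAAGCA")] * 1  # amino acid change: serine to lysine
)
for pair, name in name_a_and_c_levels("TOY", toy).items():
    print(name, pair)
# TOY:a1c1 ('MSA', 'ATGTCTGCA')
# TOY:a1c2 ('MSA', 'ATGTCAGCA')
# TOY:a2c1 ('MKA', 'ATGAAAGCA')
```

Because the synonymous change leaves the protein unchanged, it keeps A index 1 and differs only at the C level, whereas the lysine substitution receives a new A index, much as ALDH2:a2 marks a difference at the protein level.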

The database draws on DNA data from 258 genomes sampled across five continents—150 sourced from public resources and 108 newly sequenced from cell samples contributed by volunteers in the 1000 Genomes Project.

JoGo 1.0 offers both an interactive online viewer and a privacy-preserving local viewer, enabling secure integration of sensitive datasets.

Fittingly, "JoGo" means "funnel" in Japanese, reflecting the database's role in compressing massive genomic information into meaningful, usable knowledge. It catalogs 4.7 million gene types (haplotypes) across more than 19,000 genes, and can link each gene type to public resources such as ClinVar, the GWAS Catalog, and GTEx. This allows researchers to interpret clinical variants, trait associations, and tissue-specific gene expression. Moreover, with data representing all five inhabited continents, JoGo 1.0's visualizations can highlight geographically distinct patterns, aiding population-specific genetic screening and informing drug development.

Nagasaki and his team are continuously expanding the database, increasing both sample size and population diversity, and expect to release JoGo 2.0 within two years. As more genomes are added, the frequency-based numbering will be refined to better reflect global patterns.

"Having consistent names for whole genes means we can finally speak a common language," says Nagasaki. "Just as there is active research and discussion around blood types today, I hope this new nomenclature will lead to a deeper understanding of, and public dialogue around, human gene types."
