Interpreting the Human Genome’s Instruction Manual

An artistic representation of gene regulating elements, which allow cells with the same genetic code to differentiate into many different tissues and play many varied roles in the body. (Credit: Ella Maru Studio)

A 17-year research project has generated a detailed atlas of the genome that reveals the location of hundreds of thousands of potential regulatory regions – a resource that will help all human biology research moving forward.

Of the three billion base pairs in the human genome, only 2% code for the proteins that build and maintain our bodies. The other 98% harbors, among other things, potential regulatory regions – sequences that give cells the instructions and tools needed to turn protein recipes into an astonishingly complex organism. Yet despite their importance and prevalence, non-coding regions have been studied much less than gene-coding sequences, in part because it is more difficult to do so.

The Encyclopedia of DNA Elements (ENCODE) collaboration was launched by the National Human Genome Research Institute with the goal of developing the tools and expertise needed to shed light on our genome’s mysterious majority. Now in its final year, ENCODE has made huge advances thanks to the combined scientific and technological prowess of several hundred researchers at dozens of institutions.

“We’ve sequenced the human genome and we largely know where genes are. But when you get outside genes, mapping the function of genomic ‘dark matter’ is much more daunting. It’s a big step forward for us to know how to find the areas within the 98% that are functionally important,” said Len Pennacchio, a senior scientist at Lawrence Berkeley National Laboratory (Berkeley Lab) and co-author on 4 of the 15 new ENCODE papers published this week as part of a special collection in Nature. In addition to their original research, Pennacchio and his Berkeley Lab colleagues also provided technical expertise and materials to other ENCODE consortium teams.

According to Pennacchio, the project’s recent advances will be particularly useful for scientists studying diseases. When trying to determine the underlying causes of a condition, researchers search for genetic variants carried by affected individuals. Sometimes, he said, they find associations with sequences within genes, but often the analyses will pinpoint an area that’s far away from any protein-coding sequence, and it isn’t readily apparent what that DNA does. Is it important in the heart, or the stomach? Is it important all the time or just at certain phases of development?

“Our datasets give scientists clues as to when and where that sequence functions, and which gene or genes it affects. It gives you an immediate path to follow to learn more, where previously we’d have few hints,” he said.

From theory to reality

In the past phases of ENCODE, researchers were focused on identifying all DNA sequences that regulate gene expression, such as promoters and enhancers, and establishing how different regions of our chromosomes are modified and stored (i.e., wrapped around proteins called histones or bound with small tagging molecules). This information reveals a great deal about how cells can express or silence genes differently depending on timing and where they are located in the body. The earlier work was mostly performed on DNA extracted from human cell lines.

An illustration of DNA modifying elements, including histones and chemical tags.

“Thanks to ENCODE 2, we had a pretty good map of how DNA is modified along the genome, but what was missing really was the legend for that map,” explained Axel Visel, also a Berkeley Lab senior scientist. Visel and Diane Dickel, a research scientist, are co-authors with Pennacchio on the new papers, and all three run the Mammalian Functional Genomics Laboratory within Berkeley Lab’s Biosciences Area. “ENCODE phase 3 has been all about understanding what these different modifying marks we found in cell lines really mean in terms of a real organism,” Visel added.

For the phase 3 experiments, the Berkeley Lab group, along with numerous other ENCODE consortium teams, began applying their analyses to mouse tissues, as the mouse genome is very similar to ours and many of the DNA modifications and on-off switches for gene expression are known to be the same.

The Berkeley Lab team, which has been involved in the project for 12 years, played an especially significant role in ENCODE 3. They are renowned leaders in the use of ChIP-seq, a technique that allows scientists to locate transcription factors and modified proteins on chromatin (the densely packed state that DNA exists in when not activated for transcription or replication), and then to analyze how these molecules interact with the sequences. They are also known for their expertise in transgenic assays, a technique used to test whether potential gene switches actually function as predicted.

Working closely with Bing Ren at the Ludwig Institute for Cancer Research, the team used ChIP-seq to study the changing landscape of chromatin in embryonic mice and then carried out hundreds of transgenic assays to validate these findings. After thousands of experiments, they generated a dataset covering diverse body tissues at eight developmental stages, significantly expanding the scientific community’s knowledge of DNA dynamics during mouse development and creating a resource for biomedical researchers seeking to learn more about human development.

Their atlas, along with nearly 6,000 other datasets on mouse and human DNA regulating elements generated by collaborating research teams, is freely accessible on ENCODE’s new online portal.

“Over the years, we’ve worked extensively with the other groups that were involved in ENCODE and built great complementary relationships,” said Dickel. “This is the kind of progress that comes from good collaborations, rather than competition.”

For the last leg of the project (ENCODE 4), which is in its final year, participating scientists are using genetically engineered mice to verify and expand upon the discoveries made from studying isolated tissues.

The ENCODE project is funded by grants from the National Institutes of Health. The other Berkeley Lab scientists who contributed to this work were: Iros Barozzi, Veena Afzal, Jennifer Akiyama, Ingrid Plajzer-Frick, Catherine Novak, Momoe Kato, Tyler Garvin, Quan Pham, Anne Harrington, Brandon Mannion, Elizabeth Lee, and Yoko Fukuda-Yuzawa.
