Massive Project to Understand Our Genes Reveals Secrets of RNA

There are tens of thousands of proteins in the human body. A single lab at UConn Health spent five years studying 356 of them and published the results today in three papers in Nature, as part of a massive, ongoing collaboration to identify exactly what every single protein and bit of DNA and RNA in the human genome does.

The Encyclopedia of DNA Elements (ENCODE) Project is a worldwide effort to understand how the human genome functions. The genome is a general term for the instruction manual for building and operating our bodies. It includes DNA, the string of chemical letters that spell out our genes, as well as RNA, a related group of chemical letters that translate genes into proteins and carry messages around our cells.

Brenton Graveley, Ph.D., chair of the Department of Genetics and Genome Sciences at UConn Health, on March 22, 2019. (Tina Encarnacion/UConn Health photo)

ENCODE is an ambitious project. There are millions of individual elements involved in the human genome. For example, one of the functions of a gene is to tell the body how to make one or more proteins. But we don’t even know exactly how many proteins the body makes. Estimates range from 80,000 to 400,000. And at least 1,500 proteins bind to RNA for various reasons we are only just beginning to understand.

For Brent Graveley, who chairs the Department of Genetics and Genome Sciences at UConn Health, that’s part of the intrigue. There’s just so much to know. One of the three papers he authored, for example, looked at 356 of the proteins that bind to RNA. These proteins control the splicing, translation, degradation, and localization of the RNAs they bind in a variety of ways. It took five labs, including his own, five years to figure out just this small piece of the puzzle. Researchers in other labs, working on other projects, performed approximately 6,000 experiments just for this phase of the project.

“This is the largest study to date on human RNA binding proteins,” Graveley says. “Over 25% of the proteins we have studied have previously had no known function and our studies now provide clues as to what these proteins are doing. In addition, this resource now makes it possible for researchers in the community to easily identify all of the proteins or RNAs that interact with the RNAs or proteins, respectively, they are interested in, allowing them to immediately start developing hypotheses they can test.”

The project’s latest results, including Graveley’s lab’s work, were published in three papers in Nature on July 29, accompanied by several additional studies published in other major journals. ENCODE is funded by the National Human Genome Research Institute, part of the National Institutes of Health.

“A key challenge in ENCODE is that different genes and functional regions are active in different cell types,” says Elise Feingold, Ph.D., scientific advisor for strategic implementation in the Division of Genome Sciences at NHGRI and a lead on ENCODE for the institute. “This means that we need to test a large and diverse number of biological samples to work towards a catalog of candidate functional elements in the genome.”

The ultimate goal is for the data sets developed in the ENCODE project to be used by other researchers as reference sets when they’re designing their own research projects. The ENCODE Project itself has benefited from, and built upon, decades of prior research on gene regulation performed by independent researchers around the world. ENCODE researchers have created a community resource, ensuring that the project’s data is accessible to any researcher for their studies. These efforts in open science have resulted in over 2,000 publications from non-ENCODE researchers who used data generated by the ENCODE Project.

“The data generated in ENCODE 3 dramatically increase our understanding of the human genome,” says Graveley. “The project has added tremendous resolution and clarity for previous data types, such as DNA-binding proteins and chromatin marks, and new data types, such as long-range DNA interactions and protein-RNA interactions. Although we have made great progress, there are still over 1,000 more RNA binding proteins to characterize to complete the catalog.”
