One of nature’s most ingenious inventions, the genetic code, unlocked a vast new world of molecular machinery that allowed the first independently living cells to emerge from the complex chemistry that existed on Earth before the origin of life. But nearly 4 billion years later, it remains deeply mysterious how nature learned to use symbolic coding relationships to interpret genes, especially the genes that now enforce the coding rules inside every living cell. That “chicken-and-egg” question – how do you get a code when you need one to make one? – is the quintessential challenge posed by the origin of life on Earth. How did random chemicals wind up embedding functionality into genes? Answers to that question reside in evolutionary molecular biology.
The Alfred P. Sloan Foundation has awarded Charlie Carter, PhD, a professor in the UNC Department of Biochemistry and Biophysics at the UNC School of Medicine, a $1.5-million grant to try to answer that vexing question in collaboration with Peter Wills, PhD, at the University of Auckland, New Zealand, and Milena Popovic, PhD, and Mark Ditzler, PhD, in the Center for the Emergence of Life at NASA Ames Research Center.
“Genetic coding” is the term used to describe how the four bases of DNA – the As, Cs, Gs, and Ts – are strung together so that cellular machinery, especially the ribosome, can interpret genetic instructions and convert sequences of nucleotide bases in genes into sequences of amino acids in proteins. In the genetic coding table, each block of three nucleotides is a triplet that represents a single amino acid. Each possible combination of three bases chosen from the A, C, G, T chemical alphabet forms one of 64 unique “words.” Blueprints for living things are written in an organism’s genes using these coded words.
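The count of 64 follows from simple arithmetic: three positions, each filled by one of four bases, gives 4 × 4 × 4 possible triplets. As a minimal sketch (the names `BASES` and `codons` are chosen here for illustration and do not come from the research described), the words can be enumerated in Python:

```python
from itertools import product

BASES = "ACGT"  # the four-letter DNA alphabet named in the text

# Every ordered triplet of bases: 4 * 4 * 4 = 64 possible words.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```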
The sublime mystery in genetic coding is that twenty of a cell’s genes, “sentences” composed of these base-triplet words, are themselves blueprints for the twenty “nanomachines” – one for each amino acid – that convert base-triplet words into the twenty-letter alphabet of proteins that make up the rest of life’s machinery. This conversion from a 4-base alphabet into a 20-amino acid alphabet is called translation.
The nanomachines that do the translating, called aminoacyl-tRNA synthetases (aaRS), attach a specific amino acid, one letter of the protein alphabet, to adaptor molecules – transfer RNAs or tRNAs – that also contain one of the 64 words of the genetic language. Each word has a unique partner among the other 63 words. The essence of molecular biology, worked out in the 1960s, is that the cell uses the same “base pairing” that maintains accurate gene copying to decide which tRNA adaptors match successive words in a gene. Transfer RNAs coupled to amino acids can thus read any genetic sequence, assuring that the amino acids at the other end of the tRNAs assemble together to make the correct sentence in the protein language. That symbolic conversion represents perhaps the very first time that nature created digital computation and used it to reshape information.
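The pairing rule can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it keeps the article’s A, C, G, T alphabet (real tRNAs use U in place of T), it applies strict A–T and G–C pairing (real tRNAs tolerate some “wobble” at the third position), and the names `COMPLEMENT` and `partner` are hypothetical, chosen only for illustration. Because paired strands run in opposite directions, a word’s partner is its complement read in reverse.

```python
from itertools import product

# Strict pairing rules assumed for this sketch: A pairs with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def partner(word):
    """Return the triplet that base-pairs with `word` (its reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(word))

words = ["".join(t) for t in product("ACGT", repeat=3)]

# No word pairs with itself, and pairing is mutual, so each of the
# 64 words has exactly one partner among the other 63.
assert all(partner(w) != w for w in words)
assert all(partner(partner(w)) == w for w in words)

print(partner("ATG"))  # CAT
```

No triplet can be its own partner because the middle base would have to pair with itself, which neither pairing rule allows.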
Genes have family trees much like our own. And, just as all our family trees must eventually lead back to a prototypical pair of parents, sequences of the 20 aaRS probably all trace to a complementary pair of ancestral aaRS genes that differentiated the letters of a simple, two-letter alphabet. Carter and Wills will try to characterize that ancestral alphabet and establish how it grew to the twenty-letter coding alphabet by adding a few letters at a time as the aaRS families diversified.
Carter has previously demonstrated how to test reconstructed aaRS genes experimentally. The UNC and NASA teams will extend that work by physically reconstructing ancestral aaRS and cognate tRNAs and verifying how they work in the lab.