One of the most common mistakes when talking about genetics is to confuse “genetic code” with “genetic sequence”. Many books, TV series and movies with genetics at the center of their plot mix up the two terms, and so do the media and science communication at every level. However, the genetic sequence is very different from the genetic code.
“In the late 21st century, a deadly virus seeps into the genetic code of human beings” – Spanish-language synopsis of “Genetic code” by Nick Sagan.
The genetic sequence: the message
In genetics, the genetic sequence is the series of nucleotides linked one after another to form the DNA chain. Four different nucleotides can make up this sequence, each with a different nitrogenous base: adenine (A), cytosine (C), guanine (G) and thymine (T). To use an analogy, these four nucleotides work in the chain like letters placed one after another to form long texts.
During transcription, fragments of the DNA sequence are copied into messenger RNA (mRNA), in which thymine is replaced by uracil (U). The mRNA nucleotides are then read in groups of three, called codons, and during translation the codons are matched to amino acids.
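Both steps can be sketched in a few lines of Python. This is a minimal sketch. It assumes the input is the coding strand of the DNA, whose sequence the mRNA copies with U in place of T. In the cell, RNA polymerase actually reads the complementary template strand. The function names `transcribe` and `split_into_codons` are illustrative only.

```python
def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence: same letters, T replaced by U."""
    return coding_dna.upper().replace("T", "U")


def split_into_codons(mrna: str) -> list[str]:
    """Group an mRNA sequence into consecutive, non-overlapping triplets (codons).

    Any one or two leftover nucleotides at the end are ignored.
    """
    usable_length = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable_length, 3)]


mrna = transcribe("ATGTCAAGCTCTTCC")
print(mrna)                     # AUGUCAAGCUCUUCC
print(split_into_codons(mrna))  # ['AUG', 'UCA', 'AGC', 'UCU', 'UCC']
```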
With four possible letters in each of three positions there are 4 × 4 × 4 = 64 possible codons, but only 20 amino acids make up proteins, so several codons can give rise to the same amino acid. Some codons also mark where reading begins or ends. In the analogy, codons would be the words: units with their own meaning that, on their own, say little.
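The count of 64 is simple combinatorics, and the short Python check below confirms it by listing every ordered triplet of the four RNA bases. The variable names are illustrative.

```python
from itertools import product

RNA_BASES = "UCAG"

# Every ordered triplet of the four RNA bases is a possible codon.
all_codons = ["".join(triplet) for triplet in product(RNA_BASES, repeat=3)]

print(len(all_codons))  # 64, i.e. 4 ** 3
print(all_codons[:4])   # ['UUU', 'UUC', 'UUA', 'UUG']
```

With 64 codons and only 20 amino acids plus a stop signal to assign, some amino acids must be shared by several codons.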
For a text to make sense, the words must be arranged one after another to form sentences. The stretches of codons that form such coherent units are called genes. When a gene is translated, the amino acids associated with each of its codons are linked together in the order dictated by the genetic sequence, forming a protein.
In every good book, the sentences are grouped into chapters, which together make up the work. The ‘genetic’ book is called the genome, and in our analogy its chapters would be the chromosomes.
“A book that includes the entire genetic code of a person has about 262,000 pages” – La Vanguardia
For example, the DNA sequence ATG TCA AGC TCT TCC is transcribed into five RNA codons, AUG UCA AGC UCU UCC, which correspond to one methionine followed by four serines.
As we said, several codons can give rise to the same amino acid.
These are the first five of the amino acids that, together with 800 more, make up a protein called ACE2. ACE2 is one of the receptors on the surface of human cells, and it is the one to which SARS-CoV-2 attaches.
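The example above can be reproduced with a small lookup table. This sketch includes only the five codons that appear in the example, whereas the full table has 64 entries. The helper name `translate` is hypothetical. The sketch shows the mechanics of reading codon by codon and is not a complete translator.

```python
# Partial codon table: only the codons used in the ACE2 example (not the full code).
PARTIAL_CODE = {
    "AUG": "Met",  # methionine
    "UCA": "Ser",  # serine
    "AGC": "Ser",
    "UCU": "Ser",
    "UCC": "Ser",
}


def translate(dna: str) -> list[str]:
    """Transcribe coding-strand DNA to mRNA, then look up each codon in PARTIAL_CODE.

    A codon missing from the partial table raises KeyError.
    """
    mrna = dna.replace(" ", "").upper().replace("T", "U")
    usable_length = len(mrna) - len(mrna) % 3
    codons = [mrna[i:i + 3] for i in range(0, usable_length, 3)]
    return [PARTIAL_CODE[codon] for codon in codons]


print(translate("ATG TCA AGC TCT TCC"))
# ['Met', 'Ser', 'Ser', 'Ser', 'Ser']
```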
“Four newspaper pages have the same number of letters as the genetic code of the new coronavirus: 30,000” – El País
The genetic code: the language
If, in this analogy, the genetic sequence is the succession of letters that form words, and the complete book we call the genome holds the instructions for forming a new living being, then what is the genetic code?
Every book is written in some language. This article was originally written in Spanish. Its words are arranged so that they evoke concepts, and the correct syntax of its sentences allows them to be understood. But the text can only be understood if it is read as Spanish. Suppose it were read with the vocabulary, grammar and syntax of Galician, a language that shares all its letters with Spanish, including the “ñ”. The message would then not be understood.
In the same way, the genetic code is what makes the codon AUG correspond to methionine and the codons UCA, AGC, UCU and UCC all correspond to serine. The genetic code is that conversion table from RNA codons to amino acids.
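Because the genetic code is literally a lookup table, it fits in a few lines of code. The sketch below builds the standard code, which NCBI lists as translation table 1, from a compact 64-letter string. In that string, amino acids use their one-letter abbreviations (M for methionine, S for serine) and `*` marks a stop codon. Codons are ordered by first, second and third base, each cycling through U, C, A, G. The names `STANDARD_CODE` and `translate_mrna` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid for each codon, in the order UUU, UUC, UUA, UUG, UCU, ..., GGG.
# '*' marks the three stop codons (UAA, UAG, UGA).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

# The conversion table: codon -> one-letter amino acid (or '*' for stop).
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_mrna(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = STANDARD_CODE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(STANDARD_CODE["AUG"])               # M
print(translate_mrna("AUGUCAAGCUCUUCC"))  # MSSSS
```

A different organism with a different sequence would be translated with the same `STANDARD_CODE`. Only the input string changes.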
Thus, if the genetic sequence is analogous to the succession of letters that make up the book, the genetic code is analogous to the language in which that book is written. Two different books with different messages, such as The Hobbit and Jurassic Park, can be written in the same language. In the same way, two different living beings with different genetic sequences can have the same genetic code.
Of course, all of this is an analogy. The DNA sequence is not literally letters in a text, nor is the genetic code a true language. At this level everything is biochemistry, and borrowing ideas from communication theory is only a tool that, through analogies, makes it easier to understand.
“They took the genetic code of the dinosaurs and mixed it with that of frogs” – Dr. Alan Grant, Jurassic Park.
The characteristics usually mentioned when talking about the genetic code are that it is degenerate, unambiguous and universal. It is degenerate because there are more codons than amino acids, so two different codons can give rise to the same amino acid, as we have seen. And it is unambiguous because, within a given code, each codon corresponds to a single amino acid (or to a stop signal), never to several.
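The first two properties described above, degeneracy and the absence of ambiguity, can be seen directly in the table. A Python dict cannot hold two values for one key, so storing the code as a dict already encodes the absence of ambiguity. Inverting the dict exposes the degeneracy. This sketch rebuilds the standard code from the same compact 64-letter string (NCBI translation table 1). The variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Standard genetic code: exactly one entry (one meaning) per codon.
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy: invert the table to list the codons that share each amino acid.
codons_for = defaultdict(list)
for codon, aa in CODE.items():
    codons_for[aa].append(codon)

print(len(codons_for) - 1)              # 20 amino acids (excluding the stop signal '*')
print(sorted(codons_for["S"]))          # ['AGC', 'AGU', 'UCA', 'UCC', 'UCG', 'UCU']
print(codons_for["M"], codons_for["W"]) # ['AUG'] ['UGG']
```

Serine, leucine and arginine each have six codons, while methionine and tryptophan have only one.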
The third characteristic is its universality: the genetic code is essentially the same in all species. That is why it makes no sense to talk about obtaining the genetic codes of different species. They all share the same one, and what differs between them is their genetic sequence.
“So far, scientists have only sequenced the genetic code of approximately 15,000 species of living things, most of them microbes” – National Geographic
The exceptions are few and minor. Some bacteria and some single-celled eukaryotes use slightly modified versions of the code. So do mitochondria, the organelles that descend from ancient bacteria and keep a small genome of their own. For example, in the standard code the codon AUA produces isoleucine, whereas in human mitochondria it corresponds to methionine. Apart from these cases, human beings have the same genetic code as scorpions, lettuces and mosses.
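Variant codes are usually described as a short list of reassignments relative to the standard table. The sketch below follows NCBI's list of genetic codes for the vertebrate mitochondrial code (translation table 2), which has four changes:

* AUA is read as methionine.
* UGA is read as tryptophan instead of stop.
* AGA and AGG are read as stop instead of arginine.

Only the AUA change is mentioned in the text above. The variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard code (NCBI table 1), codons ordered UUU, UUC, ..., GGG.
STANDARD = {
    "".join(c): aa
    for c, aa in zip(
        product(BASES, repeat=3),
        "FFLLSSSSYY**CC*W"
        "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR"
        "VVVVAAAADDEEGGGG",
    )
}

# Vertebrate mitochondrial code (NCBI table 2): the standard table with four reassignments.
VERTEBRATE_MITO = {**STANDARD, "AUA": "M", "UGA": "W", "AGA": "*", "AGG": "*"}

# Collect every codon whose meaning differs: codon -> (standard, mitochondrial).
differences = {
    codon: (STANDARD[codon], VERTEBRATE_MITO[codon])
    for codon in STANDARD
    if STANDARD[codon] != VERTEBRATE_MITO[codon]
}
print(differences)
# {'UGA': ('*', 'W'), 'AUA': ('I', 'M'), 'AGA': ('R', '*'), 'AGG': ('R', '*')}
```

The other 60 codons keep their standard meaning. This is why such variants are seen as small departures from a single, nearly universal code.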
Andrej Elzanowski et al. (2019). The Genetic Codes. NCBI.
CCDS Database (2022). Report for CCDS14169.1 [ACE2]. NCBI.