Relationship between genome size and gene number in eukaryotes

The size of the genome and the complexity of living beings - Revista Mètode



The figure shows data for a variety of bacteria and archaea, with the slope of the data line confirming the simple rule of thumb relating genome size and gene number. Lynch, The Origins of Genome Architecture.

The size of the genome and the complexity of living beings

Many bacteria have several thousand genes. This gene content is proportional to genome size and protein size, as the figure shows. Interestingly, eukaryotic genomes, which are often a thousand or more times larger than those of prokaryotes, contain only about an order of magnitude more genes than their prokaryotic counterparts.
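The two ratios quoted above can be combined into a rough statement about gene density. The derivation below is only a back-of-the-envelope sketch; the symbols S (genome size), G (gene number) and ρ (genes per unit of genome length) are introduced here for illustration and do not appear in the original text.

```latex
% Subscripts p and e denote a typical prokaryote and eukaryote.
% Assumed round ratios, taken from the paragraph above:
%   S_e \approx 10^{3} S_p   (genome about a thousand times larger)
%   G_e \approx 10\, G_p     (about an order of magnitude more genes)
\[
  \rho = \frac{G}{S},
  \qquad
  \frac{\rho_e}{\rho_p}
  = \frac{G_e / S_e}{G_p / S_p}
  \approx \frac{10\,G_p / (10^{3} S_p)}{G_p / S_p}
  = 10^{-2}.
\]
```

Under these round numbers, a eukaryotic genome carries on the order of a hundred times fewer genes per unit of DNA than a prokaryotic one.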

The simplest estimate of the number of genes in a genome assumes that the entire genome codes for genes of interest. This crude rule of thumb works surprisingly well for many bacteria and archaea but fails miserably for multicellular organisms. The inability to estimate the number of genes in eukaryotes from knowledge of the gene content of prokaryotes was one of the unexpected twists of modern biology.

For bacterial genomes, this strategy works surprisingly well, as can be seen in Table 1 and Figure 1. For example, when applied to the E. On the other hand, this strategy fails spectacularly when we apply it to eukaryotic genomes: for the human genome, for example, it yields a gene count that is a gross overestimate.

The different sequence components making up the human genome.
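The rule of thumb described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculation behind Table 1: the figure of roughly 1,000 bp per gene is an assumption not given in the text, the function names are hypothetical, and the genome sizes and annotated gene counts are rounded, commonly quoted values included only to show the contrast.

```python
# Rule-of-thumb gene count: pretend the whole genome is coding sequence.
# ASSUMPTION (not from the text above): an average gene spans ~1,000 bp.
ASSUMED_BP_PER_GENE = 1_000

# Rounded, approximate values used purely for illustration.
GENOMES = {
    "E. coli": {"genome_bp": 4_600_000, "annotated_genes": 4_300},
    "human": {"genome_bp": 3_100_000_000, "annotated_genes": 20_000},
}


def estimate_gene_count(genome_bp, bp_per_gene=ASSUMED_BP_PER_GENE):
    """Naive estimate: genome length divided by the assumed gene length."""
    return genome_bp // bp_per_gene


def overestimate_factor(genome_bp, annotated_genes,
                        bp_per_gene=ASSUMED_BP_PER_GENE):
    """Ratio of the naive estimate to the annotated gene count."""
    return estimate_gene_count(genome_bp, bp_per_gene) / annotated_genes


if __name__ == "__main__":
    for name, genome in GENOMES.items():
        estimate = estimate_gene_count(genome["genome_bp"])
        factor = overestimate_factor(genome["genome_bp"],
                                     genome["annotated_genes"])
        print(f"{name}: naive estimate {estimate:,} genes, "
              f"{factor:.1f}x the annotated count")
```

With these inputs the bacterial estimate lands close to the annotated count, while the human estimate is off by more than two orders of magnitude, which is the failure the paragraph describes.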

Most transposable elements are genomic remnants, which are currently defunct. Referring back to Figure 2, we see that, in general, eukaryotes have larger genomes than prokaryotes, except for some endosymbiont or parasitic green algae, which have very small genomes.

Specifically, the smallest eukaryotic genome ever sequenced is that of Guillardia theta, a symbiotic red alga, of only 0. We can also see in the figure that there is a wide range of sizes, much greater than that of prokaryotes, more than 80,fold larger, from organisms such as yeast 1.

But is there, as in bacteria, a relationship between genome size and complexity of the organism? In Figure 2 we have represented the range of C-value in several representative groups of eukaryotic organisms.

Genome size and number of genes

As we can observe, unicellular protists such as amoebae show the greatest variation in C-values. Furthermore, the large variation in genome size between eukaryotic species does not seem to be related to either the complexity of the organism or the number of genes it contains.

For example, amoebae, which have the largest genomes, have many times more DNA than humans, and it is clear that an amoeba cannot be more complex than a human.

Moreover, one would expect mammals, as more complex organisms, to have larger genomes.


However, many other organisms, such as fish, amphibians or plants, have much larger genomes. Even when we compare the sizes between organisms that appear similar in terms of complexity, there are also wide differences in their C-values.

To give some examples, flies and locusts, onions and lilies, etc. Amphibians as a group show variations of up to 91 times, and it is hard to believe that this reflects a comparable variation in the number of genes necessary to give rise to the corresponding amphibians, or that onions need many times more DNA than rice.

Figure 3 shows some living beings with size proportional to the size of their genome and needs no further explanation.


Figure 3. Genome size in some living beings. The height of the drawings is proportional to the size of their genome.

The mismatch between the C-values and the presumed amount of genetic information contained within the genomes was called the C-value paradox. Since we cannot assume that a species possesses less DNA than the quantity required to specify its vital functions, we have to explain why many species contain this amount of excess DNA.

That is, are the differences in genome sizes due to gene or non-gene DNA? We have known since the late 60s that the eukaryotic genome is composed of a large amount of repetitive DNA. Moreover, since the late 70s we have known that genes are interrupted by non-coding sequences, introns, which must be removed before the ribosome synthesizes protein.


In both cases we are talking about seemingly superfluous DNA, which contributes to the wide variation in C-values and therefore explains the apparent paradox. The size and number of introns vary widely along the evolutionary scale, with mammals having the highest number and the largest size. Repetitive DNA also varies between organisms.


Traditionally this DNA is classified as:

Number of genes and complexity of the organism

As sequences of whole genomes are completed, we will know with more or less accuracy the number of genes derived from these sequences, since what we had so far were indirect estimates. However, some data are proving to be surprising because, in some cases, there appears to be no clear correlation between the number of genes and the complexity of the organism.

Humans have only twice as many genes as the nematode worm C. elegans. We are also beginning to understand these data. That is, from the same DNA sequence, organisms can obtain more than one protein. It will be some time before we can determine the number of proteins that an organism is able to synthesize. But this would be the subject of another paper.