"Silent DNA: speaking RNA language?"
Alexander E. Vinogradov
Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Ave. 4, St Petersburg 194064, Russia
Contact: aevin@mail.cytspb.rssi.ru
The sequence of silent DNA in the human genome (intergenic spacers,
introns and synonymous codon positions of protein-coding genes) was found
here to give a higher thermostability of the corresponding RNA/RNA and RNA/DNA
duplexes than randomized sequence does. This difference
increased with elevation of GC content. The revealed effect was
not due to correlation of RNA/RNA and RNA/DNA thermostabilities with thermostability
of the DNA/DNA duplex, which, on the contrary, was lower than in the randomized
sequence and lagged behind the elevation of GC content. The same picture
was observed in the genomes of other warm-blooded vertebrates but not in
the lower organisms. This finding
suggests that RNA–RNA and RNA–DNA interactions could be involved
in the putative function of silent DNA.
INTRODUCTION
It is assumed that up to 99% of the human genome does not code for
proteins (IHGSC, 2001; Venter et
al., 2001). The function of this DNA, if any, is unknown. Most human
genes are concentrated in the GC-rich genome core (Bernardi,
2000). The causes of this GC-enrichment are also unknown, but have been
attributed to mutation bias (Francino and Ochman,
1999; IHGSC, 2001), biased gene conversion (Galtier
et al., 2001; Galtier, 2003), selection
for thermostability of the DNA helix (Bernardi, 2000)
or physical properties
associated with active transcription (bendability and the ability to
undergo the B–Z transition) (Vinogradov, 2001,
2003). Here the thermostabilities of DNA/DNA, RNA/RNA and RNA/DNA
duplexes, relative to those of randomized sequences, are analyzed in
different genomic regions of warm-blooded vertebrates and lower animals.
MATERIALS AND METHODS
The sequences were extracted from GenBank. Genes were checked for
duplicates on the basis of coding sequence (CDS) similarity (>99%). Pseudogenes
were not included in the gene-level analysis, but were taken into account
for the determination of intergenic spacers. The transposable elements
and the level of their divergence (percent of nucleotide substitutions)
from consensus sequences were determined with the RepeatMasker program
(A.F.A. Smit and P. Green, http://ftp.genome.washington.edu/RM/RepeatMasker.html).
The thermostabilities for each sequence were determined using the dinucleotide
tables for free energy of melting (ΔG) for RNA/RNA (Xia
et al., 1998), RNA/DNA (Sugimoto et al.,
1995) and DNA/DNA (SantaLucia, 1998) duplexes
in a sliding dinucleotide frame (with 1-nt steps), and averaged for each
sequence. (In the case of the RNA/DNA duplex, where there is a difference between
strands, the results with the GenBank strand considered as the DNA strand
are presented. The results with the opposite strand were similar.)
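The sliding-frame calculation described above can be sketched as follows. This is a minimal illustration and not the program used in this work: the function name `mean_dinucleotide_energy` and the values in `toy_table` are hypothetical placeholders, and a real analysis would substitute the published dinucleotide parameters (Xia et al., 1998; Sugimoto et al., 1995; SantaLucia, 1998). The sketch assumes that the table holds free energies of melting, so that a larger value means a more stable duplex, and that frames containing ambiguous bases are simply skipped.

```python
from itertools import product


def mean_dinucleotide_energy(seq, table):
    """Average a per-dinucleotide value over a frame sliding in 1-nt steps."""
    seq = seq.upper()
    # Overlapping dinucleotides: positions (0,1), (1,2), ..., (n-2,n-1).
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Assumption: frames with bases missing from the table (e.g. N) are skipped.
    values = [table[s] for s in steps if s in table]
    if not values:
        raise ValueError("sequence has no scorable dinucleotides")
    return sum(values) / len(values)


# Placeholder table, NOT real thermodynamic data: each dinucleotide scores
# 1.0 plus 0.5 per G or C, only to make the example runnable.
toy_table = {
    a + b: 1.0 + 0.5 * ((a in "GC") + (b in "GC"))
    for a, b in product("ACGT", repeat=2)
}

print(mean_dinucleotide_energy("GGCC", toy_table))  # 2.0
print(mean_dinucleotide_energy("AATT", toy_table))  # 1.0
```

Passing the table as an argument lets the same function serve all three duplex types; only the parameter table changes.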
For each non-coding genomic sequence (introns, intergenic spacers
and Alu repeats), 10 randomizations were made, and thermostabilities calculated
for these randomized sequences were averaged (to approximate the mathematical
expectation for randomized sequence). For each protein-coding sequence,
all synonymous codon positions that can be permuted with conservation of
the same GC content of each synonymous position (i.e. only G<->C or
A<->T replacements were allowed to ensure the strictest conditions of
permutation), were randomly permuted 10 times with preservation of the
mean purine content of the total set of synonymous positions, and thermostabilities
calculated for these permuted sequences were averaged. The relative value
of each thermostability was determined for each genomic sequence
as the difference between the value for the genomic sequence itself and
the average value for its 10 randomized or permuted sequences. (The relative
thermostabilities determined as ratios between the values for genomic and
randomized/permuted sequences were also analyzed and showed qualitatively
the same picture.)
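For non-coding sequences, the randomization and the difference-based relative value described above might look like the sketch below. It is an illustration under stated assumptions rather than the original implementation: the names `relative_thermostability` and `toy_table` are hypothetical, randomization is taken to be a plain shuffle that preserves base composition, and the table values are placeholders, not published parameters. The constrained permutation of synonymous codon positions is not covered here.

```python
import random
from itertools import product


def mean_dinucleotide_energy(seq, table):
    """Average a per-dinucleotide value over a frame sliding in 1-nt steps."""
    values = [table[seq[i:i + 2]] for i in range(len(seq) - 1)]
    return sum(values) / len(values)


def relative_thermostability(seq, table, n_random=10, rng=None):
    """Genomic value minus the mean over n_random shuffled copies."""
    if rng is None:
        rng = random.Random()
    genomic = mean_dinucleotide_energy(seq, table)
    randomized = []
    for _ in range(n_random):
        bases = list(seq)
        rng.shuffle(bases)  # keeps base composition, destroys base order
        randomized.append(mean_dinucleotide_energy("".join(bases), table))
    return genomic - sum(randomized) / n_random


# Placeholder table, NOT real data: only the GC step scores 1, all others 0.
toy_table = {a + b: 0.0 for a, b in product("ACGT", repeat=2)}
toy_table["GC"] = 1.0

rng = random.Random(0)  # fixed seed so the run is repeatable
print(relative_thermostability("GCGCGCGC", toy_table, rng=rng))
```

A positive result means that the genomic order of bases scores higher than expected from its composition alone. The ratio-based variant mentioned above would divide instead of subtract.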
RESULTS AND DISCUSSION
In the introns and intergenic spacers of warm-blooded vertebrates, the relative thermostabilities of RNA/RNA and RNA/DNA duplexes were not only generally greater than zero but also increased with the elevation of GC content (Fig. 1 and Table 1). The opposite picture was observed for the relative thermostability of the DNA/DNA duplex. Genomes of the lower animals did not show such regularities (Table 1). In the CDS, the relative thermostabilities of RNA/RNA and RNA/DNA duplexes were also positive in the genomes of warm-blooded vertebrates and increased with elevation of GC content, whereas the relative thermostability of the DNA/DNA duplex showed the opposite trend (Table 1). Among the lower organisms, the coding sequences of the pufferfish and the fruit fly showed a similar trend, but the increments in the relative RNA/RNA(DNA) thermostabilities were lower and there was no decrement in the relative thermostability of the DNA/DNA duplex (Table 1).
Fig. 1. Regression of the relative thermostabilities of corresponding duplexes on GC content for the human introns.
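A regression of this kind can be sketched as below, assuming an ordinary least-squares fit of relative thermostability on GC content with one point per sequence. The function names are hypothetical, the fitting method is an assumption, and the points are synthetic values chosen to lie exactly on a line. They are not data from this study.

```python
def gc_content(seq):
    """Fraction of G and C among the bases of seq."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares fit y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic points lying exactly on y = 2x + 1, used only to check the fit.
xs = [0.0, 0.5, 1.0]
ys = [1.0, 2.0, 3.0]
print(least_squares_line(xs, ys))  # (2.0, 1.0)
print(gc_content("GGCCAATT"))      # 0.5
```

In such a fit, a positive slope corresponds to relative thermostability rising with GC content and a negative slope to it lagging behind.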
Generally, there seems to be a trade-off between the relative RNA/RNA(DNA) thermostabilities and the relative DNA/DNA thermostability caused by intrinsic physical properties of nucleic acids, because they correlated negatively in all genomes studied (data not shown). Therefore, a large increment in the former (as in warm-blooded vertebrates) is expected to be accompanied by a decrement in the latter.
In the warm-blooded vertebrates, all the effects were more pronounced in the introns than in the intergenic spacers and synonymous positions of coding sequences (Tables 1 and 2). (Hence, these effects could be helpful for gene prediction.)
The Alu repeats, the most common retroposons in the human genome (more than 95% of which are already dead) (Smit, 1999; Aleman et al., 2000), showed an increase in the relative RNA/RNA(DNA) thermostabilities both with elevation of GC content and (independently) with divergence from the consensus sequences, which approximate active ancestor copies (Table 3).
The data obtained suggest that thermostability of the corresponding RNA/RNA(DNA) duplexes can be important for non-coding DNA of warm-blooded vertebrates, especially in the GC-rich genome core. Since the non-coding DNA may accumulate to a considerable degree through the activity of retroposons, it could be supposed that the observed effects originally emerged in the retroposons to facilitate their propagation. However, the thermostabilities of RNA/RNA(DNA) duplexes were lower in the active ancestor copies of Alu retroposons and increased after they were dead (Table 3). Furthermore, the regions of non-coding DNA where no transposable elements were detectable showed the same effect as the total sequences (Tables 1 and 2).
It is well known that most (>95%) of the RNA transcribed in mammalian
cells is short-lived and not translated into proteins (Mattick
and Gagen, 2001). Many small regulatory RNAs were discovered recently
and their number is believed to be still underestimated (Eddy,
2001; Pasquinelli, 2002; Wagner
and Flardh, 2002). It was suggested that in the higher eukaryotes,
which turned out to have a surprisingly small number of protein-coding
genes but a large amount of presumably non-coding DNA, the unknown RNA–RNA
and RNA–DNA interactions operate as multitasked regulatory networks (Mattick,
2001; Mattick and Gagen, 2001). In fact,
a new paradigm for molecular and cellular biology was proposed. The data
obtained here support this suggestion. The stronger effects observed in the
introns are also in accordance with this hypothesis.
The regulatory networks based on RNA–RNA(DNA) interactions involving
silent DNA may have appeared before the warm-blooded vertebrates,
but the thermostabilities of these interactions may have become crucial only in
these organisms. This could lead to GC-enrichment of the genome and to
the shaping of its sequence for even higher RNA–RNA(DNA) thermostabilities
than would follow from the nucleotide content alone. The main increase in thermostability
was due to GC-enrichment per se, but the effects revealed here indicate
for which duplex(es) thermostability might have been a leading cause of this enrichment.
The lagging thermostability of the DNA/DNA duplex is understandable because
long DNA duplexes melt at temperatures much higher than the body
temperature of warm-blooded vertebrates (Sugimoto
et al., 1996), and because of the above-mentioned inverse relation
between the relative RNA/RNA(DNA) and DNA/DNA thermostabilities.
As for the putative regulatory RNA–RNA(DNA) interactions, they probably hinge on short (or incomplete) sequence matches, so the thermostabilities of these interactions can become important in warm-blooded organisms. An analogy exists in prokaryotes, where the GC content of the total genome does not correlate with habitat temperature, whereas the GC content of the ribosomal RNA does (Galtier and Lobry, 1997; Hurst and Merchant, 2001).
ACKNOWLEDGEMENTS
This work was supported by the Russian Foundation for Basic Research (RFBR).
REFERENCES
Aleman, C., Roy-Engel, A.M., Shaikh, T.H. and Deininger, P.L. (2000) Cis-acting influences on Alu RNA levels. Nucleic Acids Res., 28, 4755–4761.
Bernardi, G. (2000) The compositional evolution of vertebrate genomes. Gene, 259, 31–43.
Eddy, S.R. (2001) Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet., 2, 919–929.
Francino, M.P. and Ochman, H. (1999) Isochores result from mutation not selection. Nature, 400, 30–31.
Galtier, N. (2003) Gene conversion drives GC content evolution in mammalian histones. Trends Genet., 19, 65–68.
Galtier, N. and Lobry, J.R. (1997) Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol., 44, 632–636.
Galtier, N., Piganeau, G., Mouchiroud, D. and Duret, L. (2001) GC content evolution in mammalian genomes, the biased gene conversion hypothesis. Genetics, 159, 907–911.
Hurst, L.D. and Merchant, A.R. (2001) High guanine–cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc. R. Soc. Lond. B, 268, 493–497.
IHGSC (International Human Genome Sequencing Consortium) (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
Mattick, J.S. (2001) Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep., 2, 986–991.
Mattick, J.S. and Gagen, M.J. (2001) The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms. Mol. Biol. Evol., 18, 1611–1630.
Pasquinelli, A.E. (2002) MicroRNAs: deviants no longer. Trends Genet., 18, 171–173.
SantaLucia, J., Jr. (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl Acad. Sci. USA, 95, 1460–1465.
Smit, A.F. (1999) Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev., 9, 657–663.
Sugimoto, N., Nakano, S., Katoh, M., Matsumura, A., Nakamuta, H., Ohmichi, T., Yoneyama, M. and Sasaki, M. (1995) Thermodynamic parameters to predict stability of RNA/DNA hybrid duplexes. Biochemistry, 34, 11211–11216.
Sugimoto, N., Nakano, S., Yoneyama, M. and Honda, K. (1996) Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res., 24, 4501–4505.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
Vinogradov, A.E. (2001) Bendable genes of warm-blooded vertebrates. Mol. Biol. Evol., 18, 2195–2200.
Vinogradov, A.E. (2003) DNA helix: the importance of being GC-rich. Nucleic Acids Res., 31, 1838–1844.
Wagner, E.G. and Flardh, K. (2002) Antisense RNAs everywhere? Trends Genet., 18, 223–226.
Xia, T., SantaLucia, J., Jr., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C. and Turner, D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735.