Published in Bioinformatics vol. 19, no. 17, pages 2167–2170 (November 22, 2003)
DOI:10.1093/bioinformatics/btg320
http://bioinformatics.oupjournals.org/cgi/content/abstract/19/17/2167?etoc

"Silent DNA: speaking RNA language?"

Alexander E. Vinogradov

Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Ave. 4, St Petersburg 194064, Russia

Contact: aevin@mail.cytspb.rssi.ru



ABSTRACT

The sequence of silent DNA in the human genome (intergenic spacers, introns and synonymous codon positions of protein-coding genes) was found here to give a higher thermostability of the corresponding RNA/RNA and RNA/DNA duplexes than randomized sequence. This difference increased with elevation of GC content. The revealed effect was not due to correlation of RNA/RNA and RNA/DNA thermostabilities with thermostability of the DNA/DNA duplex, which, on the contrary, was lower than in the randomized sequence and lagged behind the elevation of GC content. The same picture was observed in the genomes of other warm-blooded vertebrates but not in the lower organisms. This finding suggests that RNA–RNA and RNA–DNA interactions could be involved in the putative function of silent DNA.

INTRODUCTION

It is assumed that up to 99% of the human genome does not code for proteins (IHGSC, 2001; Venter et al., 2001). The function of this DNA, if any, is unknown. Most human genes are concentrated in the GC-rich genome core (Bernardi, 2000). The causes of this GC-enrichment are also unknown, but have been attributed to mutation bias (Francino and Ochman, 1999; IHGSC, 2001), biased gene conversion (Galtier et al., 2001; Galtier, 2003), selection for thermostability of the DNA helix (Bernardi, 2000) or physical properties associated with active transcription (bendability and ability to undergo B–Z transition) (Vinogradov, 2001, 2003). Here the thermostabilities of DNA/DNA, RNA/RNA and RNA/DNA duplexes, taken relative to randomized sequences, are analyzed in different genomic regions of warm-blooded vertebrates and lower animals.

MATERIALS AND METHODS

The sequences were extracted from GenBank. Genes were checked for duplicates on the basis of coding sequence (CDS) similarity (>99%). Pseudogenes were not included in the gene-level analysis, but were taken into account for the determination of intergenic spacers. The transposable elements and the level of their divergence (percent of nucleotide substitutions) from consensus sequences were determined with the RepeatMasker program (A.F.A. Smit and P. Green, http://ftp.genome.washington.edu/RM/RepeatMasker.html). The thermostability of each sequence was determined using the dinucleotide tables for free energy of melting (deltaG) for RNA/RNA (Xia et al., 1998), RNA/DNA (Sugimoto et al., 1995) and DNA/DNA (SantaLucia, 1998) duplexes in a sliding dinucleotide frame (with 1-nt steps), and averaged over the sequence. (In the case of the RNA/DNA duplex, where there is a difference between strands, the results with the GenBank strand considered as the DNA strand are presented. The results with the opposite strand were similar.)
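The sliding-frame averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `mean_dinucleotide_value` and the table `TOY_TABLE` are hypothetical, and the toy values are not the published parameters of Xia et al. (1998), Sugimoto et al. (1995) or SantaLucia (1998), which would have to be substituted for a real calculation.

```python
from itertools import product

# TOY values, NOT published nearest-neighbor parameters: each G or C in a
# dinucleotide adds 1.0 and each A or T adds 0.5 (higher = harder to melt).
TOY_TABLE = {
    a + b: sum(1.0 if base in "GC" else 0.5 for base in a + b)
    for a, b in product("ACGT", repeat=2)
}


def mean_dinucleotide_value(seq, table):
    """Average a per-dinucleotide value over all overlapping steps (1-nt shift)."""
    seq = seq.upper()
    values = [
        table[seq[i:i + 2]]
        for i in range(len(seq) - 1)
        if seq[i:i + 2] in table  # skip steps containing N or other symbols
    ]
    if not values:
        raise ValueError("no scorable dinucleotide in sequence")
    return sum(values) / len(values)


# The steps GG, GC and CC each score 2.0 in the toy table.
print(mean_dinucleotide_value("GGCC", TOY_TABLE))
```

Passing the table as an argument lets the same function serve for the RNA/RNA, RNA/DNA and DNA/DNA cases. For the RNA/DNA duplex the table is strand-specific, so the choice of strand has to be fixed by the caller.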

For each non-coding genomic sequence (introns, intergenic spacers and Alu repeats), 10 randomizations were made, and thermostabilities calculated for these randomized sequences were averaged (to approximate the mathematical expectation for randomized sequence). For each protein-coding sequence, all synonymous codon positions that can be permuted with conservation of the same GC content of each synonymous position (i.e. only G<->C or A<->T replacements were allowed to ensure the strictest conditions of permutation) were randomly permuted 10 times with preservation of the mean purine content of the total set of synonymous positions, and thermostabilities calculated for these permuted sequences were averaged. The relative value of each thermostability was determined for each genomic sequence as the difference between the value for the genomic sequence itself and the average value for its 10 randomized or permuted sequences. (The relative thermostabilities determined as ratios between the values for genomic and randomized/permuted sequences were also analyzed and showed qualitatively the same picture.)
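A minimal sketch of the randomization scheme described above is given here, with several assumptions that are not taken from the paper. The names `shuffle_all`, `permute_fourfold_sites`, `relative_value` and `toy_score` are hypothetical. The permutation is restricted to fourfold-degenerate third codon positions of the standard genetic code, which is a simplification of "all synonymous codon positions that can be permuted". The scoring function is a toy stand-in for a real thermostability calculation.

```python
import random

# Codon prefixes whose third position is fourfold degenerate (standard code).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def shuffle_all(seq, rng):
    """Full shuffle for non-coding sequence; preserves base composition only."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def permute_fourfold_sites(cds, rng):
    """Shuffle third-position bases among fourfold sites, G/C and A/T separately.

    Each site keeps its G/C or A/T class, and the counts of all four bases
    (hence the purine content of the permuted set) are unchanged.
    """
    letters = list(cds)
    sites = [i + 2 for i in range(0, len(cds) - 2, 3)
             if cds[i:i + 2] in FOURFOLD_PREFIXES]
    for bases in ("GC", "AT"):
        idx = [i for i in sites if letters[i] in bases]
        pool = [letters[i] for i in idx]
        rng.shuffle(pool)
        for i, base in zip(idx, pool):
            letters[i] = base
    return "".join(letters)


def relative_value(seq, score, randomize, n=10, seed=0):
    """Score of the sequence minus the mean score of n randomized versions."""
    rng = random.Random(seed)
    expected = sum(score(randomize(seq, rng)) for _ in range(n)) / n
    return score(seq) - expected


def toy_score(seq):
    """TOY stand-in for thermostability: fraction of GC or CG steps."""
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return sum(step in ("GC", "CG") for step in steps) / len(steps)


cds = "CTGGTCGCAACTATGGGC"  # made-up in-frame example
print(relative_value(cds, toy_score, permute_fourfold_sites))
```

Because both randomizers share the signature `(seq, rng)`, the same `relative_value` call covers the non-coding case (`shuffle_all`) and the coding case (`permute_fourfold_sites`). The ratio variant mentioned in the text would divide instead of subtract.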

RESULTS AND DISCUSSION

In the introns and intergenic spacers of warm-blooded vertebrates, the relative thermostabilities of RNA/RNA and RNA/DNA duplexes were not only generally greater than zero but increased with the elevation of GC content (Fig. 1 and Table 1). The opposite picture was observed for the relative thermostabilities of DNA/DNA duplex. Genomes of the lower animals did not show such regularities (Table 1). In the CDS, the relative thermostabilities of RNA/RNA and RNA/DNA duplexes were also positive in the genomes of warm-blooded vertebrates and increased with elevation of GC content, whereas the relative thermostabilities of DNA/DNA duplex showed the opposite trend (Table 1). Among the lower organisms, the coding sequences of the pufferfish and the fruitfly showed a similar trend but increments in the relative RNA/RNA(DNA) thermostabilities were lower and there was no decrement in the relative thermostability of DNA/DNA duplex (Table 1).

Fig. 1. Regression of the relative thermostabilities of corresponding duplexes on GC content for the human introns.
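The regression named in the caption of Fig. 1 can be illustrated with an ordinary least-squares line fitted to pairs of GC content and relative thermostability, one pair per sequence. This is a sketch under assumptions: the paper does not state which regression model was used, the helper names `gc_content` and `least_squares_line` are hypothetical, and the numbers below are made up for illustration, not data from the paper.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up values: GC content of three sequences and their relative scores.
gc_values = [0.35, 0.45, 0.55]
relative_scores = [0.01, 0.02, 0.03]
slope, intercept = least_squares_line(gc_values, relative_scores)
print(slope > 0)  # a positive slope means the relative value grows with GC content
```

A positive slope for the RNA/RNA and RNA/DNA values together with a negative slope for the DNA/DNA values would correspond to the pattern reported for warm-blooded vertebrates.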

Generally, there seems to be a trade-off between relative RNA/RNA(DNA) thermostabilities and relative DNA/DNA thermostability caused by intrinsic physical properties of nucleic acids, because they correlated negatively in all genomes studied (data not shown). Therefore, a great increment in the former (as in warm-blooded vertebrates) is to be associated with a decrement in the latter.

In the warm-blooded vertebrates, all the effects were more pronounced in the introns as compared with the intergenic spacers and synonymous positions of coding sequences (Tables 1 and 2). (Hence, these effects can be helpful for gene prediction.)

The Alu repeats, the most common retroposons in the human genome (above 95% of which are already dead) (Smit, 1999; Aleman et al., 2000), showed an increase in the relative RNA/RNA(DNA) thermostabilities both with elevation of GC content and (independently) with the divergence from consensus sequences, which approximate active ancestor copies (Table 3).

The data obtained suggest that thermostability of the corresponding RNA/RNA(DNA) duplexes can be important for non-coding DNA of warm-blooded vertebrates, especially in the GC-rich genome core. Since the non-coding DNA may accumulate to a considerable degree through the activity of retroposons, it could be supposed that the observed effects originally emerged in the retroposons to facilitate their propagation. However, the thermostabilities of RNA/RNA(DNA) duplexes were lower in the active ancestor copies of Alu retroposons and increased after they were dead (Table 3). Furthermore, the regions of non-coding DNA where no transposable elements were detectable showed the same effect as the total sequences (Tables 1 and 2).

It is well known that most (>95%) RNA transcribed in mammalian cells is short-lived and not translated into proteins (Mattick and Gagen, 2001). Many small regulatory RNAs were discovered recently and their number is believed to be still underestimated (Eddy, 2001; Pasquinelli, 2002; Wagner and Flardh, 2002). It was suggested that in the higher eukaryotes, which turned out to have a surprisingly small number of protein-coding genes but a large amount of presumably non-coding DNA, the unknown RNA–RNA and RNA–DNA interactions operate as multitasked regulatory networks (Mattick, 2001; Mattick and Gagen, 2001). In fact, a new paradigm for molecular and cellular biology was proposed. The data obtained here support this suggestion. That the effects are stronger in the introns is also in accordance with this hypothesis.

The regulatory networks based on RNA–RNA(DNA) interactions involving silent DNA may have appeared earlier than the warm-blooded vertebrates, but the thermostabilities of these interactions could have become crucial only in these organisms. This could lead to GC-enrichment of the genome and to the shaping of its sequence for even higher RNA–RNA(DNA) thermostabilities than follow from the nucleotide content. The main increase in thermostability was due to GC-enrichment per se, but the effects revealed here indicate the thermostability of which duplex(es) might have been a leading cause of this enrichment. The lagging thermostability of the DNA/DNA duplex is understandable because the melting energy of the long DNA duplexes is much higher than the body temperature of warm-blooded vertebrates (Sugimoto et al., 1996), and because of the (above mentioned) inverse relation between the relative RNA/RNA(DNA) and DNA/DNA thermostabilities.

As for putative regulatory RNA–RNA(DNA) interactions, they probably hinge on a short (or incomplete) sequence match; therefore the thermostabilities of these interactions can become important in warm-blooded organisms. An analogy exists in prokaryotes, where GC content of the total genome does not correlate with habitat temperature, whereas GC content of the ribosomal RNA does (Galtier and Lobry, 1997; Hurst and Merchant, 2001).

ACKNOWLEDGEMENTS

This work was supported by the Russian Foundation for Basic Research (RFBR).

REFERENCES

Aleman, C., Roy-Engel, A.M., Shaikh, T.H. and Deininger, P.L. (2000) Cis-acting influences on Alu RNA levels. Nucleic Acids Res., 28, 4755–4761.

Bernardi, G. (2000) The compositional evolution of vertebrate genomes. Gene, 259, 31–43.

Eddy, S.R. (2001) Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet., 2, 919–929.

Francino, M.P. and Ochman, H. (1999) Isochores result from mutation not selection. Nature, 400, 30–31.

Galtier, N. (2003) Gene conversion drives GC content evolution in mammalian histones. Trends Genet., 19, 65–68.

Galtier, N. and Lobry, J.R. (1997) Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol., 44, 632–636.

Galtier, N., Piganeau, G., Mouchiroud, D. and Duret, L. (2001) GC content evolution in mammalian genomes, the biased gene conversion hypothesis. Genetics, 159, 907–911.

Hurst, L.D. and Merchant, A.R. (2001) High guanine–cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc. R. Soc. Lond. B, 268, 493–497.

IHGSC (International Human Genome Sequencing Consortium) (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

Mattick, J.S. (2001) Non-coding RNAs: the architects of eukaryotic complexity. EMBO Rep., 2, 986–991.

Mattick, J.S. and Gagen, M.J. (2001) The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms. Mol. Biol. Evol., 18, 1611–1630.

Pasquinelli, A.E. (2002) MicroRNAs: deviants no longer. Trends Genet., 18, 171–173.

SantaLucia, J., Jr. (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl Acad. Sci. USA, 95, 1460–1465.

Smit, A.F. (1999) Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr. Opin. Genet. Dev., 9, 657–663.

Sugimoto, N., Nakano, S., Katoh, M., Matsumura, A., Nakamuta, H., Ohmichi, T., Yoneyama, M. and Sasaki, M. (1995) Thermodynamic parameters to predict stability of RNA/DNA hybrid duplexes. Biochemistry, 34, 11211–11216.

Sugimoto, N., Nakano, S., Yoneyama, M. and Honda, K. (1996) Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res., 24, 4501–4505.

Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.

Vinogradov, A.E. (2001) Bendable genes of warm-blooded vertebrates. Mol. Biol. Evol., 18, 2195–2200.

Vinogradov, A.E. (2003) DNA helix: the importance of being GC-rich. Nucleic Acids Res., 31, 1838–1844.

Wagner, E.G. and Flardh, K. (2002) Antisense RNAs everywhere? Trends Genet., 18, 223–226.

Xia, T., SantaLucia, J., Jr., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C. and Turner, D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson–Crick base pairs. Biochemistry, 37, 14719–14735.



 


