Long non-coding RNA

Source: Wikipedia, the free encyclopedia.
Different types of long non-coding RNAs.[1]

Long non-coding RNAs (long ncRNAs, lncRNA) are a type of

small nucleolar RNAs (snoRNAs), and other short RNAs.[3] Given that some lncRNAs have been reported to have the potential to encode small proteins or micro-peptides, the latest definition of lncRNA is a class of RNA molecules of over 200 nucleotides that have no or limited coding capacity.[4] Long intervening/intergenic noncoding RNAs (lincRNAs) are sequences of lncRNA which do not overlap protein-coding genes.[5]

Long non-coding RNAs include intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs, each type showing different genomic positions in relation to genes and exons.[1][3]
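The positional categories above can be illustrated with a minimal sketch. It assumes a single chromosome, half-open coordinates, and a deliberately simplified rule set (any exon overlap is called sense or antisense according to strand; overlap with a gene body but no exon is called intronic). The names `Interval`, `Gene` and `classify_lncrna` are hypothetical and do not come from any annotation pipeline.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    start: int  # 0-based, inclusive
    end: int    # exclusive

    def overlaps(self, other: "Interval") -> bool:
        return self.start < other.end and other.start < self.end


@dataclass(frozen=True)
class Gene:
    span: Interval                 # full extent of the protein-coding gene
    strand: str                    # "+" or "-"
    exons: Tuple[Interval, ...]    # exon intervals inside the span


def classify_lncrna(span: Interval, strand: str, genes: Sequence[Gene]) -> str:
    """Classify a transcript by its position relative to protein-coding genes.

    Simplified, illustrative rules (an assumption, not a published scheme):
      - no overlap with any gene body         -> "intergenic"
      - overlaps an exon on the same strand   -> "sense"
      - overlaps an exon on the other strand  -> "antisense"
      - overlaps a gene body but no exon      -> "intronic"
    """
    overlapping = [g for g in genes if g.span.overlaps(span)]
    if not overlapping:
        return "intergenic"
    for gene in overlapping:
        if any(exon.overlaps(span) for exon in gene.exons):
            return "sense" if gene.strand == strand else "antisense"
    return "intronic"


# Toy annotation: one two-exon gene on the plus strand.
genes = [Gene(Interval(100, 1000), "+", (Interval(100, 200), Interval(800, 1000)))]
print(classify_lncrna(Interval(2000, 2500), "+", genes))
print(classify_lncrna(Interval(300, 500), "-", genes))
```

Real annotation sets also take the exon structure of the lncRNA itself into account and distinguish further subtypes, which this sketch does not attempt.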

Abundance

Long non-coding transcripts are found in many species. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM reveal the complexity of these transcripts in humans.[6] The FANTOM3 project identified ~35,000 non-coding transcripts that bear many signatures of messenger RNAs, including 5' capping, splicing, and poly-adenylation, but have little or no open reading frame (ORF).[6] This number represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data show that more than 40% of transcripts are non-polyadenylated).[7] Identifying ncRNAs within these cDNA libraries is challenging because protein-coding transcripts can be difficult to distinguish from non-coding transcripts. Multiple studies suggest that testis[8] and neural tissues express the greatest amount of long non-coding RNAs of any tissue type.[9] Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources.[10]

Quantitatively, lncRNAs demonstrate ~10-fold lower abundance than mRNAs,[11][12] which is explained by higher cell-to-cell variation in the expression levels of lncRNA genes in individual cells, compared with protein-coding genes.[13] In general, the majority (~78%) of lncRNAs are characterized as tissue-specific, as opposed to only ~19% of mRNAs.[11] Only 3.6% of human lncRNA genes are expressed in various biological contexts, and 34% of lncRNA genes are expressed at a high level (top 25% of both lncRNAs and mRNAs) in at least one biological context.[14] In addition to higher tissue specificity, lncRNAs are characterized by higher developmental stage specificity[15] and cell subtype specificity in tissues such as human neocortex[16] and other parts of the brain, regulating correct brain development and function.[17] In 2022, a comprehensive integration of lncRNAs from existing databases revealed that there are 95,243 lncRNA genes and 323,950 transcripts in humans.[18]

In comparison to mammals, relatively few studies have focused on the prevalence of lncRNAs in plants. However, an extensive study considering 37 higher plant species and six algae identified ~200,000 non-coding transcripts using an in-silico approach,[19] and also established the associated Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.

Genomic organization

In 2005 the landscape of the mammalian genome was described as numerous 'foci' of transcription that are separated by long stretches of

intergenic space.[6] While some long ncRNAs are located within the intergenic stretches, the majority are overlapping sense and antisense transcripts that often include protein-coding genes,[20] giving rise to a complex hierarchy of overlapping isoforms.[21] Genomic sequences within these transcriptional foci are often shared among a number of coding and non-coding transcripts in the sense and antisense directions.[22] For example, 3012 out of 8961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs.[6]
While the abundance and conservation of these arrangements suggest they have biological relevance, the complexity of these foci frustrates easy evaluation.

The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations and their genomic organisation, modifications, cellular locations and tissue expression profiles.[9] Their analysis indicates human lncRNAs show a bias toward two-exon transcripts.[9]

Identification software

Name | Taxonomic group | Web server | Repository | Input file | Main model / algorithm | Training set | Year published | Reference
DeepPlnc | Plant | DeepPlnc Server | DeepPlnc | FASTA | Neural network | Yes | 2022 | [23]
RNAsamba | All | RNAsamba | RNAsamba | FASTA | Neural network | Yes | 2020 | [24]
LGC | Plant, animal | LGC |  | FASTA, BED, GTF | Relationship between ORF length and GC content | No | 2019 | [25]
CPAT | Human, fly, mouse, zebrafish | CPAT | CPAT | FASTA/BED | Logistic regression | Yes | 2013 | [26]
COME | Plant, human, mouse, fly, worm | COME | COME | GTF | Random forest | Yes | 2017 | [27]
CNCI | Plant, animal | NA |  | FASTA, GTF | Support vector machine | No | 2013 | [28]
PLEK | Vertebrate | NA | PLEK | FASTA | Support vector machine | No | 2014 | [28]
FEELnc | All | NA | FEELnc | FASTA, GTF | Random forest | Yes | 2017 | [29]
PhyloCSF | Vertebrate, fly, mosquito, yeast, worm | NA |  | FASTA | Phylogenetic codon model | Yes | 2011 | [30]
slncky | All | NA | slncky | FASTA, BED | Evolutionary conservation | Yes | 2016 | [31]
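
Two of the sequence features mentioned for the tools above, ORF length and GC content, can be computed with a few lines of Python. The sketch below is only an illustration of these features and of the ">200 nucleotides" length criterion; it is not the algorithm of LGC or of any other listed tool. The function names and the `max_orf=300` cutoff are hypothetical choices made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG...stop ORF on the given strand.

    All three forward reading frames are scanned; the stop codon is counted.
    RNA input is accepted (U is treated as T). Returns 0 if no ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def looks_noncoding(seq: str, min_length: int = 200, max_orf: int = 300) -> bool:
    """Crude filter: longer than min_length nt and no ORF of max_orf nt or more.

    min_length reflects the 200-nucleotide definition; max_orf is an
    arbitrary illustrative threshold, not a value from the literature.
    """
    return len(seq) > min_length and longest_orf_length(seq) < max_orf


print(longest_orf_length("CCATGAAATGA"), gc_content("ATGC"))
```

Published tools combine such features with trained models or codon statistics, so a fixed ORF cutoff like this one should be read as a teaching device rather than a classifier.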

Translation

There has been considerable debate about whether lncRNAs have been misannotated and actually encode proteins. Several lncRNAs have been found to encode peptides with biologically significant function.[32][33][34] Ribosome profiling studies have suggested that anywhere from 40% to 90% of annotated lncRNAs are translated,[35][36] although there is disagreement about the correct method for analyzing ribosome profiling data.[37] Additionally, it is thought that many of the peptides produced by lncRNAs may be highly unstable and without biological function.[36]

Conservation

Initial studies into lncRNA conservation noted that as a class, they were enriched for

orthologous genomic region. Some argue that these observations suggest non-functionality of the majority of lncRNAs,[43][44][45] while others argue that they may be indicative of rapid species-specific adaptive selection.[46]

While the turnover of lncRNA transcription is much higher than initially expected, hundreds of lncRNAs are still conserved at the sequence level. There have been several attempts to delineate the different categories of selection signatures seen amongst lncRNAs, including: lncRNAs with strong sequence conservation across the entire length of the gene, lncRNAs in which only a portion of the transcript (e.g. 5′ end, splice sites) is conserved, and lncRNAs that are transcribed from syntenic regions of the genome but have no recognizable sequence similarity.[47][48][49] Additionally, there have been attempts to identify conserved secondary structures in lncRNAs, though these studies have so far yielded conflicting results.[50][51]

Functions

Despite claims that the majority of long noncoding RNAs in mammals are likely to be functional,[52][53] it seems likely that most of them are transcriptional noise and only a relatively small proportion has been demonstrated to be biologically relevant.[45][54]

Some lncRNAs have been functionally annotated in

community curation of human lncRNAs).[57] According to the curation of functional mechanisms of lncRNAs based on the literature, lncRNAs are extensively reported to be involved in ceRNA regulation, transcriptional regulation, and epigenetic regulation.[57] A further large-scale sequencing study provides evidence that many transcripts thought to be lncRNAs may, in fact, be translated into proteins.[58]

In the regulation of gene transcription

In gene-specific transcription

In

RNA transcription is a tightly regulated process. Noncoding RNAs act upon different aspects of this process, targeting transcriptional modulators, RNA polymerase (RNAP) II and even the DNA duplex to regulate gene expression.[59]

NcRNAs modulate transcription by several mechanisms, including functioning themselves as co-regulators, modifying

tumorigenesis in like fashion to protein-coding RNA.[64][65][66]

Local ncRNAs can also recruit transcriptional programmes to regulate adjacent protein-coding gene expression. For example, divergent lncRNAs that are transcribed in the opposite direction to nearby protein-coding genes (~20% of total lncRNAs in mammalian genomes) possibly regulate the transcription of adjacent essential developmental regulatory genes in pluripotent cells.[67][68]

The

Apolipoprotein A1 (APOA1) regulates the transcription of APOA1 through epigenetic modifications.[71]

Recent evidence has raised the possibility that transcription of genes that escape from X-inactivation might be mediated by expression of long non-coding RNA within the escaping chromosomal domains.[72]

Regulating basal transcription machinery

NcRNAs also target general

TFIIH to phosphorylate the C-terminal domain of RNAP II.[75] In contrast, the ncRNA 7SK is able to repress transcription elongation by forming, in combination with HEXIM1/2, an inactive complex that prevents PTEFb from phosphorylating the C-terminal domain of RNAP II,[75][76][77] repressing global elongation under stressful conditions. These examples, which bypass specific modes of regulation at individual promoters, provide a means of quickly effecting global changes in gene expression.

The ability to quickly mediate global changes is also apparent in the rapid expression of non-coding

Alu elements in humans and analogous B1 and B2 elements in mice have succeeded in becoming the most abundant mobile elements within the genomes, comprising ~10% of the human and ~6% of the mouse genome, respectively.[78][79] These elements are transcribed as ncRNAs by RNAP III in response to environmental stresses such as heat shock,[80] where they then bind to RNAP II with high affinity and prevent the formation of active pre-initiation complexes.[81][82][83][84] This allows for the broad and rapid repression of gene expression in response to stress.[81][84]

A dissection of the functional sequences within Alu RNA transcripts has drafted a

repetitive elements throughout the mammalian genome may be partly due to these functional domains being co-opted into other long ncRNAs during evolution, with the presence of functional repeat sequence domains being a common characteristic of several known long ncRNAs including Kcnq1ot1, Xlsirt and Xist.[86][87][88][89]

In addition to

regulatory circuit nested within ncRNAs whereby Alu or B2 RNAs repress general gene expression, while other ncRNAs activate the expression of specific genes.

Transcribed by RNA polymerase III

Many of the ncRNAs that interact with general transcription factors or RNAP II itself (including 7SK, Alu and B1 and B2 RNAs) are transcribed by RNAP III,[93] uncoupling their expression from RNAP II, which they regulate. RNAP III also transcribes other ncRNAs, such as BC2, BC200 and some microRNAs and snoRNAs, in addition to housekeeping ncRNA genes such as tRNAs, 5S rRNAs and snRNAs.[93] The existence of an RNAP III-dependent ncRNA transcriptome that regulates its RNAP II-dependent counterpart is supported by the finding of a set of ncRNAs transcribed by RNAP III with sequence homology to protein-coding genes. This prompted the authors to posit a 'cogene/gene' functional regulatory network,[94] showing that one of these ncRNAs, 21A, regulates the expression of its antisense partner gene, CENP-F in trans.

In post-transcriptional regulation

In addition to regulating transcription, ncRNAs also control various aspects of post-transcriptional

complementary base pairing with the target mRNA. The formation of RNA duplexes between complementary ncRNA and mRNA may mask key elements within the mRNA required to bind trans-acting factors, potentially affecting any step in post-transcriptional gene expression including pre-mRNA processing and splicing, transport, translation, and degradation.[95]
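
As a rough illustration of complementary base pairing between an ncRNA and an mRNA, the sketch below searches for the longest stretch over which the two molecules could form a perfect antiparallel Watson-Crick duplex. It is a toy model under stated assumptions: only A-U and G-C pairs are considered (no G-U wobble, mismatches, bulges or thermodynamics), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (DNA-style T is read as U)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def longest_duplex(ncrna: str, mrna: str) -> tuple:
    """Return (length, mrna_start) of the longest perfect antiparallel duplex.

    The reverse complement of the ncRNA is compared with the mRNA using a
    standard longest-common-substring dynamic programme. mrna_start is the
    0-based position in the mRNA where the paired stretch begins.
    """
    target = mrna.upper().replace("T", "U")
    probe = reverse_complement(ncrna)
    best_len, best_end = 0, 0
    prev = [0] * (len(probe) + 1)
    for i in range(1, len(target) + 1):
        curr = [0] * (len(probe) + 1)
        for j in range(1, len(probe) + 1):
            if target[i - 1] == probe[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the matching stretch
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, best_end - best_len


# Toy sequences: the ncRNA is fully complementary to positions 2-8 of the mRNA.
print(longest_duplex("CGUAGCC", "AAGGCUACGAA"))
```

A masked region found this way would only be a candidate; whether a duplex forms in a cell depends on structure, abundance and protein binding, none of which this sketch models.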

In splicing

The

splice site.[96] Therefore, the ectopic expression of the antisense transcript represses splicing and induces translation of the Zeb2 mRNA during mesenchymal development. Likewise, the expression of an overlapping antisense Rev-ErbAα2 transcript controls the alternative splicing of the thyroid hormone receptor ErbAα2 mRNA to form two antagonistic isoforms.[97]

In translation

NcRNA may also apply additional regulatory pressures during translation, a property particularly exploited in neurons, where the dendritic or axonal translation of mRNA in response to synaptic activity contributes to changes in synaptic plasticity and the remodelling of neuronal networks. The RNAP III-transcribed BC1 and BC200 ncRNAs, which were derived from tRNAs, are expressed in the mouse and human central nervous system, respectively.[98][99] BC1 expression is induced in response to synaptic activity and synaptogenesis and is specifically targeted to dendrites in neurons.[100] Sequence complementarity between BC1 and regions of various neuron-specific mRNAs also suggests a role for BC1 in targeted translational repression.[101] Indeed, it has been shown that BC1 is associated with translational repression in dendrites to control the efficiency of dopamine D2 receptor-mediated transmission in the striatum,[102] and BC1 RNA-deleted mice exhibit behavioural changes with reduced exploration and increased anxiety.[103]

In siRNA-directed gene regulation

In addition to masking key elements within single-stranded

Tsix (see above).[106]

In epigenetic regulation

Epigenetic modifications, including

sumoylation, affect many aspects of chromosomal biology, primarily including regulation of large numbers of genes by remodeling broad chromatin domains.[107][108] While it has been known for some time that RNA is an integral component of chromatin,[109][110] it is only recently that we are beginning to appreciate the means by which RNA is involved in pathways of chromatin modification.[111][112][113] For example, Oplr16 epigenetically induces the activation of stem cell core factors by coordinating intrachromosomal looping and recruitment of DNA demethylase TET2.[114]

In

protein-coding genes have antisense partners, including many tumour suppressor genes that are frequently silenced by epigenetic mechanisms in cancer.[119] A recent study observed an inverse expression profile of the p15 gene and an antisense ncRNA in leukaemia.[119] A detailed analysis showed the p15 antisense ncRNA (CDKN2BAS) was able to induce changes to heterochromatin and DNA methylation status of p15 by an unknown mechanism, thereby regulating p15 expression.[119] Therefore, misexpression of the associated antisense ncRNAs may subsequently silence the tumour suppressor gene, contributing towards cancer.

Imprinting

Many emergent themes of ncRNA-directed

Igf2r/Air in directing imprinting.[121]

Almost all the genes at the

Kcnq1 loci are maternally inherited, except the paternally expressed antisense ncRNA Kcnq1ot1.[122] Transgenic mice with truncated Kcnq1ot1 fail to silence the adjacent genes, suggesting that Kcnq1ot1 is crucial to the imprinting of genes on the paternal chromosome.[123] It appears that Kcnq1ot1 is able to direct the trimethylation of lysine 9 (H3K9me3) and 27 of histone 3 (H3K27me3) to an imprinting centre that overlaps the Kcnq1ot1 promoter and actually resides within a Kcnq1 sense exon.[124] Similar to HOTAIR (see above), Eed-Ezh2 Polycomb complexes are recruited to the Kcnq1 loci on the paternal chromosome, possibly by Kcnq1ot1, where they may mediate gene silencing through repressive histone methylation.[124] A differentially methylated imprinting centre also overlaps the promoter of a long antisense ncRNA, Air, that is responsible for the silencing of neighbouring genes at the Igf2r locus on the paternal chromosome.[125][126] The presence of allele-specific histone methylation at the Igf2r locus suggests Air also mediates silencing via chromatin modification.[127]

Xist and X-chromosome inactivation

The

H3K9 hypermethylation and H4K20 monomethylation as well as H2AK119 monoubiquitylation. These modifications coincide with the transcriptional silencing of the X-linked genes.[129] Xist RNA also localises the histone variant macroH2A to the inactive X chromosome.[130] Additional ncRNAs are also present at the Xist locus, including an antisense transcript, Tsix, which is expressed from the future active chromosome and is able to repress Xist expression by the generation of endogenous siRNA.[106] Together these ncRNAs ensure that only one X chromosome is active in female mammals.

Telomeric non-coding RNAs

telomeric repeat-containing RNAs.[133] These ncRNAs are heterogeneous in length, transcribed from several sub-telomeric loci and physically localise to telomeres. Their association with chromatin, which suggests an involvement in regulating telomere specific heterochromatin modifications, is repressed by SMG proteins that protect chromosome ends from telomere loss.[133] In addition, TelRNAs block telomerase activity in vitro and may therefore regulate telomerase activity.[132]
Although early, these studies suggest an involvement for telomeric ncRNAs in various aspects of telomere biology.

In regulation of DNA replication timing and chromosome stability

Asynchronously replicating autosomal RNAs (ASARs) are very long (~200 kb) non-coding RNAs that are non-spliced and non-polyadenylated, and are required for normal DNA replication timing and chromosome stability.[134][135][136] Deletion of any one of the genetic loci containing ASAR6, ASAR15, or ASAR6-141 results in the same phenotype of delayed replication timing and delayed mitotic condensation (DRT/DMC) of the entire chromosome. DRT/DMC results in chromosomal segregation errors that lead to an increased frequency of secondary rearrangements and an unstable chromosome. Similar to Xist, ASARs show random monoallelic expression and exist in asynchronous DNA replication domains. Although the mechanism of ASAR function is still under investigation, it is hypothesized that they work via mechanisms similar to those of the Xist lncRNA, but on smaller autosomal domains, resulting in allele-specific changes in gene expression.

Incorrect reparation of

NHEJ) and homology-directed repair (HDR). Gene mutations or variation in expression levels of such RNAs can lead to local DNA repair defects, increasing the chromosome aberration frequency. Moreover, it was demonstrated that some RNAs could stimulate long-range chromosomal rearrangements.[137]

In aging and disease

The discovery that long ncRNAs function in various aspects of cell biology has led to research on their role in

.

The first published report of an alteration in lncRNA abundance in aging and human

Alu repeat family by Watson and Sutcliffe in 1987 known as BC200 (brain, cytoplasmic, 200 nucleotide).[140]

While many association studies have identified unusual expression of long ncRNAs in disease states, there is little understanding of their role in causing disease. Expression analyses that compare

tumourigenesis is relatively unknown. For example, the ncRNAs HIS-1 and BIC have been implicated in cancer development and growth control, but their function in normal cells is unknown.[147][148] In addition to cancer, ncRNAs also exhibit aberrant expression in other disease states. Overexpression of PRINS is associated with psoriasis susceptibility, with PRINS expression being elevated in the uninvolved epidermis of psoriatic patients compared with both psoriatic lesions and healthy epidermis.[149]

Genome-wide profiling revealed that many transcribed non-coding

development.

Recently, a number of single nucleotide polymorphisms (SNPs) that association studies have linked to disease states have been mapped to long ncRNAs. For example, SNPs that identified a susceptibility locus for myocardial infarction mapped to a long ncRNA, MIAT (myocardial infarction associated transcript).[150] Likewise, genome-wide association studies identified a region associated with coronary artery disease[151] that encompassed a long ncRNA, ANRIL.[152] ANRIL is expressed in tissues and cell types affected by atherosclerosis[153][154] and its altered expression is associated with a high-risk haplotype for coronary artery disease.[154][155]

The complexity of the transcriptome, and our evolving understanding of its structure may inform a reinterpretation of the functional basis for many natural polymorphisms associated with disease states. Many SNPs associated with certain disease conditions are found within non-coding regions and the complex networks of non-coding transcription within these regions make it particularly difficult to elucidate the functional effects of polymorphisms. For example, a SNP both within the truncated form of ZFAT and the promoter of an antisense transcript increases the expression of ZFAT not through increasing the mRNA stability, but rather by repressing the expression of the antisense transcript.[156]

The ability of long ncRNAs to regulate associated protein-coding genes may contribute to disease if misexpression of a long ncRNA deregulates a protein-coding gene with clinical significance. In a similar manner, an antisense long ncRNA that regulates the expression of the sense BACE1 gene, a crucial enzyme in Alzheimer's disease etiology, exhibits elevated expression in several regions of the brain in individuals with Alzheimer's disease.[157] Alteration of the expression of ncRNAs may also mediate changes at an epigenetic level to affect gene expression and contribute to disease aetiology. For example, the induction of an antisense transcript by a genetic mutation led to DNA methylation and silencing of sense genes, causing β-thalassemia in a patient.[158]

Alongside their role in mediating pathological processes, long noncoding RNAs play a role in the immune response to vaccination, as identified for both the influenza vaccine and the yellow fever vaccine.[159]

References

  1. PMID 30781588.
  2. . "We're calling long noncoding RNAs a class, when actually the only definition is that they are longer than 200 bp," says Ana Marques, a Research Fellow at the University of Oxford who uses evolutionary approaches to understand lncRNA function.
  3. ^ .
  4. .
  5. .
  6. ^ .
  7. .
  8. ^ .
  9. ^ .
  10. .
  11. ^ .
  12. .
  13. .
  14. .
  15. .
  16. .
  17. .
  18. .
  19. .
  20. .
  21. .
  22. .
  23. .
  24. .
  25. .
  26. .
  27. .
  28. ^ .
  29. .
  30. .
  31. .
  32. .
  33. .
  34. .
  35. .
  36. ^ .
  37. .
  38. .
  39. .
  40. .
  41. .
  42. .
  43. .
  44. .
  45. ^ .
  46. .
  47. .
  48. .
  49. .
  50. .
  51. .
  52. .
  53. .
  54. .
  55. .
  56. .
  57. ^ .
  58. .
  59. ^ .
  60. ^ .
  61. .
  62. .
  63. .
  64. .
  65. ^ .
  66. .
  67. .
  68. Laure D Bernard, Agnès Dubois, Victor Heurtier, Véronique Fischer, Inma Gonzalez, Almira Chervova, Alexandra Tachtsidi, Noa Gil, Nick Owens, Lawrence E Bates, Sandrine Vandormael-Pournin, José C R Silva, Igor Ulitsky, Michel Cohen-Tannoudji, Pablo Navarro, OCT4 activates a Suv39h1-repressive antisense lncRNA to couple histone H3 Lysine 9 methylation to pluripotency, Nucleic Acids Research, Volume 50, Issue 13, 22 July 2022, Pages 7367–7379, https://doi.org/10.1093/nar/gkac550
  69. PMID 18509338.
  70. .
  71. .
  72. .
  73. .
  74. .
  75. ^ .
  76. .
  77. .
  78. .
  79. .
  80. .
  81. ^ .
  82. ^ .
  83. .
  84. ^ .
  85. .
  86. .
  87. .
  88. .
  89. .
  90. .
  91. .
  92. ^ .
  93. ^ .
  94. .
  95. .
  96. ^ .
  97. .
  98. .
  99. .
  100. .
  101. .
  102. .
  103. .
  104. .
  105. .
  106. ^ .
  107. .
  108. ^ .
  109. .
  110. .
  111. .
  112. ^ .
  113. ^ .
  114. .
  115. .
  116. .
  117. .
  118. .
  119. ^ .
  120. .
  121. .
  122. .
  123. .
  124. ^ .
  125. .
  126. .
  127. .
  128. ^ .
  129. .
  130. .
  131. .
  132. ^ .
  133. ^ .
  134. .
  135. .
  136. .
  137. .
  138. .
  139. .
  140. .
  141. ^ .
  142. .
  143. .
  144. .
  145. .
  146. .
  147. .
  148. .
  149. .
  150. .
  151. .
  152. .
  153. .
  154. ^ .
  155. .
  156. .
  157. .
  158. .
  159. .