Long non-coding RNA
Long non-coding RNAs (long ncRNAs, lncRNA) are a type of
Long non-coding RNAs include intergenic lincRNAs, intronic ncRNAs, and sense and antisense lncRNAs, each type showing different genomic positions in relation to genes and exons.[1][3]
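The positional categories above can be illustrated with a small interval comparison. The following is a minimal sketch, not a standard annotation pipeline: the `Gene` record, the `classify_lncrna` function and the half-open coordinate convention are all assumptions made for illustration, and the rule is deliberately simplified (a transcript lying in an intron is labelled intronic regardless of strand).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical minimal gene record (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str                      # "+" or "-"
    exons: List[Tuple[int, int]]     # exon intervals within the gene


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def classify_lncrna(chrom: str, start: int, end: int, strand: str,
                    genes: List[Gene]) -> str:
    """Label a transcript by its position relative to annotated genes.

    Simplified rule (an assumption, not an official definition):
      - no overlapping gene                -> "intergenic"
      - overlaps an exon, same strand      -> "sense"
      - overlaps an exon, opposite strand  -> "antisense"
      - overlaps a gene but no exon        -> "intronic"
    """
    hits = [g for g in genes
            if g.chrom == chrom and overlaps(start, end, g.start, g.end)]
    if not hits:
        return "intergenic"
    for gene in hits:
        if any(overlaps(start, end, s, e) for s, e in gene.exons):
            return "sense" if gene.strand == strand else "antisense"
    return "intronic"
```

For a gene on the plus strand with exons at 1000–1500 and 4000–5000, a transcript at 2000–3000 would be labelled intronic, while one at 6000–7000 would be labelled intergenic.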
Abundance
Long non-coding transcripts are found in many species. Large-scale complementary DNA (cDNA) sequencing projects such as FANTOM reveal the complexity of these transcripts in humans.[6] The FANTOM3 project identified ~35,000 non-coding transcripts that bear many signatures of messenger RNAs, including 5' capping, splicing, and polyadenylation, but have little or no open reading frame (ORF).[6] This number is a conservative lower estimate, since it omits many singleton transcripts and non-polyadenylated transcripts (tiling array data show that more than 40% of transcripts are non-polyadenylated).[7] Identifying ncRNAs within these cDNA libraries is challenging because protein-coding transcripts can be difficult to distinguish from non-coding transcripts. Multiple studies suggest that testis[8] and neural tissues express the greatest amount of long non-coding RNAs of any tissue type.[9] Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources.[10]
Quantitatively, lncRNAs are ~10-fold less abundant than mRNAs,[11][12] which is explained by higher cell-to-cell variation in the expression levels of lncRNA genes in individual cells compared with protein-coding genes.[13] In general, the majority (~78%) of lncRNAs are characterized as tissue-specific, as opposed to only ~19% of mRNAs.[11] Only 3.6% of human lncRNA genes are expressed in various biological contexts, and 34% of lncRNA genes are expressed at a high level (top 25% of both lncRNAs and mRNAs) in at least one biological context.[14] In addition to higher tissue specificity, lncRNAs are characterized by higher developmental stage specificity[15] and cell subtype specificity in tissues such as the human neocortex[16] and other parts of the brain, regulating correct brain development and function.[17] In 2022, a comprehensive integration of lncRNAs from existing databases revealed 95,243 lncRNA genes and 323,950 transcripts in humans.[18]
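Tissue specificity of the kind described above is usually quantified with a summary score computed from a gene's expression across tissues. The text does not say which measure the cited studies used; the sketch below shows one widely used option, the tau index, as an illustration only. The function name `tau_index` and the choice to return 0.0 for a gene with no expression anywhere are assumptions.

```python
from typing import Sequence


def tau_index(expression: Sequence[float]) -> float:
    """Tau tissue-specificity index for one gene.

    tau = sum_i (1 - x_i / x_max) / (n - 1)

    where x_i is the (non-negative) expression in tissue i and n is the
    number of tissues. Values near 0 indicate broad expression; values
    near 1 indicate expression restricted to a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    x_max = max(expression)
    if x_max == 0:
        # Assumed convention: a gene that is silent everywhere gets 0.0.
        return 0.0
    return sum(1 - x / x_max for x in expression) / (n - 1)


# A gene expressed equally everywhere versus one restricted to one tissue.
broad = tau_index([10.0, 10.0, 10.0, 10.0])
restricted = tau_index([0.0, 0.0, 8.0, 0.0])
```

In practice the index is often computed on log-transformed, normalized expression values, and the cut-off above which a gene is called tissue-specific is a study-specific choice.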
Compared with mammals, relatively few studies have focused on the prevalence of lncRNAs in plants. However, an extensive study of 37 higher plant species and six algae identified ~200,000 non-coding transcripts using an in silico approach[19] and established the associated Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.
Genomic organization
In 2005 the landscape of the mammalian genome was described as numerous 'foci' of transcription that are separated by long stretches of
The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations and their genomic organisation, modifications, cellular locations and tissue expression profiles.[9] Their analysis indicates human lncRNAs show a bias toward two-exon transcripts.[9]
Identification software
Name | Taxonomic group | Web server | Repository | Input file | Main model / algorithm | Training set | Year published | Reference |
---|---|---|---|---|---|---|---|---|
DeepPlnc | Plant | DeepPlnc Server | DeepPlnc | FASTA | Neural network | Yes | 2022 | [23] |
RNAsamba | All | RNAsamba | RNAsamba | FASTA | Neural network | Yes | 2020 | [24] |
LGC | Plant, animal | LGC | | FASTA, BED, GTF | Relationship between ORF length and GC content | No | 2019 | [25] |
CPAT | Human, fly, mouse, zebrafish | CPAT | CPAT | FASTA, BED | Logistic regression | Yes | 2013 | [26] |
COME | Plant, human, mouse, fly, worm | COME | COME | GTF | Random forest | Yes | 2017 | [27] |
CNCI | Plant, animal | NA | | FASTA, GTF | Support vector machine | No | 2013 | [28] |
PLEK | Vertebrate | NA | PLEK | FASTA | Support vector machine | No | 2014 | [28] |
FEELnc | All | NA | FEELnc | FASTA, GTF | Random forest | Yes | 2017 | [29] |
PhyloCSF | Vertebrate, fly, mosquito, yeast, worm | NA | | FASTA | Phylogenetic codon model | Yes | 2011 | [30] |
slncky | All | NA | slncky | FASTA, BED | Evolutionary conservation | Yes | 2016 | [31] |
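Several of the tools listed above rely on simple sequence features such as ORF length and GC content. The sketch below computes these two features for a single transcript sequence. It is an illustration of the features only and does not reproduce the model of LGC or any other listed tool; the function names, the forward-strand-only scan and the ATG-to-stop ORF definition are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest complete ORF on the forward strand.

    An ORF here runs from the first ATG in a reading frame to the next
    in-frame stop codon, stop codon included. ORFs without a stop codon
    are ignored (an assumed simplification).
    """
    seq = seq.upper().replace("U", "T")   # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - orf_start)
                orf_start = None
    return best


# Toy transcript containing a single short ORF (ATG AAA TAG).
toy = "CCATGAAATAGCC"
features = (longest_orf_length(toy), gc_content(toy))
```

A classifier would combine such features across many transcripts; a short longest ORF relative to transcript length is consistent with, but does not prove, a non-coding transcript.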
Translation
There has been considerable debate about whether lncRNAs have been misannotated and actually encode proteins. Several lncRNAs have been found to encode peptides with biologically significant functions.[32][33][34] Ribosome profiling studies have suggested that anywhere from 40% to 90% of annotated lncRNAs are translated,[35][36] although there is disagreement about the correct method for analyzing ribosome profiling data.[37] Additionally, many of the peptides produced by lncRNAs are thought to be highly unstable and without biological function.[36]
Conservation
Initial studies into lncRNA conservation noted that as a class, they were enriched for
Although the turnover of lncRNA transcription is much higher than initially expected, hundreds of lncRNAs are still conserved at the sequence level. There have been several attempts to delineate the different categories of selection signatures seen among lncRNAs, including lncRNAs with strong sequence conservation across the entire length of the gene, lncRNAs in which only a portion of the transcript (e.g. 5′ end, splice sites) is conserved, and lncRNAs that are transcribed from syntenic regions of the genome but have no recognizable sequence similarity.[47][48][49] Additionally, there have been attempts to identify conserved secondary structures in lncRNAs, though these studies have so far yielded conflicting results.[50][51]
Functions
Despite claims that the majority of long noncoding RNAs in mammals are likely to be functional,[52][53] it seems likely that most of them are transcriptional noise and only a relatively small proportion has been demonstrated to be biologically relevant.[45][54]
Some lncRNAs have been functionally annotated in
In the regulation of gene transcription
In gene-specific transcription
In
NcRNAs modulate transcription by several mechanisms, including functioning themselves as co-regulators, modifying
Local ncRNAs can also recruit transcriptional programmes to regulate the expression of adjacent protein-coding genes. For example, divergent lncRNAs, which are transcribed in the opposite direction to nearby protein-coding genes (~20% of total lncRNAs in mammalian genomes), possibly regulate the transcription of adjacent essential developmental regulatory genes in pluripotent cells.[67][68]
The
Recent evidence has raised the possibility that transcription of genes that escape from X-inactivation might be mediated by expression of long non-coding RNA within the escaping chromosomal domains.[72]
Regulating basal transcription machinery
NcRNAs also target general
The ability to quickly mediate global changes is also apparent in the rapid expression of non-coding
A dissection of the functional sequences within Alu RNA transcripts has drafted a
In addition to
Transcribed by RNA polymerase III
Many of the ncRNAs that interact with general transcription factors or RNAP II itself (including 7SK, Alu and B1 and B2 RNAs) are transcribed by RNAP III,[93] uncoupling their expression from RNAP II, which they regulate. RNAP III also transcribes other ncRNAs, such as BC2, BC200 and some microRNAs and snoRNAs, in addition to housekeeping ncRNA genes such as tRNAs, 5S rRNAs and snRNAs.[93] The existence of an RNAP III-dependent ncRNA transcriptome that regulates its RNAP II-dependent counterpart is supported by the finding of a set of ncRNAs transcribed by RNAP III with sequence homology to protein-coding genes. This prompted the authors to posit a 'cogene/gene' functional regulatory network,[94] showing that one of these ncRNAs, 21A, regulates the expression of its antisense partner gene, CENP-F in trans.
In post-transcriptional regulation
In addition to regulating transcription, ncRNAs also control various aspects of post-transcriptional
In splicing
The
In translation
NcRNAs may also apply additional regulatory pressures during translation, a property particularly exploited in neurons, where the dendritic or axonal translation of mRNA in response to synaptic activity contributes to changes in synaptic plasticity and the remodelling of neuronal networks. The RNAP III-transcribed BC1 and BC200 ncRNAs, which were derived from tRNAs, are expressed in the mouse and human central nervous system, respectively.[98][99] BC1 expression is induced in response to synaptic activity and synaptogenesis and is specifically targeted to dendrites in neurons.[100] Sequence complementarity between BC1 and regions of various neuron-specific mRNAs also suggests a role for BC1 in targeted translational repression.[101] Indeed, BC1 has been shown to be associated with translational repression in dendrites to control the efficiency of dopamine D2 receptor-mediated transmission in the striatum,[102] and BC1 RNA-deleted mice exhibit behavioural changes with reduced exploration and increased anxiety.[103]
In siRNA-directed gene regulation
In addition to masking key elements within single-stranded
In epigenetic regulation
Epigenetic modifications, including
In
Imprinting
Many emergent themes of ncRNA-directed
Almost all the genes at the
Xist and X-chromosome inactivation
The
Telomeric non-coding RNAs
In regulation of DNA replication timing and chromosome stability
Asynchronously replicating autosomal RNAs (ASARs) are very long (~200 kb) non-coding RNAs that are non-spliced and non-polyadenylated, and are required for normal DNA replication timing and chromosome stability.[134][135][136] Deletion of any one of the genetic loci containing ASAR6, ASAR15, or ASAR6-141 results in the same phenotype of delayed replication timing and delayed mitotic condensation (DRT/DMC) of the entire chromosome. DRT/DMC results in chromosomal segregation errors that lead to an increased frequency of secondary rearrangements and an unstable chromosome. Similar to Xist, ASARs show random monoallelic expression and exist in asynchronous DNA replication domains. Although the mechanism of ASAR function is still under investigation, it is hypothesized that they work via mechanisms similar to those of the Xist lncRNA, but on smaller autosomal domains, resulting in allele-specific changes in gene expression.
Incorrect reparation of
In aging and disease
The discovery that long ncRNAs function in various aspects of cell biology has led to research on their role in
The first published report of an alteration in lncRNA abundance in aging and human
While many association studies have identified unusual expression of long ncRNAs in disease states, there is little understanding of their role in causing disease. Expression analyses that compare
Genome-wide profiling revealed that many transcribed non-coding
Recently, a number of single nucleotide polymorphisms (SNPs) identified in disease association studies have been mapped to long ncRNAs. For example, SNPs that identified a susceptibility locus for myocardial infarction mapped to a long ncRNA, MIAT (myocardial infarction associated transcript).[150] Likewise, genome-wide association studies identified a region associated with coronary artery disease[151] that encompassed a long ncRNA, ANRIL.[152] ANRIL is expressed in tissues and cell types affected by atherosclerosis[153][154] and its altered expression is associated with a high-risk haplotype for coronary artery disease.[154][155]
The complexity of the transcriptome and our evolving understanding of its structure may inform a reinterpretation of the functional basis for many natural polymorphisms associated with disease states. Many SNPs associated with certain disease conditions are found within non-coding regions, and the complex networks of non-coding transcription within these regions make it particularly difficult to elucidate the functional effects of polymorphisms. For example, a SNP located both within the truncated form of ZFAT and in the promoter of an antisense transcript increases the expression of ZFAT not by increasing mRNA stability, but rather by repressing the expression of the antisense transcript.[156]
The ability of long ncRNAs to regulate associated protein-coding genes may contribute to disease if misexpression of a long ncRNA deregulates a protein-coding gene with clinical significance. For example, an antisense long ncRNA that regulates the expression of the sense BACE1 gene, which encodes a crucial enzyme in Alzheimer's disease etiology, exhibits elevated expression in several regions of the brain in individuals with Alzheimer's disease.[157] Alteration of the expression of ncRNAs may also mediate changes at an epigenetic level to affect gene expression and contribute to disease etiology. For example, the induction of an antisense transcript by a genetic mutation led to DNA methylation and silencing of sense genes, causing β-thalassemia in a patient.[158]
Alongside their role in mediating pathological processes, long noncoding RNAs play a role in the immune response to vaccination, as identified for both the influenza vaccine and the yellow fever vaccine.[159]
References
- PMID 30781588.
- PMID 23750541.
"We're calling long noncoding RNAs a class, when actually the only definition is that they are longer than 200 bp," says Ana Marques, a Research Fellow at the University of Oxford who uses evolutionary approaches to understand lncRNA function.
- PMID 23696037.
- S2CID 258528357.
- PMID 29138516.
- S2CID 8712839.
- S2CID 13047538.
- S2CID 1179101.
- PMID 22955988.
- PMID 28241135.
- PMID 21890647.
- PMID 16344565.
- PMID 27605307.
- PMID 33045751.
- S2CID 29209966.
- PMID 27081004.
- PMID 34204536.
- PMID 36330950.
- PMID 26578586.
- S2CID 25609839.
- S2CID 6465064.
- PMID 17571346.
- PMID 35931273.
- PMID 33575571.
- PMID 30649200.
- PMID 23335781.
- PMID 27608726.
- PMID 23892401.
- PMID 28053114.
- PMID 21685081.
- PMID 26838501.
- PMID 25640239.
- S2CID 205253245.
- PMID 24407481.
- PMID 22056041.
- PMID 26687005.
- PMID 23810193.
- PMID 19182780.
- PMID 17387145.
- PMID 23710818.
- PMID 24429298.
- PMID 22844254.
- PMID 15851065.
- S2CID 29398526.
- PMID 25674102.
- PMID 25218058.
- PMID 26838501.
- S2CID 13833164.
- PMID 25959816.
- PMID 24184936.
- PMID 27819659.
- S2CID 18441501.
- PMID 19770204.
- PMID 35395170.
- PMID 21112873.
- PMID 25332394.
- PMID 34751395.
- PMID 24931603.
- S2CID 22274894.
- PMID 16705037.
- PMID 12223397.
- S2CID 4307332.
- PMID 18176564.
- PMID 11890990.
- PMID 17785203.
- PMID 37860228.
- PMID 26996597.
- Laure D Bernard, Agnès Dubois, Victor Heurtier, Véronique Fischer, Inma Gonzalez, Almira Chervova, Alexandra Tachtsidi, Noa Gil, Nick Owens, Lawrence E Bates, Sandrine Vandormael-Pournin, José C R Silva, Igor Ulitsky, Michel Cohen-Tannoudji, Pablo Navarro, OCT4 activates a Suv39h1-repressive antisense lncRNA to couple histone H3 Lysine 9 methylation to pluripotency, Nucleic Acids Research, Volume 50, Issue 13, 22 July 2022, Pages 7367–7379, https://doi.org/10.1093/nar/gkac550
- PMID 18509338.
- PMID 28277509.
- PMID 24388749.
- PMID 21047393.
- S2CID 3012142.
- PMID 2434928.
- S2CID 22982547.
- PMID 11604515.
- PMID 14580347.
- PMID 11237011.
- PMID 12466850.
- PMID 7784180.
- S2CID 11997028.
- S2CID 22199826.
- PMID 17307818.
- PMID 18313387.
- PMID 18313380.
- PMID 14505360.
- PMID 18299392.
- S2CID 28643222.
- S2CID 16781978.
- S2CID 4359937.
- S2CID 10513502.
- S2CID 41151259.
- PMID 17977614.
- PMID 17264081.
- PMID 23178169.
- PMID 18347095.
- PMID 1657988.
- PMID 7684772.
- PMID 1706516.
- PMID 9647652.
- PMID 16330711.
- PMID 17699670.
- S2CID 18840384.
- PMID 18691963.
- PMID 18463631.
- PMID 18535243.
- S2CID 23292265.
- PMID 17603471.
- PMID 2911567.
- PMID 18000552.
- S2CID 1768190.
- PMID 17604720.
- S2CID 16423723.
- PMID 32055844.
- S2CID 16059065.
- PMID 32898472.
- PMID 9742080.
- S2CID 34559885.
- PMID 18185590.
- PMID 17445943.
- PMID 16117633.
- PMID 10369866.
- PMID 16702402.
- S2CID 19084498.
- S2CID 4420245.
- PMID 11562346.
- PMID 12456662.
- PMID 17869504.
- PMID 14749728.
- S2CID 205001095.
- PMID 17876321.
- S2CID 5890629.
- S2CID 20693275.
- PMID 23593023.
- PMID 25569254.
- PMID 32144193.
- PMID 33918762.
- PMID 30329098.
- S2CID 39305428.
- PMID 2444875.
- PMID 16569192.
- PMID 34727260.
- S2CID 26576988.
- S2CID 53569324.
- S2CID 9657308.
- S2CID 260632006.
- PMID 15738415.
- PMID 9094986.
- PMID 15855153.
- PMID 17066261.
- PMID 17478681.
- PMID 17440112.
- PMID 18048406.
- PMID 19592466.
- PMID 19343170.
- PMID 15294872.
- PMID 18587408.
- S2CID 7226446.
- PMID 31399544.