Bioinformatics discovery of non-coding RNAs
Discovery by homology search
Homology search refers to the process of searching a sequence database for RNAs that are similar to already known RNA sequences. Any algorithm designed for homology search of nucleic acid sequences can be used, e.g., BLAST.[1] However, such general-purpose algorithms are typically less sensitive and less accurate than algorithms designed specifically for RNA.
Of particular importance for RNA is the conservation of secondary structure, which can be modeled to improve the accuracy of searches. For example, covariance models[2] can be viewed as an extension of profile hidden Markov models that also reflects conserved secondary structure. Covariance models are implemented in the Infernal software package.[3]
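The benefit of modeling structure can be illustrated with a minimal sketch. It is not a covariance model and is unrelated to how Infernal is implemented; it only checks whether a candidate sequence can form the base pairs of a known consensus structure given in dot-bracket notation. The function names and the scoring rule (fraction of canonical pairs, counting G-U wobble pairs as canonical) are assumptions made for this illustration.

```python
# Canonical RNA base pairs: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Return sorted (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def structure_compatibility(seq, structure):
    """Fraction of the structure's base pairs that seq can form canonically."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    pairs = parse_pairs(structure)
    if not pairs:
        return 0.0
    formed = sum((seq[i], seq[j]) in CANONICAL for i, j in pairs)
    return formed / len(pairs)
```

A sequence that differs from a known RNA at many positions can still score highly here as long as its paired positions remain complementary, which is the kind of signal that a purely sequence-based search cannot use.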
Discovery of specific types of ncRNAs
Some types of RNAs have shared properties that algorithms can exploit. For example, tRNAscan-SE
The properties of
Similarly, several algorithms have been developed to detect microRNAs. Examples include miRNAFold[7] and miRNAminer.[8]
Discovery by general properties
Some properties are shared by multiple unrelated classes of ncRNA, and these properties can be targeted to discover new classes. Chief among them is the conservation of an RNA secondary structure. To measure conservation of secondary structure, it is first necessary to identify homologous sequences that might exhibit a common structure. Strategies to do this have included the use of BLAST between two sequences
Mutations that change the nucleotide sequence but preserve the secondary structure are called covariation, and they can provide evidence of conservation. Other statistics and probabilistic models can also be used to measure such conservation. The first ncRNA discovery method to use structural conservation was QRNA,[9] which compared the probabilities of an alignment of two sequences under either an RNA model or a model in which only the primary sequence is conserved. Later work in this direction has allowed for more than two sequences and has included phylogenetic models, e.g., EvoFold.[14] The approach taken in RNAz[15] involved computing statistics on an input multiple-sequence alignment. Some of these statistics relate to structural conservation, while others measure general properties of the alignment that could affect the expected ranges of the structural statistics. These statistics were combined using a support vector machine.
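The notion of covariation can be made concrete with a small sketch. It is not the method used by QRNA, EvoFold, or RNAz; it simply reports which base pairs of an assumed consensus structure show compensatory changes in an ungapped alignment, i.e., two different canonical pairs in which both nucleotides differ. The function names, the dot-bracket input, and the treatment of G-U as canonical are assumptions of this illustration.

```python
from itertools import combinations

# Canonical RNA base pairs: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) column pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def covarying_pairs(alignment, structure):
    """Return structure pairs whose columns show compensatory changes.

    A pair of columns is reported when at least two sequences form
    canonical base pairs there that differ at BOTH positions
    (for example G-C in one sequence and A-U in another).
    """
    seqs = [s.upper().replace("T", "U") for s in alignment]
    if any(len(s) != len(structure) for s in seqs):
        raise ValueError("all sequences must match the structure length")
    result = []
    for i, j in pairs_from_dot_bracket(structure):
        observed = {(s[i], s[j]) for s in seqs if (s[i], s[j]) in CANONICAL}
        if any(a[0] != b[0] and a[1] != b[1] for a, b in combinations(observed, 2)):
            result.append((i, j))
    return result
```

A change from G-C to G-U alters only one position, so it is consistent with the structure but is not counted as covariation in this sketch. Real methods additionally account for gaps, phylogeny, and the background rate at which such patterns arise by chance.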
Other properties include the presence of a promoter from which the RNA is transcribed. ncRNAs are also often followed by a Rho-independent transcription terminator.
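A Rho-independent terminator typically consists of a hairpin followed by a run of uridines in the transcript (thymidines in the DNA). The following is a deliberately crude sketch of how such a signal might be scanned for; it requires a perfect inverted repeat and an exact T-run, and does not use any energy model. The function names and all thresholds (stem length, loop range, T-run length) are hypothetical choices made for this illustration.

```python
def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string over A, C, G, T."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[c] for c in reversed(dna))


def find_terminator_candidates(dna, stem_len=5, loop_range=(3, 8), t_run=4):
    """Return (start, end) spans of a perfect hairpin followed by a T-run.

    Each span covers the left stem arm, the loop, the right stem arm and
    the T-run. All thresholds are illustrative, not calibrated values.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna) - stem_len + 1):
        left = dna[start:start + stem_len]
        if set(left) - set("ACGT"):
            continue  # skip windows containing ambiguous bases
        target = reverse_complement(left)  # required right arm of the stem
        for loop in range(loop_range[0], loop_range[1] + 1):
            r_start = start + stem_len + loop
            tail_start = r_start + stem_len
            right = dna[r_start:tail_start]
            tail = dna[tail_start:tail_start + t_run]
            if right == target and tail == "T" * t_run:
                hits.append((start, tail_start + t_run))
    return hits
```

Candidates found this way would be only one line of evidence, to be combined with the other properties described in this section.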
Using a combination of these approaches, multiple studies have enumerated candidate RNAs.[9][12] Some studies have proceeded to manual analysis of the predictions to arrive at detailed structural and functional predictions.[11][16][17]
See also
- 6A RNA motif
- AbiF RNA motif
- ARRPOF RNA motif
- CyVA-1 RNA motif
- List of RNA structure prediction software