Bioinformatics discovery of non-coding RNAs

Non-coding RNAs (ncRNAs) have been discovered using both experimental and bioinformatic approaches. Bioinformatic approaches can be divided into three main categories. The first involves homology search, although these techniques are by definition unable to find new classes of ncRNAs. The second category includes algorithms designed to discover specific types of ncRNAs that have similar properties. Finally, some discovery methods are based on very general properties of RNA, and are thus able to discover entirely new kinds of ncRNAs.

Discovery by homology search

Homology search refers to the process of searching a sequence database for RNAs that are similar to already known RNA sequences. Any algorithm that is designed for homology search of nucleic acid sequences can be used, e.g., BLAST.^[1] However, such algorithms typically are not as sensitive or accurate as algorithms specifically designed for RNA.
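As a rough illustration of sequence-only homology search, the sketch below scores a query against a small in-memory database using the Smith-Waterman local alignment algorithm. It is a stand-in, not BLAST itself: BLAST uses a faster seed-and-extend heuristic, whereas this computes the exact local alignment score with a simple linear gap penalty. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function names are hypothetical choices for illustration.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1] and b[j-1];
    # row 0 and column 0 stay 0 so an alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = H[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


def search(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every database entry, best hit first."""
    hits = [(name, smith_waterman(query, seq)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(search("GGCUAGC", {"hit": "AAGGCUAGCAA", "miss": "UUUUUUU"}))
# [('hit', 14), ('miss', 2)]
```

Because the score depends only on nucleotide identity, two RNAs that share a structure but have diverged in sequence can score poorly, which is the weakness that RNA-specific methods address.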

Of particular importance for RNA is the conservation of its secondary structure, which can be modeled to make searches more accurate. For example, covariance models^[2] can be viewed as an extension of profile hidden Markov models that also captures conserved secondary structure. Covariance models are implemented in the Infernal software package.^[3]
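The value of modeling secondary structure can be illustrated with a check far simpler than a covariance model. The sketch below assumes a fixed consensus structure in dot-bracket notation and gapless candidate sequences of the same length, and reports the fraction of consensus base pairs each candidate can form (Watson-Crick or G-U wobble). A real covariance model instead scores sequence and pairing jointly with a probabilistic grammar and handles insertions and deletions; the function names here are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the (i, j) base pairs encoded by a dot-bracket string, sorted by i."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)


def pairing_fraction(seq: str, structure: str) -> float:
    """Fraction of consensus pairs that seq can form (gapless, equal length assumed)."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    pairs = parse_dot_bracket(structure)
    if not pairs:
        raise ValueError("structure contains no base pairs")
    seq = seq.upper()
    formed = sum((seq[i], seq[j]) in CANONICAL_PAIRS for i, j in pairs)
    return formed / len(pairs)


consensus = "((((....))))"
for candidate in ["GCAUAAAAAUGC", "CGUAAAAAUACG", "GCAUAAAAAAAA"]:
    print(candidate, pairing_fraction(candidate, consensus))
```

In this example the second candidate matches the first at only 4 of 12 positions yet forms every consensus pair, while the third matches the first at 9 of 12 positions but can form only one of the four pairs. A purely sequence-based score would rank them the other way round.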

Discovery of specific types of ncRNAs

Some types of RNAs have shared properties that algorithms can exploit. For example, tRNAscan-SE^[4] is specialized for finding tRNAs. The heart of this program is a tRNA homology search based on covariance models, but other tRNA-specific search programs are used to accelerate searches.

The properties of snoRNAs have enabled the development of programs to detect new examples of snoRNAs, including those that might be only distantly related to previously known examples. Computer programs implementing such approaches include snoscan^[5] and snoReport.^[6]
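One property that snoRNA finders can exploit is that C/D box snoRNAs carry short conserved motifs: box C (consensus RUGAUGA, where R is A or G) near the 5' end and box D (consensus CUGA) near the 3' end. The sketch below only scans for exact matches to these two consensus motifs inside hypothetical end windows, and ignores H/ACA snoRNAs entirely. Dedicated tools such as snoscan and snoReport use considerably richer models; the window sizes and function names are illustrative assumptions.

```python
import re

# Consensus motifs of C/D box snoRNAs; R = A or G.
# Zero-width lookaheads let finditer report overlapping matches.
BOX_C = re.compile(r"(?=[AG]UGAUGA)")
BOX_D = re.compile(r"(?=CUGA)")


def find_cd_boxes(seq: str, c_window: int = 20, d_window: int = 20) -> tuple[list[int], list[int]]:
    """Start positions of box C near the 5' end and box D near the 3' end.

    c_window and d_window are hypothetical search windows, not values
    taken from snoscan or snoReport.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA input as well
    c_hits = [m.start() for m in BOX_C.finditer(rna) if m.start() < c_window]
    d_hits = [m.start() for m in BOX_D.finditer(rna) if m.start() >= len(rna) - d_window]
    return c_hits, d_hits


def is_cd_box_candidate(seq: str, **windows: int) -> bool:
    """Flag sequences with a box C upstream of a box D (exact consensus matches only)."""
    c_hits, d_hits = find_cd_boxes(seq, **windows)
    return bool(c_hits) and bool(d_hits) and min(c_hits) < max(d_hits)


example = "CC" + "AUGAUGA" + "A" * 30 + "CUGA" + "CC"
print(find_cd_boxes(example), is_cd_box_candidate(example))
```

Exact consensus matching is both too strict (real boxes often deviate from the consensus) and too permissive (CUGA occurs by chance frequently), which is one reason practical snoRNA finders combine motifs with other evidence.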

Similarly, several algorithms have been developed to detect microRNAs. Examples include miRNAFold^[7] and miRNAminer.^[8]

Discovery by general properties

Some properties are shared by multiple unrelated classes of ncRNA, and these properties can be targeted to discover new classes. Chief among them is the conservation of an RNA secondary structure. To measure conservation of secondary structure, it is first necessary to find homologous sequences that might share a common structure. Strategies for doing so have included, among others, the use of BLAST between two sequences and locality-sensitive hashing in combination with sequence and structural features.^[13]
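Locality-sensitive hashing lets a search propose likely homologous pairs without comparing every sequence against every other. The sketch below is a generic MinHash-with-banding scheme over sequence k-mers only; it does not include the structural features mentioned above, and its parameters (`k=4`, 32 hash functions, 8 bands) are arbitrary assumptions rather than values from the cited work. Sequences whose k-mer sets have high Jaccard similarity are likely to agree on at least one band and so become candidate pairs for a more expensive structural comparison.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Set of all overlapping k-mers in seq."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash; each seed acts as an independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens: set[str], num_hashes: int) -> list[int]:
    """For each hash function, keep the minimum hash value over the token set."""
    return [min(seeded_hash(s, t) for t in tokens) for s in range(num_hashes)]


def lsh_candidate_pairs(seqs: dict[str, str], k: int = 4, num_hashes: int = 32,
                        bands: int = 8) -> set[tuple[str, str]]:
    """Name pairs whose signatures agree on every row of at least one band."""
    if num_hashes % bands:
        raise ValueError("bands must divide num_hashes")
    rows = num_hashes // bands
    buckets: dict[tuple, set[str]] = defaultdict(set)
    for name, seq in seqs.items():
        sig = minhash_signature(kmers(seq, k), num_hashes)
        for b in range(bands):
            # The band index is part of the key, so different bands never share a bucket.
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs: set[tuple[str, str]] = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

Using more bands with fewer rows each makes collisions more likely, trading extra candidate pairs (more downstream work) for fewer missed homologs.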

Mutations that change the nucleotide sequence but preserve the secondary structure (for example, a G-C pair that becomes an A-U pair in a homolog) are called covariation, and can provide evidence that the structure is conserved. Other statistics and probabilistic models can also be used to measure such conservation.

The first ncRNA discovery method to use structural conservation was QRNA,^[9] which compared the probability of an alignment of two sequences under an RNA structure model with its probability under a model in which only the primary sequence is conserved. Later work extended this idea to more than two sequences and incorporated phylogenetic models, e.g., EvoFold.^[14] The approach taken in RNAz^[15] involved computing statistics on an input multiple-sequence alignment. Some of these statistics relate to structural conservation, while others measure general properties of the alignment that could affect the expected ranges of the structural statistics. These statistics were combined using a support vector machine.
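Covariation can be counted directly from an alignment. The sketch below assumes equal-length aligned sequences and a consensus structure in dot-bracket notation, and treats the first sequence as the reference. For each consensus pair it counts the sequences in which both partners differ from the reference while still forming a Watson-Crick or G-U pair. This is a deliberately crude statistic, not the scoring used by QRNA, EvoFold or RNAz; the function names are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair; gaps ("-") never count as paired.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """(i, j) pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)


def compensatory_counts(alignment: list[str], structure: str) -> dict[tuple[int, int], int]:
    """Per consensus pair, how many sequences show a pairing-preserving double change."""
    if any(len(s) != len(structure) for s in alignment):
        raise ValueError("all aligned sequences must match the structure length")
    ref = alignment[0].upper()
    counts: dict[tuple[int, int], int] = {}
    for i, j in base_pairs(structure):
        n = 0
        for other in (s.upper() for s in alignment[1:]):
            both_changed = other[i] != ref[i] and other[j] != ref[j]
            if both_changed and (other[i], other[j]) in PAIRS:
                n += 1
        counts[(i, j)] = n
    return counts


alignment = ["GCAUAAAAAUGC", "CGUAAAAAUACG", "GCAUAAAAAUGU"]
print(compensatory_counts(alignment, "((((....))))"))
```

In the example, the second sequence changes both partners of every pair while keeping them paired, so each pair gets one count; the third sequence turns the outer G-C pair into a G-U wobble by changing only one side, which this statistic does not count as covariation.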

Other properties include the presence of a promoter to transcribe the RNA. In bacteria, ncRNA genes are also often followed by a Rho-independent transcription terminator.
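A Rho-independent terminator is typically a GC-rich hairpin immediately followed by a run of uridines. The sketch below searches for that pattern naively: it looks for a U-tract and checks whether the bases just upstream can form a perfectly paired stem (Watson-Crick or G-U) closed by a short loop. All length and GC thresholds are hypothetical, practical terminator finders score hairpins by folding energy rather than exact pairing, and overlapping or redundant hits are reported as they are found.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_terminators(seq: str, min_stem: int = 5, max_stem: int = 10, min_loop: int = 3,
                     max_loop: int = 8, min_u: int = 4,
                     min_gc: float = 0.6) -> list[tuple[int, int, int, int]]:
    """Return (stem_start, u_tract_start, stem_len, loop_len) for each candidate hairpin."""
    rna = seq.upper().replace("T", "U")
    hits = []
    for p in range(len(rna) - min_u + 1):
        if rna[p:p + min_u] != "U" * min_u:
            continue  # a terminator needs a U-tract starting at p
        for s in range(min_stem, max_stem + 1):
            for loop in range(min_loop, max_loop + 1):
                # Layout: 5' arm (s nt) + loop + 3' arm (s nt), ending right before the U-tract.
                a = p - 2 * s - loop
                if a < 0:
                    continue
                # Base a+k of the 5' arm must pair with base p-1-k of the 3' arm.
                if all((rna[a + k], rna[p - 1 - k]) in PAIRS for k in range(s)):
                    arms = rna[a:a + s] + rna[p - s:p]
                    gc = sum(base in "GC" for base in arms) / len(arms)
                    if gc >= min_gc:
                        hits.append((a, p, s, loop))
    return hits


example = "AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUU" + "AA"
print((4, 20, 6, 4) in find_terminators(example))
```

The nested loops make the search quadratic in the stem and loop ranges per U-tract, which is acceptable for a sketch but not for scanning whole genomes.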

Using a combination of these approaches, multiple studies have enumerated candidate RNAs.^[9]^[12] Some studies have gone on to analyze the predictions manually in order to derive detailed structural and functional predictions.^[11]^[16]^[17]

References

[1] PMID 9254694

[2] PMID 8029015

[3] PMID 24008419

[4] PMID 9023104

[5] S2CID 8084145

[6] PMID 17895272

[7] PMID 22362754

[8] PMID 18215311

[9] PMID 11801179

[10] PMID 19340921

[11] PMID 17621584

[12] PMID 19377483

[13] PMID 22689765

[14] PMID 16628248

[15] PMID 15665081

[16] PMID 20230605

[17] PMID 28977401
