Rfam
Property | Value |
---|---|
PMID | 33211869 |
Access | |
Data format | Stockholm format |
Website | rfam |
Download URL | FTP |
Miscellaneous | |
License | Public domain |
Bookmarkable entities | yes |
Rfam is a
Unlike
Uses
The Rfam database serves several purposes. For each ncRNA family, the interface lets users view and download multiple sequence alignments, read annotation, and examine the species distribution of family members. Links to literature references and to other RNA databases are also provided. Rfam also links to Wikipedia so that users can create or edit entries.
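Downloaded alignments are in Stockholm format, a plain-text layout in which annotation lines begin with `#=` and the alignment ends with `//`. The following is a minimal sketch of reading such a file, not Rfam's own tooling: the function name `parse_stockholm` and the toy alignment are made up for illustration, and only the `#=GF` (per-file) and `#=GC` (per-column) annotation lines are handled.

```python
def parse_stockholm(text):
    """Parse one Stockholm-format alignment into annotation and sequences.

    Illustrative sketch: handles #=GF and #=GC lines and aligned sequences;
    other markup (#=GS, #=GR) is skipped.
    """
    gf = {}         # per-file annotation, e.g. ID, DE
    gc = {}         # per-column annotation, e.g. SS_cons (consensus structure)
    sequences = {}  # sequence name -> aligned sequence
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("# STOCKHOLM"):
            continue
        if line == "//":  # end-of-alignment marker
            break
        if line.startswith("#=GF") or line.startswith("#=GC"):
            parts = line.split(None, 2)
            if len(parts) < 3:
                continue  # tag without a value
            _, tag, value = parts
            if line.startswith("#=GF"):
                # Repeated free-text tags are joined with a space
                gf[tag] = gf[tag] + " " + value if tag in gf else value
            else:
                # Column annotation is concatenated across alignment blocks
                gc[tag] = gc.get(tag, "") + value
        elif line.startswith("#"):
            continue
        else:
            name, aligned = line.split(None, 1)
            sequences[name] = sequences.get(name, "") + aligned.replace(" ", "")
    return gf, gc, sequences


# Made-up toy alignment, not a real Rfam family
EXAMPLE = """# STOCKHOLM 1.0
#=GF ID   toy_hairpin
#=GF DE   Made-up family for illustration
seq1         GGGAAACCC
seq2         GGCAAAGCC
seq3         GCGAA-CGC
#=GC SS_cons <<<...>>>
//
"""

gf, gc, seqs = parse_stockholm(EXAMPLE)
print(gf["ID"], len(seqs), gc["SS_cons"])
```

In the `SS_cons` line, matching `<` and `>` characters mark columns that pair in the consensus secondary structure.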
The interface at the Rfam website allows users to search ncRNAs by keyword, family name, or genome as well as to search by ncRNA sequence or
Methods
In the database, the information of the
The first MSA is the "seed" alignment. It is a hand-curated alignment that contains representative members of the ncRNA family and is annotated with structural information. The seed alignment is used to build a stochastic context-free grammar (SCFG), also called a covariance model, which the Rfam software INFERNAL uses to identify additional family members and add them to the alignment. A family-specific score threshold is chosen to avoid false positives.
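The effect of a family-specific threshold can be sketched as follows. This is an illustration under stated assumptions rather than Rfam's pipeline: the `Hit` class, the `accept_hits` function, the family names, and all scores and cutoffs are made up, and the score is assumed to be a bit score reported by the search.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    family: str        # hypothetical family name
    sequence_id: str   # hypothetical sequence identifier
    bit_score: float   # assumed: score reported by the model search


def accept_hits(hits, thresholds):
    """Keep only hits scoring at or above their own family's threshold."""
    accepted = []
    for hit in hits:
        cutoff = thresholds.get(hit.family)
        # Families without a defined threshold are skipped entirely
        if cutoff is not None and hit.bit_score >= cutoff:
            accepted.append(hit)
    return accepted


# Made-up cutoffs: each family has its own value
thresholds = {"famA": 40.0, "famB": 25.0}
hits = [
    Hit("famA", "seq1", 55.2),
    Hit("famA", "seq2", 31.0),  # below famA's cutoff, rejected
    Hit("famB", "seq3", 31.0),  # same score, but passes famB's cutoff
]

accepted = accept_hits(hits, thresholds)
print([h.sequence_id for h in accepted])
```

The same score can be accepted for one family and rejected for another, which is the reason for keeping a separate threshold per family instead of one global cutoff.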
Until release 12, Rfam used an initial BLAST filtering step because searching with profile SCFGs was too computationally expensive. The latest versions of INFERNAL are fast enough[10] that the BLAST step is no longer necessary.[11]
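The general idea of such a prefilter, a cheap test that decides which candidates reach an expensive scorer, can be sketched as below. Neither function is BLAST or INFERNAL: the shared k-mer test and the longest-common-substring score are toy stand-ins, and all names and sequences are made up for illustration.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cheap_filter(query, target, k=4, min_shared=2):
    """Toy prefilter: pass targets sharing at least min_shared k-mers."""
    return len(kmers(query, k) & kmers(target, k)) >= min_shared


def expensive_score(query, target):
    """Toy stand-in for a costly model: longest common substring length."""
    best = 0
    prev = [0] * (len(target) + 1)
    for qc in query:
        curr = [0] * (len(target) + 1)
        for j, tc in enumerate(target, start=1):
            if qc == tc:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def search_with_prefilter(query, targets):
    """Score only targets that pass the cheap filter; count expensive calls."""
    scored, calls = {}, 0
    for name, seq in targets.items():
        if cheap_filter(query, seq):
            scored[name] = expensive_score(query, seq)
            calls += 1
    return scored, calls


# Made-up sequences
query = "GGGAAACCCUUU"
targets = {
    "t1": "AAGGGAAACCCAA",
    "t2": "CUCUCUCUCUCU",
    "t3": "UUGGGAAAGG",
}

scored, calls = search_with_prefilter(query, targets)
print(scored, calls)
```

A prefilter trades sensitivity for speed: any true family member that fails the cheap test is never scored by the full model, which is one reason to drop the filter once the full search is fast enough.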
The second MSA is the "full" alignment, which is created by searching the sequence database with the covariance model. All detected homologs are aligned to the model, giving the automatically produced full alignment.
History
Version 1.0 of Rfam was launched in 2003 and contained 25 ncRNA families and annotated about 50 000 ncRNA genes. In 2005, version 6.1 was released and contained 379 families annotating over 280 000 genes. In August 2012, version 11.0 contained 2208 RNA families, while the current version (14.9, released in November 2022) annotates 4108[7] families.
Major releases and publications
- 2003 - Rfam: an RNA family database.[1]
- 2005 - Rfam: annotating non-coding RNAs in complete genomes.[2]
- 2008 - The RNA WikiProject: community annotation of RNA families.[6]
- 2008 - Rfam: updates to the RNA families database.[3]
- 2011 - Rfam: Wikipedia, clans and the “decimal” release.[4]
- 2012 - Rfam 11.0: 10 years of RNA families.[12]
- 2014 - Rfam 12.0: updates to the RNA families database.[3]
- 2017 - Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families.[13]
- 2020 - Rfam 14: expanded coverage of metagenomic, viral and microRNA families.[14]
Problems
- The genomes of higher eukaryotes contain many ncRNA-derived pseudogenes and repeats. Distinguishing these non-functional copies from functional ncRNA is a formidable challenge.[2]
- Introns are not modeled by covariance models.