Sequence profiling tool
A sequence profiling tool in bioinformatics is a type of software that presents information related to a genetic sequence, gene name, or keyword input. Such tools generally take a query, such as a DNA, RNA, or protein sequence or a keyword, and search one or more databases for information related to that query. Summaries and aggregate results are provided in a standardized format, describing information that would otherwise have required visits to many smaller sites or direct literature searches to compile. Many sequence profiling tools are software portals or gateways that simplify the process of finding information about a query in the large and growing number of bioinformatics databases. These tools are available either as web-based services or as executables that can be downloaded and run locally.
Introduction and usage
The "post-
In general, there exist three types of databases and service providers. The first one includes the popular public-domain or open-access databases supported by funding and grants such as
Typical scenarios of a profiling approach become relevant, particularly, in the cases of the first two groups, where researchers commonly wish to combine information derived from several sources about a single query or target sequence. For example, users might use the sequence alignment and search tool
Many public databases are already extensively linked so that complementary information in another database is easily accessible; for example,
Keyword based profilers
Most of the profiling tools available on the web today fall into this category. The user, upon visiting the site or tool, enters relevant information such as a keyword (for example, dystrophy or diabetes), a GenBank accession number, or a PDB ID. The hits returned by the search are presented in a format specific to each tool's main focus. Profiling tools based on keyword searches are essentially search engines that are highly specialized for bioinformatics work, which eliminates the clutter of irrelevant or non-scholarly hits that might occur with a traditional search engine like Google. Most keyword-based profiling tools accept flexible types of input: accession numbers from indexed databases as well as traditional keyword descriptors.
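The flexible input handling described above can be illustrated with a minimal Python sketch. The patterns below are simplified assumptions (a four-character PDB ID beginning with a digit, and three common GenBank-style accession layouts); real identifier formats have more variants, and the function name `classify_query` is hypothetical rather than part of any existing tool.

```python
import re

# Simplified, assumed patterns; real identifier formats have more variants.
# PDB ID: one digit (1-9) followed by three alphanumeric characters.
PDB_ID = re.compile(r"^[1-9][A-Za-z0-9]{3}$")
# GenBank-style accession: 1 letter + 5 digits, 2 letters + 6 digits,
# or 3 letters + 5 digits, with an optional ".version" suffix.
GENBANK_ACCESSION = re.compile(
    r"^(?:[A-Za-z]\d{5}|[A-Za-z]{2}\d{6}|[A-Za-z]{3}\d{5})(?:\.\d+)?$"
)


def classify_query(query: str) -> str:
    """Return 'pdb_id', 'accession', or 'keyword' for a user query."""
    text = query.strip()
    if PDB_ID.match(text):
        return "pdb_id"
    if GENBANK_ACCESSION.match(text):
        return "accession"
    # Anything that does not look like an identifier is treated as a keyword.
    return "keyword"


for example in ["dystrophy", "U49845", "1TUP"]:
    print(example, "->", classify_query(example))
```

A tool built along these lines could route identifiers to a direct record lookup and free-text keywords to a full-text search, although how any particular profiler makes this decision is not stated here.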
Each profiling tool has its own focus and area of interest. For example, the NCBI search engine Entrez segregates its hits by category, so that users looking for protein structure information can screen out sequences with no corresponding structure, while users interested in perusing the literature on a subject can view abstracts of papers published in scholarly journals without distraction from gene or sequence results. The PubMed biosciences literature database is a popular tool for literature searches, though the more general Google Scholar service is nearly its equal.
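The idea of segregating hits by category can be sketched in a few lines of Python. This is only an illustration under assumed data: the hit records, their `category` and `title` fields, and the function `group_hits_by_category` are hypothetical and do not reflect the actual Entrez data model or interface.

```python
from collections import defaultdict


def group_hits_by_category(hits):
    """Group hit titles under their category, preserving input order."""
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["category"]].append(hit["title"])
    return dict(grouped)


# Hypothetical search results, invented purely for illustration.
hits = [
    {"category": "literature", "title": "Example abstract A"},
    {"category": "structure", "title": "Example structure record"},
    {"category": "literature", "title": "Example abstract B"},
    {"category": "sequence", "title": "Example sequence record"},
]

grouped = group_hits_by_category(hits)
# A user interested only in the literature can ignore the other categories.
print(grouped["literature"])
```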
Keyword-based data aggregation services like the
Sequence data based profilers
A typical sequence profiling tool carries this further by using an actual DNA, RNA, or protein sequence as an input and allows the user to visit different web-based analysis tools to obtain the information desired. Such tools are also commonly supplied with commercial laboratory equipment like gene sequencers or sometimes sold as software applications for molecular biology. In another public-database example, the BLAST sequence search report from NCBI provides a link from its alignment report to other relevant information in its own databases, if such specific information exists.
For example, a retrieved record that contains a human sequence will carry a separate link that connects to its location on a human genome map; a record that contains a sequence for which a 3-D structure has been solved would carry a link that connects it to its structure database.
As a result, the user can end up with a privately hosted document or a page from a lesser-known database from just about anywhere in the world. Although sequence-based profilers are currently few and far between, their key role will become evident when huge amounts of sequence data need to be cross-processed across portals and domains.
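The record-to-database linking described in this section (a human sequence linking to a genome map, a sequence with a solved 3-D structure linking to a structure database) can be sketched as a small set of rules. The `SequenceRecord` type, its fields, and the link labels below are hypothetical names chosen for illustration; they are not the format used by BLAST reports or any NCBI service.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequenceRecord:
    """Hypothetical retrieved record; the fields are assumptions."""
    identifier: str
    organism: str
    has_solved_structure: bool


def cross_links(record: SequenceRecord) -> List[str]:
    """Return labels of the related resources a record would link to."""
    links = []
    # A human sequence carries a link to its location on a genome map.
    if record.organism.lower() == "homo sapiens":
        links.append(f"genome-map:{record.identifier}")
    # A sequence with a solved 3-D structure links to a structure database.
    if record.has_solved_structure:
        links.append(f"structure:{record.identifier}")
    return links


human = SequenceRecord("REC1", "Homo sapiens", has_solved_structure=True)
mouse = SequenceRecord("REC2", "Mus musculus", has_solved_structure=False)
print(cross_links(human))
print(cross_links(mouse))
```

In this sketch a link appears only when the corresponding information exists, which mirrors the conditional behaviour described for the alignment report.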
Future growth and directions
The proliferation of bioinformatics tools for genetic analysis aids researchers in identifying and categorizing genes and gene sets of interest in their work; however, the large variety of tools that perform substantially similar aggregative and analytical functions can also confuse and frustrate new users. The decentralization encouraged by aggregative tools allows individual research groups to maintain specialized servers dedicated to specific types of data analysis in the expectation that their output will be collected into a larger report on a gene or protein of interest to other researchers.
Data produced by microarray experiments, two-hybrid screening, and other high-throughput biological experiments is voluminous and difficult to analyze by hand; the efforts of structural genomics collaborations that are aimed at quickly solving large numbers of highly varied protein structures also increase the need for integration between sequence and structure databases and portals. This impetus toward developing more comprehensive and more user-friendly methods of sequence profiling makes this an active area of research among current genomics researchers.