|Director|Stephen Sherry (since Sept 26, 2022)|
|Parent|United States National Library of Medicine|
|Affiliations|National Institutes of Health|
The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.
The NCBI houses a series of databases relevant to
NCBI had responsibility for making available the GenBank
Since 1992, NCBI has grown to provide other databases in addition to GenBank. NCBI provides the Gene database, Online Mendelian Inheritance in Man, the Molecular Modeling Database (3D protein structures), dbSNP (a database of single-nucleotide polymorphisms), the Reference Sequence Collection, a map of the human genome, and a taxonomy browser, and coordinates with the National Cancer Institute to provide the Cancer Genome Anatomy Project. The NCBI assigns a unique identifier (taxonomy ID number) to each species of organism.
The NCBI has software tools that are available through internet browsers or by
The NCBI Bookshelf is a collection of freely accessible, downloadable, online versions of selected biomedical books. The Bookshelf covers a wide range of topics including molecular biology, biochemistry, cell biology, genetics, microbiology, disease states from a molecular and cellular point of view, research methods, and virology. Some of the books are online versions of previously published books, while others, such as Coffee Break, are written and edited by NCBI staff. The Bookshelf is a complement to the Entrez PubMed repository of peer-reviewed publication abstracts in that Bookshelf contents provide established perspectives on evolving areas of study and a context in which many disparate individual pieces of reported research can be organized.
Basic Local Alignment Search Tool (BLAST)
The Entrez Global Query Cross-Database Search System is used at NCBI for all the major databases, such as Nucleotide and Protein Sequences, Protein Structures, PubMed, Taxonomy, Complete Genomes, OMIM, and several others. Entrez is both an indexing and a retrieval system, holding data from various sources for biomedical research. NCBI distributed the first version of Entrez in 1991, composed of nucleotide sequences from PDB and GenBank, protein sequences from SWISS-PROT, translated GenBank, PIR, PRF, and PDB, and associated abstracts and citations from PubMed. Entrez is designed to integrate data from several different sources, databases, and formats into a uniform information model and retrieval system that can efficiently retrieve the relevant references, sequences, and structures.
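As a rough illustration of this uniform retrieval model, the same kind of query can be addressed to different Entrez databases by changing a single parameter. The sketch below only builds request URLs for ESearch, part of NCBI's public E-utilities interface to Entrez, and sends nothing over the network. The helper name `build_esearch_url` and the example search term are assumptions made for illustration; they do not come from the article.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for one Entrez database (hypothetical helper).

    db     -- Entrez database name, e.g. "pubmed", "protein", "nucleotide"
    term   -- search expression in Entrez query syntax
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-escapes brackets and turns spaces into '+'.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The same search term, pointed at three different Entrez databases.
    for db in ("nucleotide", "protein", "pubmed"):
        print(build_esearch_url(db, "BRCA1[gene] AND human[orgn]", retmax=5))
```

Only the `db` value changes between the three URLs, which loosely mirrors how Entrez presents many underlying sources behind one query interface.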
Gene has been implemented at NCBI to characterize and organize the information about genes. It serves as a major node in the nexus of the genomic map, expression, sequence, protein function, structure, and homology data. A unique GeneID is assigned to each gene record and can be followed through revision cycles. Gene records for known or predicted genes are established here and are demarcated by map positions or nucleotide sequences. Gene has several advantages over its predecessor, LocusLink, including better integration with other databases in NCBI, broader taxonomic scope, and enhanced options for query and retrieval provided by the Entrez system.
The Protein database maintains the text record for individual protein sequences, derived from many different resources such as the NCBI Reference Sequence (RefSeq) project, GenBank, PDB, and UniProtKB/Swiss-Prot. Protein records are available in different formats, including FASTA and XML, and are linked to other NCBI resources. Protein provides users with related data such as genes, DNA/RNA sequences, biological pathways, expression and variation data, and literature. It also provides pre-determined sets of similar and identical proteins for each sequence, as computed by BLAST. The Structure database of NCBI contains 3D coordinate sets for experimentally determined structures in PDB that are imported by NCBI. The Conserved Domain database (CDD) contains sequence profiles that characterize highly conserved domains within protein sequences. It also has records from external resources like SMART and Pfam. Another protein database, the Protein Clusters database, contains sets of protein sequences that are clustered according to the maximum alignments between the individual sequences as calculated by BLAST.
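To make the FASTA format mentioned above concrete, here is a minimal sketch of a reader for it: each record starts with a header line beginning with `>`, followed by one or more lines of sequence. The function name `parse_fasta` and the two example records are invented for illustration and are not real Protein database entries.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines; collect them all.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up records, used only to show the layout of the format.
EXAMPLE = """>example_1 hypothetical protein A
MKTAYIAKQR
QISFVKSHFS
>example_2 hypothetical protein B
MADEEKLPPG
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE.splitlines()).items():
        print(name, len(seq))
```

Established libraries handle many more edge cases; this sketch assumes well-formed input and keeps the whole header line as the record key.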