KEGG
Property | Value |
---|---|
PMID | 10592173 |
Release date | 1995 |
Website | www |
Web service URL | REST (see KEGG API) |
Web tools | KEGG Mapper |
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. KEGG is utilized for bioinformatics research and education, including data analysis in genomics, metagenomics, metabolomics and other omics studies, modeling and simulation in systems biology, and translational research in drug development.
The KEGG database project was initiated in 1995 by Minoru Kanehisa, professor at the Institute for Chemical Research, Kyoto University, under the then-ongoing Japanese Human Genome Program.[1][2] Foreseeing the need for a computerized resource that could be used for the biological interpretation of genome sequence data, he started developing the KEGG PATHWAY database. It is a collection of manually drawn KEGG pathway maps representing experimental knowledge on metabolism and various other functions of the cell and the organism. Each pathway map contains a network of molecular interactions and reactions and is designed to link genes in the genome to gene products (mostly proteins) in the pathway. This has enabled an analysis called KEGG pathway mapping, in which the gene content of a genome is compared with the KEGG PATHWAY database to determine which pathways and associated functions are likely to be encoded in the genome.
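At its simplest, the pathway mapping described above amounts to set intersection. The following Python sketch assumes that each gene has already been assigned an ortholog group identifier and that each pathway map has been reduced to the set of identifiers drawn on it. The gene names, identifiers, and pathway names are hypothetical, and the coverage score is an illustrative choice rather than KEGG's own method.

```python
"""Minimal sketch of pathway mapping as set intersection (hypothetical data).

Assumptions: genes already carry ortholog group identifiers, and each pathway
map is reduced to the set of identifiers drawn on it. All names are made up.
"""

# Hypothetical annotation: gene in the genome -> ortholog group identifier.
GENE_TO_ORTHOLOG = {
    "geneA": "K_A",
    "geneB": "K_B",
    "geneC": "K_C",
    "geneD": "K_X",
}

# Hypothetical pathway maps: pathway name -> identifiers drawn on the map.
PATHWAY_MAPS = {
    "pathway_1": {"K_A", "K_B", "K_C", "K_D"},
    "pathway_2": {"K_E", "K_F"},
    "pathway_3": {"K_X", "K_Y"},
}


def map_genome_to_pathways(gene_to_ortholog, pathway_maps):
    """Return {pathway: (matched genes, fraction of map identifiers found)}."""
    genome_orthologs = set(gene_to_ortholog.values())
    result = {}
    for pathway, members in pathway_maps.items():
        found = members & genome_orthologs
        if not found:
            continue  # no gene in the genome maps onto this pathway
        genes = sorted(g for g, k in gene_to_ortholog.items() if k in found)
        result[pathway] = (genes, len(found) / len(members))
    return result


if __name__ == "__main__":
    mapping = map_genome_to_pathways(GENE_TO_ORTHOLOG, PATHWAY_MAPS)
    for pathway, (genes, coverage) in mapping.items():
        print(f"{pathway}: genes={genes}, coverage={coverage:.2f}")
```

With these inputs, pathway_1 is reported with three matched genes and 0.75 coverage, pathway_3 with one gene and 0.50 coverage, and pathway_2 is omitted. Real pathway mapping uses KEGG's own identifiers and maps; this only shows the shape of the comparison.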
According to the developers, KEGG is a "computer representation" of the biological system.[3] It integrates the building blocks and wiring diagrams of the system: genetic building blocks (genes and proteins), chemical building blocks (small molecules and reactions), and wiring diagrams (molecular interaction and reaction networks). This concept is realized in the following databases of KEGG, which are categorized into systems, genomic, chemical, and health information.[4]
- Systems information
  - PATHWAY: pathway maps for cellular and organismal functions
  - MODULE: modules or functional units of genes
  - BRITE: hierarchical classifications of biological entities
- Genomic information
- Chemical information
  - COMPOUND, GLYCAN: chemical compounds and glycans
  - REACTION, RPAIR, RCLASS: chemical reactions
  - ENZYME: enzyme nomenclature
- Health information
  - DISEASE: human diseases
  - DRUG: approved drugs
  - ENVIRON: crude drugs and health-related substances
Databases
Systems information
The KEGG PATHWAY database, the wiring diagram database, is the core of the KEGG resource. It is a collection of pathway maps integrating many entities including genes, proteins, RNAs, chemical compounds, glycans, and chemical reactions, as well as disease genes and drug targets, which are stored as individual entries in the other databases of KEGG. The pathway maps are classified into the following sections:
- Metabolism
- Genetic information processing
- Environmental information processing (membrane transport, signal transduction, etc.)
- Cellular processes (cell growth, cell death, cell membrane functions, etc.)
- Organismal systems (immune system, endocrine system, nervous system, etc.)
- Human diseases
- Drug development
The metabolism section contains aesthetically drawn global maps that give an overall picture of metabolism, in addition to regular metabolic pathway maps. The low-resolution global maps can be used, for example, to compare the metabolic capacities of different organisms in genomics studies, or of different environmental samples in metagenomics studies. In contrast, KEGG modules in the KEGG MODULE database are higher-resolution, localized wiring diagrams representing tighter functional units within a pathway map, such as subpathways conserved among specific groups of organisms, and molecular complexes. KEGG modules are defined as characteristic gene sets that can be linked to specific metabolic capacities and other phenotypic features, so that they can be used for automatic interpretation of genome and metagenome data.
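Scoring a genome against a module-like gene set can be sketched as checking each step in turn. This Python example uses a simplified, assumed representation rather than KEGG's own module definition format: a module is a list of steps, each step lists alternative gene sets, and an alternative counts only if all of its members (for example, the subunits of a complex) are present. All identifiers are hypothetical.

```python
"""Sketch: scoring a genome against a module-like gene set.

Assumed, simplified representation (not KEGG's own definition syntax):
a module is an ordered list of steps; a step is a list of alternatives;
an alternative is a set of identifiers that must all be present, such as
the subunits of a molecular complex. All identifiers are hypothetical.
"""

Step = list[frozenset[str]]

HYPOTHETICAL_MODULE: list[Step] = [
    [frozenset({"K_A"}), frozenset({"K_B"})],  # step 1: K_A or K_B
    [frozenset({"K_C", "K_D", "K_E"})],        # step 2: three-subunit complex
    [frozenset({"K_F"})],                      # step 3: single gene
]


def step_satisfied(step: Step, genome: set[str]) -> bool:
    """True if every member of at least one alternative is in the genome."""
    return any(alt <= genome for alt in step)


def module_completeness(module: list[Step], genome: set[str]) -> float:
    """Fraction of the module's steps that the genome can carry out."""
    done = sum(step_satisfied(step, genome) for step in module)
    return done / len(module)


if __name__ == "__main__":
    genome_1 = {"K_B", "K_C", "K_D", "K_E", "K_F"}
    genome_2 = {"K_A", "K_C", "K_D", "K_F"}  # step-2 complex lacks K_E
    print(module_completeness(HYPOTHETICAL_MODULE, genome_1))
    print(module_completeness(HYPOTHETICAL_MODULE, genome_2))
```

Here the first genome completes all three steps (1.0), while the second lacks one subunit of the step-2 complex and scores 2/3. Any cutoff for declaring a module present is an analysis choice outside this sketch.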
Another database that supplements KEGG PATHWAY is the KEGG BRITE database. It is an ontology database containing hierarchical classifications of various entities including genes, proteins, organisms, diseases, drugs, and chemical compounds. While KEGG PATHWAY is limited to molecular interactions and reactions of these entities, KEGG BRITE incorporates many different types of relationships.
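A hierarchical classification of this kind can be pictured as a tree in which the same entry may sit under several branches. The sketch below assumes the hierarchy is stored as nested Python dicts with lists of entry IDs at the leaves; the category names and IDs are placeholders, not real BRITE content.

```python
"""Sketch: locating an entry in a BRITE-style hierarchical classification.

Assumption: the hierarchy is held as nested dicts, with lists of entry IDs
at the leaves. Category names and entry IDs are hypothetical placeholders.
"""

from typing import Union

Node = Union[dict, list]

HYPOTHETICAL_HIERARCHY: Node = {
    "Category 1": {
        "Subcategory 1.1": ["E1", "E2"],
        "Subcategory 1.2": ["E3"],
    },
    "Category 2": {
        "Subcategory 2.1": ["E2", "E4"],
    },
}


def find_paths(node: Node, entry: str,
               prefix: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return every path of category names leading to `entry`.

    An entry can be classified in more than one branch, so a list is returned.
    """
    if isinstance(node, list):  # leaf: a list of entry IDs
        return [prefix] if entry in node else []
    paths = []
    for name, child in node.items():
        paths.extend(find_paths(child, entry, prefix + (name,)))
    return paths


if __name__ == "__main__":
    for path in find_paths(HYPOTHETICAL_HIERARCHY, "E2"):
        print(" > ".join(path))
```

For E2 the function returns two paths, one under each top-level category, reflecting that a single entity can be classified in more than one way.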
Genomic information
Several months after the KEGG project was initiated in 1995, the first report of the completely sequenced
These correspondences are made using the concept of
Chemical information
The KEGG metabolic pathway maps are drawn to represent the dual aspects of the metabolic network: the genomic network of how genome-encoded
The databases in the chemical information category, which are collectively called KEGG LIGAND, are organized by capturing knowledge of the chemical network. At the beginning of the KEGG project, KEGG LIGAND consisted of three databases: KEGG COMPOUND for chemical compounds, KEGG REACTION for chemical reactions, and KEGG ENZYME for reactions in the enzyme nomenclature.[7] Currently, there are additional databases: KEGG GLYCAN for glycans[8] and two auxiliary reaction databases called RPAIR (reactant pair alignments) and RCLASS (reaction class).[9] KEGG COMPOUND has also been expanded to contain various compounds such as xenobiotics, in addition to metabolites.
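The chemical network captured by KEGG LIGAND can be illustrated by turning reaction equations into a compound graph. This Python sketch assumes equations written as "S1 + 2 S2 <=> P1 + P2", with an optional integer stoichiometric coefficient before a compound; the reaction and compound IDs are hypothetical placeholders.

```python
"""Sketch: building a compound-level network from reaction equations.

Assumptions: each reaction is a string of the form "S1 + 2 S2 <=> P1 + P2",
where an optional leading integer is a stoichiometric coefficient. Reaction
and compound identifiers are hypothetical placeholders.
"""

from collections import defaultdict

HYPOTHETICAL_REACTIONS = {
    "R_1": "C_A + C_B <=> C_C + C_D",
    "R_2": "C_C <=> C_E",
    "R_3": "2 C_E <=> C_F",
}


def parse_side(side: str) -> dict[str, int]:
    """Parse one side of an equation into {compound: coefficient}."""
    terms = {}
    for term in side.split(" + "):
        parts = term.strip().split()
        if len(parts) == 2:  # e.g. "2 C_E"
            coeff, compound = int(parts[0]), parts[1]
        else:                # e.g. "C_E", implicit coefficient 1
            coeff, compound = 1, parts[0]
        terms[compound] = coeff
    return terms


def build_network(reactions: dict[str, str]) -> dict[str, set[str]]:
    """Link every left-side compound to every right-side compound.

    Edges are stored in both directions because "<=>" implies no direction.
    """
    graph = defaultdict(set)
    for equation in reactions.values():
        left, right = (parse_side(s) for s in equation.split("<=>"))
        for a in left:
            for b in right:
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)


if __name__ == "__main__":
    print(parse_side("2 C_E"))
    network = build_network(HYPOTHETICAL_REACTIONS)
    for compound, neighbours in sorted(network.items()):
        print(compound, sorted(neighbours))
```

Linking every substrate to every product is a deliberate simplification: it also connects pairs of compounds that are not structurally related, such as a cofactor and an unrelated product.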
Health information
In KEGG, diseases are viewed as perturbed states of the biological system caused by perturbants of genetic factors and environmental factors, and drugs are viewed as different types of perturbants.[10] The KEGG PATHWAY database includes not only the normal states but also the perturbed states of the biological systems. However, disease pathway maps cannot be drawn for most diseases because their molecular mechanisms are not well understood. An alternative approach is taken in the KEGG DISEASE database, which simply catalogs known genetic factors and environmental factors of diseases. These catalogs may eventually lead to more complete wiring diagrams of diseases.
The KEGG DRUG database contains
Subscription model
In July 2011, KEGG introduced a subscription model for FTP downloads because of a significant cut in government funding. KEGG continues to be freely available through its website, but the subscription model has prompted discussion about the sustainability of bioinformatics databases.[11][12]
See also
- Comparative Toxicogenomics Database (CTD), which integrates KEGG pathways with toxicogenomic and disease data
- ConsensusPathDB, a molecular functional interaction database that integrates information from KEGG
- Gene Ontology
- PubMed
- UniProt
- Gene Disease Database
References
External links
- KEGG website
- GenomeNet mirror site
- The entry for KEGG in MetaBase