CATH database
Research center | University College London |
---|---|
Laboratory | Institute of Structural and Molecular Biology |
Primary citation | Dawson et al. (2016) [1] |
Release date | 1997 |
Access | |
Website | cathdb |
Download URL | cathdb |
Miscellaneous | |
Data release frequency | CATH-B is released daily. Official releases are approximately annual. |
Version | 4.3 |
The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains.
Hierarchical organization
Experimentally determined protein three-dimensional structures are obtained from the Protein Data Bank (PDB) and divided into domains.
The domains are then classified within the CATH structural hierarchy. At the Class (C) level, domains are assigned according to their secondary structure content: all alpha, all beta, a mixture of alpha and beta, or little secondary structure. At the Architecture (A) level, the arrangement of the secondary structure elements in three-dimensional space is used for assignment. At the Topology/fold (T) level, information on how the secondary structure elements are connected and arranged is used. Domains are assigned to the same Homologous superfamily (H) if there is good evidence that they are related by evolution, i.e. that they are homologous.[2]
# | Level | Description |
---|---|---|
1 | Class | the overall secondary-structure content of the domain. (Equivalent to the SCOP Class) |
2 | Architecture | a large-scale grouping of topologies which share particular structural features. |
3 | Topology/fold | high structural similarity but no evidence of homology. (Equivalent to the 'fold' level in SCOP) |
4 | Homologous superfamily | indicative of a demonstrable evolutionary relationship. (Equivalent to SCOP superfamily) |
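The four levels can be illustrated with a minimal Python sketch. It assumes that a domain's position in the hierarchy is written as four dot-separated integers in C.A.T.H order and that the classes are numbered 1 to 4; neither convention is stated in the text above. The names `CathCode` and `shared_depth` and the example codes are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# The four classes named in the text. The numeric keys 1-4 are an
# assumption of this sketch, not something the text states.
CLASS_NAMES = {
    1: "all alpha",
    2: "all beta",
    3: "mixture of alpha and beta",
    4: "little secondary structure",
}


@dataclass(frozen=True)
class CathCode:
    """One position in the hierarchy: Class, Architecture, Topology, Homologous superfamily."""

    class_: int
    architecture: int
    topology: int
    superfamily: int

    @classmethod
    def parse(cls, code: str) -> "CathCode":
        """Parse a dotted code such as '3.40.50.300' (assumed format)."""
        parts = code.strip().split(".")
        if len(parts) != 4 or not all(p.isdecimal() for p in parts):
            raise ValueError(f"expected four dot-separated integers, got {code!r}")
        return cls(*(int(p) for p in parts))

    def levels(self) -> tuple:
        return (self.class_, self.architecture, self.topology, self.superfamily)

    def prefix(self, depth: int) -> str:
        """Truncate to `depth` levels: 1 = C, 2 = C.A, 3 = C.A.T, 4 = C.A.T.H."""
        if not 1 <= depth <= 4:
            raise ValueError("depth must be between 1 and 4")
        return ".".join(str(n) for n in self.levels()[:depth])


def shared_depth(a: CathCode, b: CathCode) -> int:
    """Count how many leading levels two codes have in common."""
    depth = 0
    for x, y in zip(a.levels(), b.levels()):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    first = CathCode.parse("3.40.50.300")
    second = CathCode.parse("3.40.50.720")
    print(CLASS_NAMES[first.class_])
    print(first.prefix(3))
    print(shared_depth(first, second))
```

Under these assumptions, two domains that share three levels sit in the same topology/fold but in different homologous superfamilies, which matches the distinction the table draws between structural similarity and demonstrable homology.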
Additional sequence data for domains with no experimentally determined structures are provided by CATH's sister resource, Gene3D, and are used to populate the homologous superfamilies. Protein sequences from UniProtKB and Ensembl are scanned against CATH HMMs to predict domain sequence boundaries and make homologous superfamily assignments.
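When a sequence is scanned against many HMMs, several models may match overlapping regions, so the matches have to be reduced to a set of non-overlapping domains before boundaries and superfamilies can be assigned. The sketch below shows one simple way to do this, a greedy highest-score-first selection. It is an illustrative heuristic, not the procedure used by CATH or Gene3D; the `Hit` record, the `resolve_hits` function, and the example hits are all hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A hypothetical HMM match on one protein sequence."""

    superfamily: str  # identifier of the matched superfamily model
    start: int        # first matched residue (1-based, inclusive)
    end: int          # last matched residue (inclusive)
    score: float      # match score; higher is better


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_hits(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the best-scoring hits that do not overlap.

    Hits are visited from highest to lowest score; a hit is kept only if
    it overlaps none of the hits already kept. The result is returned in
    sequence order, so each kept hit gives a predicted domain boundary
    and superfamily assignment.
    """
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, other) for other in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


if __name__ == "__main__":
    example = [
        Hit("3.40.50.300", 1, 120, 85.0),
        Hit("1.10.8.10", 100, 180, 40.0),   # overlaps the first hit, lower score
        Hit("2.60.40.10", 130, 220, 60.0),
    ]
    for hit in resolve_hits(example):
        print(hit.superfamily, hit.start, hit.end)
```

A greedy selection is easy to follow but is not guaranteed to maximise the total score of the kept hits; a production pipeline would typically use a more careful optimisation and handle discontinuous domains, which this sketch ignores.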
Releases
The CATH team aim to provide official releases of the CATH classification every 12 months. This release process is important because it allows for internal validation, extra annotations and analysis. However, it can mean a delay between new structures appearing in the PDB and the latest official CATH release.[citation needed]
To address this issue, CATH-B provides a limited amount of information on the very latest domain annotations (e.g., domain boundaries and superfamily classifications).
The latest version of CATH-Gene3D (v4.3) was released in December 2020 and consists of:
- 500,238 structural protein domain entries [1]
- 151 million non-structural protein domain entries [1]
- 5,481 homologous superfamily entries [1]
- 212,872 functional family entries [1]
Open-source software
CATH is an open-source software project.
References
- PMID 27899584.
- PMID 9309224.
- "CATH: Protein Structure Classification Database at UCL". Cathdb.info. Retrieved 9 March 2017.
- "CATH". Cathdb.info. Retrieved 9 March 2017.
- "CATH Database (@CATHDatabase)". Twitter. Retrieved 9 March 2017.
- PMID 12520050.
- "Tools". cathdb.info. Retrieved 18 December 2016.