Cambridge Structural Database

Source: Wikipedia, the free encyclopedia.
Research center: Cambridge Crystallographic Data Centre
Data format: .cif
Standalone tools:
  • CSD System
  • CSD (the database)
  • ConQuest
  • Mercury
  • IsoStar
  • Mogul
  • GOLD
  • CSD-CrossMiner
    The Cambridge Structural Database (CSD) is both a repository and a validated and curated resource for the three-dimensional structural data of organic, metal-organic and organometallic molecules. Its entries complement those of other crystallographic databases such as the Protein Data Bank (PDB), the Inorganic Crystal Structure Database and the International Centre for Diffraction Data. The data are typically obtained by X-ray crystallography, and less frequently by electron or neutron diffraction, and are submitted by crystallographers and chemists from around the world. In the form deposited by their authors, the data are freely accessible on the Internet through the repository on the website of the CSD's parent organization.[1] The CSD is overseen by this organization, the Cambridge Crystallographic Data Centre (CCDC), a not-for-profit incorporated company.

    The interior of the CCDC headquarters in Cambridge, UK

    The CSD is a widely used repository of small-molecule organic and metal-organic crystal structures. Structures deposited with the Cambridge Crystallographic Data Centre (CCDC) become publicly available for download at the point of publication or with the depositor's consent. They are also scientifically enriched and included in the database used by software offered by the centre. Targeted subsets of the CSD are also freely available to support teaching and other activities.[2]

    History

    The CCDC grew out of the activities of the crystallography group led by Olga Kennard OBE FRS in the Department of Organic, Inorganic and Theoretical Chemistry of the University of Cambridge. From 1965, the group began to collect published bibliographic, chemical and crystal structure data for all small molecules studied by X-ray or neutron diffraction. With the rapid developments in computing taking place at this time, this collection was encoded in electronic form and became known as the Cambridge Structural Database (CSD).

    The CSD was one of the first numerical scientific databases to begin operations anywhere in the world, and received academic grants from the UK Office for Scientific and Technical Information and then from the UK Science and Engineering Research Council. These funds, together with subventions from National Affiliated Centres, enabled the development of the CSD and its associated software during the 1970s and 1980s. The first releases of the CSD System to the United States, Italy and Japan occurred in the early 1970s. By the early 1980s the CSD System was being distributed in more than 30 countries. As of 2014, the CSD System was distributed to academics in 70 countries.

    During the 1980s, interest in the CSD System from pharmaceutical and agrochemical companies increased significantly. This led to the establishment of the Cambridge Crystallographic Data Centre (CCDC) as an independent company in 1987, with the legal status of a non-profit charitable institution, and with its operations overseen by an international board of governors. The CCDC moved into purpose-built premises on the site of the University Department of Chemistry in 1992.

    Kennard retired as Director in 1997 and was succeeded by David Hartley (1997-2002) and Frank Allen (2002-2008). Colin Groom served as executive director from 1 October 2008[3] to September 2017.[4] Most recently, Juergen Harter was appointed CEO in June 2018.[5]

    CCDC software products have since diversified into applications of crystallographic data in the life sciences and in crystallography. Much of this software development and marketing is carried out by CCDC Software Limited (founded in 1998), a wholly owned subsidiary that covenants all of its profits back to the CCDC.

    Although the CCDC is a self-administering organization, it retains close links with the University of Cambridge, and is a University Partner Institution that is qualified to train postgraduate students for higher degrees (PhD, MPhil).

    The CCDC established US applications and support operations in October 2013,[6][7] initially at Rutgers, the State University of New Jersey, where it is co-located with the RCSB Protein Data Bank.

    Contents

    One millionth structure added to the CSD (CSD ID: XOPCAJ)

    The CSD is updated with about 50,000 new structures each year,[8] as well as with improvements to existing entries. Entries (structures) in the repository are released for public access as soon as the corresponding article has appeared in the peer-reviewed scientific literature. Data can also be deposited and published directly through the CSD, without an accompanying scientific article, as a so-called CSD Communication.

    Summary statistics on the breadth of CSD holdings are published periodically, for example in the January 2014 report.[9] As of January 2019, the summary statistics were as follows:[10]

    Query                                        Count  % of CSD
    Total number of structures                 995,907     100.0
    Number of different compounds              900,984         -
    Number of literature sources                 2,004         -
    Organic structures                         431,037      43.5
    Transition metal present                   478,138      48.2
    Alkali or alkaline earth metal present      48,056       4.8
    Main group metal present                   101,948      10.3
    3D coordinates present                     937,809      94.6
    Error-free coordinates                     926,422     98.81
    Neutron studies                              2,142       0.2
    Powder diffraction studies                   4,761       0.5
    Low/high temperature studies               503,368      50.8
    Absolute configuration determined           28,834       2.9
    Disorder present in structure              256,019      25.8
    Polymorphic structures                      29,817       3.0
    R-factor < 0.100                           935,419      94.4
    R-factor < 0.075                           845,708      85.3
    R-factor < 0.050                           553,042      55.8
    R-factor < 0.030                           121,806      12.3
    Number of atoms with 3D coordinates     85,791,623         -

    As of January 2019, the 25 journals that had published the most structures in the CSD were:[11]

    1. 73,070 structures were reported in Inorg. Chem.
    2. 62,072 structures were reported in Dalton & J. Chem. Soc., Dalton Trans.
    3. 54,160 structures were reported in Organometallics
    4. 48,967 structures were reported in J. Am. Chem. Soc.
    5. 42,422 structures were reported in Acta Crystallogr. Sect. E
    6. 32,610 structures were reported in Chem. Eur. J.
    7. 29,790 structures were reported in J. Organomet. Chem.
    8. 29,640 structures were reported in Angew. Chem. Int. Ed.
    9. 28,682 structures were reported in Inorg. Chim. Acta
    10. 28,351 structures were reported in Chem. Commun. & J. Chem. Soc.
    11. 27,328 structures were reported in CSD Communications
    12. 26,774 structures were reported in Acta Crystallogr. Sect. C
    13. 26,734 structures were reported in Polyhedron
    14. 24,045 structures were reported in Eur. J. Inorg. Chem.
    15. 23,483 structures were reported in J. Org. Chem.
    16. 22,286 structures were reported in Cryst. Growth Des.
    17. 22,011 structures were reported in CrystEngComm
    18. 15,985 structures were reported in Organic Letters
    19. 15,424 structures were reported in Z. Anorg. Allg. Chem.
    20. 14,864 structures were reported in Acta Crystallogr. Sect. B
    21. 13,909 structures were reported in Tetrahedron
    22. 12,734 structures were reported in J. Mol. Struct.
    23. 11,234 structures were reported in Tetrahedron Lett.
    24. 9,150 structures were reported in Eur. J. Org. Chem.
    25. 8,789 structures were reported in New Journal of Chemistry

    In addition, 8,597 structures were reported as Private Communications to the CSD.

    These 25 journals account for 704,514 of the 996,193 structures in the CSD, or 70.7%.
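    The share quoted above can be checked with a few lines of Python. This is a minimal sketch: the counts are copied from the January 2019 list, and the 996,193 denominator is the total given alongside that list, which differs slightly from the 995,907 total in the summary statistics table.

```python
# Structure counts for the 25 journals in the January 2019 CSD journal list.
journal_counts = {
    "Inorg. Chem.": 73_070,
    "Dalton & J. Chem. Soc., Dalton Trans.": 62_072,
    "Organometallics": 54_160,
    "J. Am. Chem. Soc.": 48_967,
    "Acta Crystallogr. Sect. E": 42_422,
    "Chem. Eur. J.": 32_610,
    "J. Organomet. Chem.": 29_790,
    "Angew. Chem. Int. Ed.": 29_640,
    "Inorg. Chim. Acta": 28_682,
    "Chem. Commun. & J. Chem. Soc.": 28_351,
    "CSD Communications": 27_328,
    "Acta Crystallogr. Sect. C": 26_774,
    "Polyhedron": 26_734,
    "Eur. J. Inorg. Chem.": 24_045,
    "J. Org. Chem.": 23_483,
    "Cryst. Growth Des.": 22_286,
    "CrystEngComm": 22_011,
    "Organic Letters": 15_985,
    "Z. Anorg. Allg. Chem.": 15_424,
    "Acta Crystallogr. Sect. B": 14_864,
    "Tetrahedron": 13_909,
    "J. Mol. Struct.": 12_734,
    "Tetrahedron Lett.": 11_234,
    "Eur. J. Org. Chem.": 9_150,
    "New Journal of Chemistry": 8_789,
}

# Total number of CSD structures quoted alongside the journal statistics.
CSD_TOTAL = 996_193

top25 = sum(journal_counts.values())
share_percent = 100 * top25 / CSD_TOTAL

print(f"{len(journal_counts)} journals: {top25:,} of {CSD_TOTAL:,} structures ({share_percent:.1f}%)")
# prints: 25 journals: 704,514 of 996,193 structures (70.7%)
```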

    These data show that most structures are determined by X-ray diffraction, with less than 1% determined by neutron diffraction or powder diffraction. The percentage of error-free coordinates is calculated relative to the structures for which 3D coordinates are present in the CSD, not relative to the whole CSD.
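    As a worked check of the "less than 1%" figure, the neutron and powder diffraction counts from the January 2019 table give the following share. Adding the two counts treats the categories as non-overlapping, so the result is an upper bound if some structures fall into both:

```latex
\frac{2142 + 4761}{995907} \times 100\% = \frac{6903}{995907} \times 100\% \approx 0.69\%
```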

    Structure factor files are significant because, for CSD structures determined by X-ray diffraction that are accompanied by a structure factor file, a crystallographer can verify the interpretation of the observed measurements.

    Growth trend

    Historically, the number of structures in the CSD has grown at an approximately exponential rate, passing the 25,000-structure milestone in 1977, 50,000 in 1983, 125,000 in 1992, 250,000 in 2001, 500,000 in 2009[12][13][14] and 1,000,000 on June 8, 2019.[15] The one millionth structure added to the CSD is the crystal structure of 1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one.

    Growth trend of structures in the CSD, 1965-2018[11]
    Number of structures published per year and cumulative total
    Year  Published  Cumulative total
    2018 53429 974,653
    2017 55031 921,224
    2016 54975 866,193
    2015 53610 811,218
    2014 50759 757,608
    2013 48025 706,849
    2012 45199 661,121
    2011 43882 615,922
    2010 41240 572,040
    2009 40627 530,800
    2008 36802 490,173
    2007 36569 453,371
    2006 34713 416,802
    2005 31733 382,089
    2004 27988 350,356
    2003 26287 322,368
    2002 24306 296,081
    2001 21781 271,775
    2000 19998 249,994
    1999 18780 229,996
    1998 17289 211,216
    1997 15896 193,927
    1996 15487 178,031
    1995 13001 162,544
    1994 12290 149,543
    1993 12032 137,253
    1992 10691 125,221
    1991 9941 114,530
    1990 8935 104,589
    1989 7750 95,654
    1988 7644 87,904
    1987 7472 80,260
    1986 6873 72,788
    1985 6911 65,915
    1984 6511 59,004
    1983 5250 52,493
    1982 5233 47,243
    1981 4666 42,010
    1980 4252 37,344
    1979 3876 33,092
    1978 3415 29,216
    1977 3092 25,801
    1976 2735 22,709
    1975 2171 19,974
    1974 2142 17,803
    1973 1991 15,661
    1972 1969 13,670
    1971 1548 11,701
    1970 1261 10,153
    1969 1130 8,892
    1968 975 7,762
    1967 936 6,787
    1966 683 5,851
    1965 656 5,168
    1923-1964 4512 4,512

    Note: data for 1923-1964 are aggregated in the last row of the table.
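    The milestone years given earlier can be read off the cumulative totals in this table. The sketch below uses a hypothetical helper, `first_year_reaching` (not part of any CCDC tool), that returns the first year whose year-end total reaches a given count; the totals are copied from the table for 1965-2018.

```python
from typing import Optional

# Year-end cumulative CSD totals for 1965-2018, copied from the growth table above.
TOTALS_1965_TO_2018 = [
    5_168, 5_851, 6_787, 7_762, 8_892, 10_153,
    11_701, 13_670, 15_661, 17_803, 19_974, 22_709,
    25_801, 29_216, 33_092, 37_344, 42_010, 47_243,
    52_493, 59_004, 65_915, 72_788, 80_260, 87_904,
    95_654, 104_589, 114_530, 125_221, 137_253, 149_543,
    162_544, 178_031, 193_927, 211_216, 229_996, 249_994,
    271_775, 296_081, 322_368, 350_356, 382_089, 416_802,
    453_371, 490_173, 530_800, 572_040, 615_922, 661_121,
    706_849, 757_608, 811_218, 866_193, 921_224, 974_653,
]

# Pair each year with its cumulative total (54 years: 1965 through 2018).
cumulative_by_year = dict(zip(range(1965, 2019), TOTALS_1965_TO_2018))


def first_year_reaching(milestone: int) -> Optional[int]:
    """Return the first year whose year-end total is at least `milestone`, or None."""
    for year, total in sorted(cumulative_by_year.items()):
        if total >= milestone:
            return year
    return None


for milestone in (25_000, 50_000, 125_000, 250_000, 500_000, 1_000_000):
    print(f"{milestone:>9,}: {first_year_reaching(milestone)}")
```

    The helper returns 1977, 1983, 1992, 2001 and 2009 for the first five milestones. For 1,000,000 it returns None because the table ends with the 2018 total; that milestone was passed in June 2019.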

    File format

    3D printed model of benzoic acid, taken from a crystal structure determination and created from Cambridge Structural Database coordinates using the CCDC program Mercury. The top model shows a single molecule of benzoic acid; the bottom model shows a hydrogen-bonded dimer.

    The primary file format for CSD structure deposition, adopted around 1991, is the Crystallographic Information File (CIF) format.[16]
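    A CIF file is plain text. A `data_` line opens a data block, `_tag value` lines hold individual data items, `loop_` introduces a table of values, and lines beginning with `;` delimit multi-line text fields. The Python code below is a deliberately simplified reader, not a conforming CIF parser. It keeps only single-line tag-value items, skips loops and multi-line text fields, and ignores the finer quoting rules of the CIF specification. The function names are hypothetical, and the sample fragment uses placeholder values; it is not a real CSD entry.

```python
from typing import Dict, Optional


def _unquote(value: str) -> str:
    """Strip one pair of matching single or double quotes around a value."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


def read_simple_cif_items(text: str) -> Dict[str, Dict[str, str]]:
    """Map each data block name to its single-line `_tag value` items (simplified)."""
    blocks: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    loop_state: Optional[str] = None  # None, "header" (column tags) or "rows"
    in_text_field = False

    for raw in text.splitlines():
        # A ';' in the first column opens or closes a multi-line text field.
        if raw.startswith(";"):
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue

        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.lower().startswith("data_"):
            current = blocks.setdefault(line[5:], {})
            loop_state = None
        elif line.lower() == "loop_":
            loop_state = "header"
        elif line.startswith("_"):
            if loop_state == "header":
                continue  # column name of a loop_ table: skipped in this sketch
            loop_state = None  # a tag after the loop rows ends the loop
            parts = line.split(None, 1)
            if current is not None and len(parts) == 2:
                current[parts[0]] = _unquote(parts[1])
        elif loop_state is not None:
            loop_state = "rows"  # a row of loop values: skipped in this sketch
    return blocks


SAMPLE_CIF = """\
# Hypothetical fragment with placeholder values, not a real CSD entry
data_example
_chemical_formula_sum 'C7 H6 O2'
_cell_length_a 10.000
_publ_section_comment
;
A multi-line text field, skipped by this sketch.
;
loop_
_atom_site_label
_atom_site_fract_x
O1 0.1000
C1 0.2000
_cell_length_b 12.500
"""

items = read_simple_cif_items(SAMPLE_CIF)["example"]
print(items)
# {'_chemical_formula_sum': 'C7 H6 O2', '_cell_length_a': '10.000', '_cell_length_b': '12.500'}
```

    For real CSD files, a dedicated CIF library or the CCDC's own tools are a better choice, because they handle loops, text fields and the full quoting rules.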

    The deposited CSD files can be downloaded in the CIF format. The validated and curated CSD files can be exported in a wide range of formats, including CIF, MOL, Mol2, PDB, SHELX and XMol, using tools in the CSD System.

    The CCDC uses two different codes to distinguish between the deposited dataset and the curated CSD entry. For example, one specific CSD Communication of an organic molecule was deposited with the CCDC and assigned the deposition number 'CCDC-991327', which provides free public access to the data as deposited. Selected information was then extracted from the deposited data to prepare the validated and curated CSD entry, which was assigned the refcode 'MITGUT'. As part of the curation process, the CCDC also applies an algorithm, DeCIFer, to help the editors assign chemistry to structures when those representations (e.g. bond types and charge assignments) are missing from the submitted CIF files.[8] The validated and curated entry is included in the CSD System and WebCSD distributions, with availability restricted to those making appropriate contributions.
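    The distinction between the two identifiers can be illustrated with a small Python helper, `classify_identifier`, which is hypothetical. It assumes that deposition numbers consist of `CCDC` followed by digits (as in 'CCDC-991327') and that refcodes consist of six capital letters, optionally followed by a two-digit suffix (as in 'MITGUT', 'XOPCAJ' and 'XURZAN'). The suffix rule is an assumption rather than something stated in this article, and the patterns are not an official CCDC validation rule.

```python
import re

# Assumed formats (illustrative, not an official CCDC specification):
#   deposition number: "CCDC" + optional '-' or space + digits, e.g. "CCDC-991327"
#   refcode: six capital letters + optional two-digit suffix, e.g. "MITGUT"
DEPOSITION_NUMBER = re.compile(r"CCDC[- ]?\d+")
REFCODE = re.compile(r"[A-Z]{6}(?:\d{2})?")


def classify_identifier(identifier: str) -> str:
    """Return 'deposition number', 'refcode' or 'unknown' for an identifier string."""
    text = identifier.strip().upper()
    if DEPOSITION_NUMBER.fullmatch(text):
        return "deposition number"
    if REFCODE.fullmatch(text):
        return "refcode"
    return "unknown"


for example in ("CCDC-991327", "MITGUT", "XOPCAJ", "991327"):
    print(example, "->", classify_identifier(example))
```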

    Viewing the data

    3D printed model of 1-methyl-2,3,4,5-tetrakis((trimethylsilyl)ethynyl)-1H-pyrrole structure. CSD Identifier: XURZAN

    Each dataset in the CSD can be viewed and retrieved free of charge using the Access Structures service. Through this web-based service, users can view the dataset in 2D and 3D, obtain basic information about the structure, and download the deposited dataset. More advanced search functions and curated information are available through the subscription-based CSD System.

    Besides the CSD System, the structure files may be viewed using open-source programs such as Jmol or other free programs, including MDL Chime, PyMOL, UCSF Chimera, RasMol and WinGX.[17] The CCDC also provides a free version of its visualization program, Mercury.

    Since 2015, the CCDC's Mercury program has also been able to generate 3D-print-ready files from CSD structures.[18]


    References

    1. ^ "CCDC CIF Depository Request Form". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16.
    2. ^ "CCDC Homepage". Cambridge Crystallographic Data Centre. Retrieved 2014-09-16.
    3. ^ PMID 19421719.
    4. ^ "Announcement from the Chair, on behalf of Trustees". The Cambridge Crystallographic Data Centre. September 11, 2017. Retrieved 2019-05-15.
    5. ^ "The CCDC welcomes Jürgen Harter as CEO". The Cambridge Crystallographic Data Centre (CCDC). June 11, 2018. Retrieved 2019-05-15.
    6. ^ "CCDC opens US operations". The Cambridge Crystallographic Data Centre (CCDC). October 30, 2013. Retrieved 2019-05-15.
    7. ^ "The Cambridge Crystallographic Data Centre Establishes U.S. Operations in New Partnership with Rutgers' Center for Integrative Proteomics Research". Rutgers Office of Research and Economic Development. Retrieved 2019-05-15.
    8. ^ PMID 25091065.
    9. ^ "CSD Entries: Summary Statistics" (PDF). Cambridge Crystallographic Data Centre. Archived from the original (PDF) on 2014-06-11. Retrieved 2014-09-16.
    10. ^ "CSD Entries: Summary Statistics" (PDF). Cambridge Structural Database. January 1, 2019. Retrieved 2019-05-15.
    11. ^ a b "CSD Journal Statistics" (PDF). Cambridge Structural Database. January 1, 2019. Retrieved 2019-05-16.
    12. ^ PMID 24382699.
    13. ^ "Growth of the Cambridge Structural Database (CSD) since 1970". CCDC. Retrieved 2014-09-16.
    14. ^ "CSD Statistics". The Cambridge Crystallographic Data Centre (CCDC). Retrieved 2019-05-17.
    15. ^ Robinson, Philip; Withers, Neil; Pink, Chris; Valsler, Ben. "The Cambridge Structural Database hits one million structures". Chemistry World. Retrieved 2019-06-07.
    16. .
    17. .
    18. ^ "3D Printing: Easy as 1, 2, 3!". The Cambridge Crystallographic Data Centre (CCDC). August 19, 2015. Retrieved 2019-05-18.
