Semantic Scholar

Source: Wikipedia, the free encyclopedia.
Developer: Allen Institute for Artificial Intelligence
Launched: November 2015

Semantic Scholar is an artificial intelligence–powered research tool for scientific literature, developed at the Allen Institute for AI and publicly released in November 2015.[1] It uses advances in natural language processing to provide summaries of scholarly papers.[2] The Semantic Scholar team actively researches the use of artificial intelligence in natural language processing, machine learning, human-computer interaction, and information retrieval.[3]

Semantic Scholar began as a database surrounding the topics of

biomedical literature in its corpus.[4] As of September 2022, the corpus includes over 200 million publications from all fields of science.[5]


Semantic Scholar provides one-sentence summaries of scientific papers. One of its aims is to address the difficulty of reading numerous titles and lengthy abstracts on mobile devices.[6] It also seeks to ensure that the three million scientific papers published each year reach readers, since it is estimated that only half of this literature is ever read.[7]

Artificial intelligence is used to capture the essence of a paper, generating the summary through an "abstractive" technique, which writes new text rather than copying sentences from the paper.[2] The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers.[8][9]

In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper.[10] The AI technology is designed to identify hidden connections and links between research topics.[11] Like these search engines, Semantic Scholar also makes use of graph structures, including the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus.[12]

Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example:

Liu, Ying; Gayle, Albert A; Wilder-Smith, Annelies; Rocklöv, Joacim (March 2020). "The reproductive number of COVID-19 is higher compared to SARS coronavirus". Journal of Travel Medicine. 27 (2).
S2CID 211099356
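As an illustration of how the identifier can be used programmatically, the following minimal Python sketch builds a lookup URL for a paper from its S2CID. It assumes the Semantic Scholar Academic Graph API endpoint `https://api.semanticscholar.org/graph/v1/paper/`, its `CorpusId:` prefix for corpus IDs, and its `fields` query parameter; the function name `s2cid_lookup_url` is hypothetical. The sketch only constructs the URL and does not make a network request.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the Semantic Scholar Academic Graph API paper endpoint;
# check the current API documentation before relying on it.
GRAPH_API = "https://api.semanticscholar.org/graph/v1/paper/"


def s2cid_lookup_url(s2cid: int, fields: list[str]) -> str:
    """Build a (hypothetical helper) Graph API URL for one paper by its S2CID."""
    # Corpus IDs are treated here as positive integers.
    if s2cid <= 0:
        raise ValueError("an S2CID is a positive integer")
    # Assumption: the API accepts corpus IDs written as "CorpusId:<number>".
    paper_id = quote(f"CorpusId:{s2cid}", safe=":")
    # Requested metadata fields are joined with commas, left unescaped.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{GRAPH_API}{paper_id}?{query}"


if __name__ == "__main__":
    # The example entry above: Liu et al. (2020), S2CID 211099356.
    print(s2cid_lookup_url(211099356, ["title", "year", "venue"]))
```

Under these assumptions, a request to the printed URL would return the paper's metadata as JSON; the field names shown are illustrative.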

Semantic Scholar is free to use and, unlike similar search engines such as Google Scholar, does not search for material that is behind a paywall.[13][4]

One study assessed the search abilities of Semantic Scholar through a systematic approach and found the search engine to be 98.88% accurate in retrieving the target data.[13] The same study examined other Semantic Scholar functions, including tools to survey metadata and several citation tools.[13]

Number of users and publications

As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from

Microsoft Academic Graph records.[17] In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus.[18] At the end of 2020, Semantic Scholar had indexed 190 million papers.[19]

In 2020, Semantic Scholar reached seven million users a month.[6]

References


  1. ^ Eunjung Cha, Ariana (3 November 2015). "Paul Allen's AI research group unveils program that aims to shake up how we search scientific knowledge. Give it a try". The Washington Post. Archived from the original on 6 November 2019. Retrieved November 3, 2015.
  2. ^ a b Hao, Karen (November 18, 2020). "An AI helps you summarize the latest in AI". MIT Technology Review. Retrieved 2021-02-16.
  3. ^ "Semantic Scholar Research". Retrieved 2021-11-22.
  4. ^ S2CID 45802944
  5. ^ Matthews, David (1 September 2021). "Drowning in the literature? These smart software tools can help". Nature. Retrieved 5 September 2022. ...the publicly available corpus compiled by Semantic Scholar — a tool set up in 2015 by the Allen Institute for Artificial Intelligence in Seattle, Washington — amounting to around 200 million articles, including preprints.
  6. ^ a b Grad, Peter (November 24, 2020). "AI tool summarizes lengthy papers in a sentence". Tech Xplore. Retrieved 2021-02-16.
  7. ^ "Allen Institute's Semantic Scholar now searches across 175 million academic papers". VentureBeat. 2019-10-23. Retrieved 2021-02-16.
  8. from the original on 29 April 2020. Retrieved 12 November 2016.
  9. .
  10. ^ "Semantic Scholar". International Journal of Language and Literary Studies. Retrieved 2021-11-09.
  11. .
  12. .
  13. ^ .
  14. ^ "AI2 scales up Semantic Scholar search engine to encompass biomedical research". GeekWire. 2017-10-17. Archived from the original on 2018-01-19. Retrieved 2018-01-18.
  15. ^ "Tech Moves: Allen Instititue Hires Amazon Alexa Machine Learning Leader; Microsoft Chairman Takes on New Investor Role; and More". GeekWire. 2018-05-02. Archived from the original on 2018-05-10. Retrieved 2018-05-09.
  16. ^ "Semantic Scholar". Semantic Scholar. Archived from the original on 11 August 2019. Retrieved 11 August 2019.
  17. ^ "AI2 joins forces with Microsoft Research to upgrade search tools for scientific studies". GeekWire. 2018-12-05. Archived from the original on 2019-08-25. Retrieved 2019-08-25.
  18. ^ "The University of Chicago Press joins more than 500 publishers working with Semantic Scholar to improve search and discoverability". RCNi Company Limited. Retrieved 2021-11-22.
  19. ^ Dunn, Adriana (December 14, 2020). "Semantic Scholar Adds 25 Million Scientific Papers in 2020 Through New Publisher Partnerships" (PDF). Semantic Scholar. Retrieved November 22, 2021.

External links