Wikidata

Source: Wikipedia, the free encyclopedia.
Available in: Multiple languages
Owner: Wikimedia Foundation
Editor: Wikimedia community
URL: www.wikidata.org
Commercial: No
Registration: Optional
Launched: 29 October 2012[1][2]

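Wikidata is a collaboratively edited multilingual knowledge graph hosted by the Wikimedia Foundation. Its data is available under the CC0 public domain license. Wikidata is a wiki powered by the software MediaWiki and by the set of knowledge graph MediaWiki extensions known as Wikibase.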

Concept

Wikidata is a document-oriented database, focused on items, which represent any kind of topic, concept, or object. Each item is allocated a unique, persistent identifier, a positive integer prefixed with the upper-case letter Q, known as a "QID". This enables the basic information required to identify the topic that the item covers to be translated without favouring any language.

Examples of items include 1988 Summer Olympics (Q8470), love (Q316), Johnny Cash (Q42775), Elvis Presley (Q303), and Gorilla (Q36611).
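As a minimal sketch of the identifier format described above, the following Python helper checks that a string is a well-formed QID and extracts its numeric part. The name `parse_qid` is hypothetical and is not part of any Wikidata API; rejecting leading zeros is an assumption of this sketch.

```python
import re

# A QID is the upper-case letter Q followed by a positive integer.
# Assumption of this sketch: the integer is written without leading zeros.
_QID_PATTERN = re.compile(r"Q([1-9][0-9]*)")


def parse_qid(text: str) -> int:
    """Return the numeric part of a QID such as "Q42775", or raise ValueError."""
    match = _QID_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a well-formed QID: {text!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # The example items listed above.
    for qid in ["Q8470", "Q316", "Q42775", "Q303", "Q36611"]:
        print(qid, "->", parse_qid(qid))
```

Because the identifier carries no language-specific text, the same QID can be used unchanged whatever language the label is shown in.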

Item labels need not be unique. For example, there are two items named "Elvis Presley": Elvis Presley (Q303), which represents the American singer and actor, and Elvis Presley (Q610926), which represents his self-titled album. However, the combination of a label and its description must be unique. To avoid ambiguity, an item's unique identifier (QID) is therefore linked to this combination.
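The uniqueness rule above can be sketched as a small in-memory index. This is an illustrative toy, not how Wikidata stores items: `ItemRegistry` and its methods are hypothetical names, the description strings are placeholders, and the sketch ignores the fact that labels and descriptions exist per language.

```python
from typing import Dict, List, Tuple


class DuplicateLabelDescriptionError(ValueError):
    """Raised when a label and description pair is already taken."""


class ItemRegistry:
    """Toy index from (label, description) pairs to QIDs."""

    def __init__(self) -> None:
        self._pair_to_qid: Dict[Tuple[str, str], str] = {}

    def add(self, qid: str, label: str, description: str) -> None:
        pair = (label, description)
        if pair in self._pair_to_qid:
            # Same label is fine; same label AND description is not.
            raise DuplicateLabelDescriptionError(
                f"{pair!r} is already used by {self._pair_to_qid[pair]}"
            )
        self._pair_to_qid[pair] = qid

    def find_by_label(self, label: str) -> List[str]:
        """Return every QID whose label matches, sorted for stable output."""
        return sorted(q for (lbl, _), q in self._pair_to_qid.items() if lbl == label)


registry = ItemRegistry()
registry.add("Q303", "Elvis Presley", "American singer and actor")
registry.add("Q610926", "Elvis Presley", "self-titled album")
print(registry.find_by_label("Elvis Presley"))
```

A lookup by label alone can return several QIDs, which is why the QID rather than the label serves as the unambiguous handle.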

Main parts

Fundamentally, an item consists of:

  • Obligatorily, an identifier (the QID), related to a label and a description.
  • Optionally, multiple aliases and some number of statements (and their properties and values).
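The split between obligatory and optional parts might be modelled as follows. This is a simplified sketch rather than the actual Wikibase data model: the `Item` class is hypothetical, and the description string is an illustrative placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    # Obligatory parts: the identifier plus its label and description.
    qid: str
    label: str
    description: str
    # Optional parts: default to empty when not supplied.
    aliases: List[str] = field(default_factory=list)
    # Maps a property ID to the list of values stated for it.
    statements: Dict[str, List[str]] = field(default_factory=dict)


# Only the obligatory fields are required to construct an item.
milk = Item(qid="Q8495", label="milk", description="white liquid produced by mammals")
milk.statements["P462"] = ["Q23444"]  # color (P462): white (Q23444)
print(milk)
```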

Statements

[Screenshot: Three statements from Wikidata's item on the planet Mars (Q111). Values include links to other items and to Wikimedia Commons.]

Statements are how any information known about an item is recorded in Wikidata. Formally, they consist of key-value pairs, which match a property (such as "author" or "publication date") with one or more values (such as "Sir Arthur Conan Doyle" or "1902"). For example, the informal English statement "milk is white" would be encoded by a statement pairing the property color (P462) with the value white (Q23444) under the item milk (Q8495).

Statements may map a property to more than one value. For example, the "occupation" property for Marie Curie could be linked with the values "physicist" and "chemist", to reflect the fact that she engaged in both occupations.[6]

Values may take on many types including other Wikidata items, strings, numbers, or media files. Properties prescribe what types of values they may be paired with. For example, the property official website (P856) may only be paired with values of type "URL".[7]
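One way to picture how a property prescribes its value type is a lookup table consulted before a value is accepted. The table and the function `value_matches_datatype` below are hypothetical, and treating only `http` and `https` addresses as URLs is an assumption of this sketch, not a statement about Wikidata's own validation.

```python
from urllib.parse import urlparse

# Hypothetical table: property ID -> the data type its values must have.
PROPERTY_DATATYPES = {
    "P856": "url",   # official website
    "P462": "item",  # color
}


def value_matches_datatype(property_id: str, value: str) -> bool:
    """Return True if the value has the data type the property prescribes."""
    datatype = PROPERTY_DATATYPES[property_id]
    if datatype == "url":
        parts = urlparse(value)
        # Assumption: a usable URL has an http(s) scheme and a host.
        return parts.scheme in ("http", "https") and bool(parts.netloc)
    if datatype == "item":
        # An item value is referenced by its QID.
        return value.startswith("Q") and value[1:].isdigit()
    raise ValueError(f"unknown data type: {datatype}")


print(value_matches_datatype("P856", "https://www.wikidata.org"))
print(value_matches_datatype("P856", "Q23444"))
```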

Optionally, qualifiers can be used to refine the meaning of a statement by providing additional information. For example, a "population" statement could be modified with a qualifier such as "point in time (P585): 2011" (as its own key-value pair). Values in the statements may also be annotated with references, pointing to a source backing up the statement's content.[8] As with statements, all qualifiers and references are property–value pairs.
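The pieces described in this section (several values per property, qualifiers, and references) can be sketched with one small structure. The class `Claim` and helper `add_claim` are hypothetical names; the population figure and the reference address are placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PropertyValue = Tuple[str, str]  # one property-value pair


@dataclass
class Claim:
    value: str
    # Qualifiers and references are themselves property-value pairs.
    qualifiers: List[PropertyValue] = field(default_factory=list)
    references: List[PropertyValue] = field(default_factory=list)


def add_claim(statements: Dict[str, List[Claim]], prop: str, claim: Claim) -> None:
    """Attach a claim to a property; a property may hold several values."""
    statements.setdefault(prop, []).append(claim)


# A property mapped to more than one value.
curie: Dict[str, List[Claim]] = {}
add_claim(curie, "occupation", Claim("physicist"))
add_claim(curie, "occupation", Claim("chemist"))

# A value refined by a qualifier and backed by a reference.
city: Dict[str, List[Claim]] = {}
add_claim(
    city,
    "population",
    Claim(
        value="1000",  # placeholder figure, not a real population
        qualifiers=[("P585", "2011")],  # point in time (P585): 2011
        references=[("reference URL", "https://example.org/census-2011")],  # hypothetical source
    ),
)

print([claim.value for claim in curie["occupation"]])
print(city["population"][0].qualifiers)
```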

Properties

Each property has a numeric identifier prefixed with a capital P and a page on Wikidata with optional label, description, aliases, and statements. As such, there are properties with the sole purpose of describing other properties, such as subproperty of (P1647).

Properties may also define more complex rules about their intended usage, termed constraints. For example, the capital (P36) property includes a "single value constraint", reflecting the reality that (typically) territories have only one capital city. Constraints are treated as testing alerts and hints, rather than inviolable rules.[9]
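The advisory nature of constraints can be illustrated with a check that reports hints but never rejects data. The table `PROPERTY_CONSTRAINTS` and the function `constraint_hints` are hypothetical, and the city names are placeholders rather than real items.

```python
from typing import Dict, List

# Hypothetical registry: property ID -> names of constraints declared on it.
PROPERTY_CONSTRAINTS: Dict[str, List[str]] = {"P36": ["single value"]}


def constraint_hints(statements: Dict[str, List[str]]) -> List[str]:
    """Return human-readable hints; never reject or modify the statements."""
    hints: List[str] = []
    for property_id, values in statements.items():
        constraints = PROPERTY_CONSTRAINTS.get(property_id, [])
        if "single value" in constraints and len(values) > 1:
            hints.append(
                f"{property_id}: single value constraint, found {len(values)} values"
            )
    return hints


one_capital = {"P36": ["city A"]}
two_capitals = {"P36": ["city A", "city B"]}
print(constraint_hints(one_capital))
print(constraint_hints(two_capitals))
```

The second item still keeps both values; the hint only flags it for a human editor to review.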

Before a new property is created, it needs to undergo a discussion process.[10][11]

As of January 2023, the most used property was cites work (P2860), which was used on more than 280,000,000 item pages.[12]

Lexemes

In linguistics, a lexeme is a unit of lexical meaning. Similarly, Wikidata's lexemes are items with a structure that makes them more suitable to store lexicographical data. Besides storing the language to which the lexeme refers, they have a section for forms and a section for senses.[13]
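A rough sketch of the lexeme layout described above, with a language plus separate sections for forms and senses. The `Lexeme` class is hypothetical and heavily simplified; the `lemma` field is an assumption of this sketch, and the example glosses are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lexeme:
    language: str  # the language the lexeme belongs to
    lemma: str     # assumed base form; not named in the text above
    forms: List[str] = field(default_factory=list)   # inflected spellings
    senses: List[str] = field(default_factory=list)  # short glosses, one per meaning


run = Lexeme(
    language="English",
    lemma="run",
    forms=["run", "runs", "ran", "running"],
    senses=["to move quickly on foot", "to operate (a machine or program)"],
)
print(f"{run.lemma} ({run.language}): {len(run.forms)} forms, {len(run.senses)} senses")
```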

EntitySchemas

In January 2019, development started on a new extension for MediaWiki to enable storing Shape Expressions in a separate namespace.[14][15]

This extension has since been installed on Wikidata[16] and enables contributors to use Shape Expressions for validating and describing Resource Description Framework data in items and lexemes. Any item or lexeme on Wikidata can be validated against an Entity Schema, and this makes it an important tool for quality assurance.

Development

The creation of the project was funded by donations.[17][18] Development is driven mainly by Wikimedia Deutschland under the management of Lydia Pintscher, and was originally split into three phases:[19]

  1. Centralising interlanguage links – links between Wikipedia articles about the same topic in different languages.
  2. Providing a central place for infobox data for all Wikipedias.
  3. Creating and updating list articles based on data in Wikidata and linking to other Wikimedia sister projects, including Meta-Wiki and Wikidata itself (interwiki links).

Initial rollout

Wikidata was launched on 29 October 2012 and was the first new project of the Wikimedia Foundation since 2006.[4][20][21] At this time, only the centralization of language links was available. This enabled items to be created and filled with basic information: a label (a name or title), aliases (alternative terms for the label), a description, and links to articles about the topic in all the various language editions of Wikipedia (interwikipedia links).

Historically, a Wikipedia article would include a list of interlanguage links (links to articles on the same topic in other editions of Wikipedia, if they existed). Wikidata was originally a self-contained repository of interlanguage links.[22] Wikipedia language editions were still not able to access Wikidata, so they needed to continue to maintain their own lists of interlanguage links.[citation needed]

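On 14 January 2013, the Hungarian Wikipedia became the first to enable the provision of interlanguage links via Wikidata.[23] This functionality was extended to two further Wikipedias on 30 January, to the English Wikipedia on 13 February, and to all other Wikipedias on 6 March.[24][25][26][27] After no consensus was reached over a proposal to restrict the removal of language links from the English Wikipedia,[28] they were automatically removed by bots. On 23 September 2013, interlanguage links went live on Wikimedia Commons.[29]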

Statements and data access

On 4 February 2013, statements were introduced to Wikidata entries. The possible values for properties were initially limited to two data types (items and images on Wikimedia Commons), with more data types (such as coordinates and dates) to follow later. The first new type, string, was deployed on 6 March.[30]

The ability for the various language editions of Wikipedia to access data from Wikidata was rolled out progressively between 27 March and 25 April 2013.[31][32] On 16 September 2015, Wikidata began allowing so-called arbitrary access, or access from a given article of a Wikipedia to the statements on Wikidata items not directly connected to it. For example, it became possible to read data about Germany from the Berlin article, which was not feasible before.[33] On 27 April 2016 arbitrary access was activated on Wikimedia Commons.[34]

According to a 2020 study, a large proportion of the data on Wikidata consists of entries imported en masse from other databases by Internet bots, which helps to "break down the walls" of data silos.[35]

Query service and other improvements

On 7 September 2015, the Wikimedia Foundation announced the release of the Wikidata Query Service,[36] which lets users run queries on the data contained in Wikidata.[37] The service uses SPARQL as the query language and Blazegraph as its triplestore and graph database.[39][40] As of November 2018, there were at least 26 different tools that allowed querying the data in different ways.[38]
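To give a feel for what a query against the service looks like, the sketch below builds a SPARQL query for items whose color (P462) is white (Q23444) and turns it into a request URL. It only constructs the URL and sends nothing over the network. The helper name `build_query_url` is hypothetical; the endpoint path, the `query` and `format` parameters, and the `wd:`/`wdt:` prefixes reflect common usage of the service and should be checked against its current documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query_url(property_id: str, value_qid: str, limit: int = 10) -> str:
    """Build a GET URL asking for items that pair the property with the value."""
    query = (
        "SELECT ?item WHERE { "
        f"?item wdt:{property_id} wd:{value_qid} . "
        "} "
        f"LIMIT {limit}"
    )
    # urlencode percent-escapes the query so it is safe inside a URL.
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_query_url("P462", "Q23444"))
```

The resulting URL could then be fetched with any HTTP client; automated clients are generally expected to identify themselves with a descriptive User-Agent header.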

In 2021, Wikimedia Deutschland released the Wikidata Query Builder, "a form-based query builder to allow people who don't know how to use SPARQL to" write a query.

The bars on the logo contain the word "WIKI" encoded in Morse code.[41] It was created by Arun Ganesh and selected through community decision.[42]

Reception

In November 2014, Wikidata received the Open Data Publisher Award from the Open Data Institute "for sheer scale, and built-in openness".[43]

In December 2014, Google announced that it would shut down Freebase in favor of Wikidata.[44]

As of November 2018, Wikidata information was used in 58.4% of all English Wikipedia articles, mostly for external identifiers or coordinate locations. In aggregate, data from Wikidata was shown in 64% of all Wikipedia pages, 93% of all Wikivoyage articles, 34% of all Wikiquote pages, 32% of all Wikisource pages, and 27% of Wikimedia Commons pages. Usage in other Wikimedia Foundation projects is a testimonial.[45]

As of December 2020, Wikidata's data was visualized by at least 20 other external tools[46] and over 300 papers had been published about Wikidata.[47]

Wikidata's structured dataset has been used by virtual assistants such as Apple's Siri and Amazon Alexa.[48]

Applications


A systematic literature review of the uses of Wikidata in research was carried out in 2019.[55]

References

  1. ^ Error: Unable to display the reference properly. See the documentation for details.
  2. ^ "The Wikidata revolution is here: enabling structured data on Wikipedia". 25 April 2013. Retrieved 12 June 2022. Since Wikidata.org went live on 30 October 2012,
  3. ^ Chalabi, Mona (26 April 2013). "Welcome to Wikidata! Now what?". Retrieved 2 October 2021.
  4. ^ a b Wikidata (Archived 29 October 2012 at the Wayback Machine)
  5. ^ "Data Revolution for Wikipedia". Wikimedia Deutschland. 30 March 2012. Archived from the original on 23 October 2012. Retrieved 11 September 2012.
  6. ^ "Help:Statements – Wikidata". www.wikidata.org.
  7. ^ "Help:Data type – Wikidata". www.wikidata.org.
  8. ^ "Help:Sources – Wikidata". www.wikidata.org.
  9. ^ "Help:Property constraints portal". Wikidata.
  10. ^ Cochrane, Euan (30 September 2016). "Wikidata as a digital preservation knowledgebase". openpreservation.org.
  11. .
  12. ^ "Wikidata:Database reports/List of properties/Top100". Retrieved 9 January 2023.
  13. ^ "Wikidata:Lexicographical data/Documentation – Wikidata". www.wikidata.org.
  14. ^ "Extension:EntitySchema - MediaWiki". mediawiki.org. Retrieved 10 September 2021.
  15. ^ "Initial empty repository". Gerrit. 15 January 2019. Retrieved 12 June 2022.
  16. ^ "Version - Wikidata". Wikidata.org. Retrieved 10 September 2021.
  17. ^ Dickinson, Boonsri (30 March 2012). "Paul Allen Invests In A Massive Project To Make Wikipedia Better". Business Insider. Retrieved 11 September 2012.
  18. ^ Perez, Sarah (30 March 2012). "Wikipedia's Next Big Thing: Wikidata, A Machine-Readable, User-Editable Database Funded By Google, Paul Allen And Others". TechCrunch. Archived from the original on 5 October 2012. Retrieved 11 September 2012.
  19. ^ "Wikidata – Meta". meta.wikimedia.org.
  20. ^ Pintscher, Lydia (30 October 2012). "wikidata.org is live (with some caveats)". wikidata-l (Mailing list). Retrieved 3 November 2012.
  21. ^ Roth, Matthew (30 March 2012). "The Wikipedia data revolution". Wikimedia Foundation. Archived from the original on 11 September 2012. Retrieved 11 September 2012.
  22. .
  23. ^ Pintscher, Lydia (14 January 2013). "First steps of Wikidata in the Hungarian Wikipedia". Wikimedia Deutschland. Retrieved 17 December 2015.
  24. ^ Pintscher, Lydia (30 January 2013). "Wikidata coming to the next two Wikipedias". Wikimedia Deutschland. Retrieved 31 January 2013.
  25. ^ Pintscher, Lydia (13 February 2013). "Wikidata live on the English Wikipedia". Wikimedia Deutschland. Retrieved 15 February 2013.
  26. ^ Pintscher, Lydia (6 March 2013). "Wikidata now live on all Wikipedias". Wikimedia Deutschland. Retrieved 8 March 2013.
  27. ^ "Wikidata ist für alle Wikipedien da" (in German). Golem.de. Retrieved 29 January 2014.
  28. ^ "Wikipedia talk:Wikidata interwiki RFC". 29 March 2013. Retrieved 30 March 2013.
  29. ^ Pintscher, Lydia (23 September 2013). "Wikidata is Here!". Commons:Village pump.
  30. ^ Pintscher, Lydia. "Wikidata/Status updates/2013 03 01". Wikimedia Meta-Wiki. Wikimedia Foundation. Retrieved 3 March 2013.
  31. ^ Pintscher, Lydia (27 March 2013). "You can have all the data!". Wikimedia Deutschland. Retrieved 28 March 2013.
  32. ^ "Wikidata goes live worldwide". The H. 25 April 2013. Archived from the original on 1 January 2014.
  33. ^ Pintscher, Lydia (16 September 2015). "Wikidata: Access to data from arbitrary items is here". Wikipedia:Village pump (technical). Retrieved 30 August 2016.
  34. ^ Pintscher, Lydia (27 April 2016). "Wikidata support: arbitrary access is here". Commons:Village pump. Retrieved 30 August 2016.
  35. ^ Wikidata Q87830400.
  36. ^ "Home". query.wikidata.org.
  37. ^ "[Wikidata] Announcing the release of the Wikidata Query Service - Wikidata - lists.wikimedia.org".
  38. ^ "Wikidata:Tools/Query data – Wikidata". www.wikidata.org.
  39. ^ "[Wikidata-tech] Wikidata Query Backend Update (take two!)". lists.wikimedia.org. Retrieved 29 August 2018. (The message also contains a link to the graph databases comparison performed by Wikimedia.)
  40. ^ 86 on GitHub
  41. ^ commons:File talk:Wikidata-logo-en.svg#Hybrid. Retrieved 2016-10-06.
  42. ^ "Und der Gewinner ist..." 13 July 2012.
  43. ^ "First ODI Open Data Awards presented by Sirs Tim Berners-Lee and Nigel Shadbolt". Archived from the original on 24 March 2016.
  44. ^ Google Plus. 16 December 2014. Archived from the original on 20 March 2019.
  45. ^ "Percentage of articles making use of data from Wikidata". Archived from the original on 15 November 2018. Retrieved 15 November 2018.
  46. ^ "Wikidata:Tools/Visualize data – Wikidata". www.wikidata.org.
  47. ^ "Scholia". Scholia.
  48. ^ ISSN 1059-1028. Retrieved 25 December 2020.
  49. ^ "Rob Barry / Mwnci – Deep Spreadsheets". GitLab.
  50. ^ "Public Review Issues".
  51. ^ "Wiki Explorer in the Google Play Store".
  52. ^ Krause, Volker (12 January 2020), KDE Itinerary – A privacy by design travel assistant, retrieved 10 November 2020
  53. ^ sling on GitHub
  54. ^ Scharpf, P. Schubotz, M. Gipp, B. Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling ACM/IEEE Joint Conference on Digital Libraries, 2022.
  55. ^ S2CID 202036639.
