File:Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia - Presentation and article - Digital Cultural Heritage, Berlin, 30-08-2017.pdf

This is a file from the Wikimedia Commons
Source: Wikipedia, the free encyclopedia.

Original file(1,500 × 1,125 pixels, file size: 7.4 MB, MIME type: application/pdf, 59 pages)

Summary

Description

Summary

  • During the Second World War some 1,300 illegal newspapers were issued by the Dutch resistance.
  • Right after the war, as many of these newspapers as possible were physically preserved by Dutch memory institutions. They were described in formal library catalogues, which were digitized and brought online in the 1990s. In 2010 the national collection of underground newspapers – some 200,000 pages – was full-text digitized in Delpher, the national aggregator for historical full texts.
  • With online metadata and full texts in place for these publications, the third pillar, context, was still missing, making it hard for people to understand the historical background of the newspapers.
  • We are currently running a project to tackle this contextual problem. We started by extracting contextual entries from a hard-copy standard work on the Dutch illegal press and combined these with data from the library catalogue and Delpher into a central LOD triple store.
  • We then created links between historically related newspapers and used Named Entity Recognition to find persons, organisations and places related to the newspapers. We further semantically enriched the data using DBpedia.
  • Next, using an article template to ensure uniformity and consistency, we generated 1,300 Wikipedia article stubs from the database.
  • Finally, we sought collaboration with the Dutch Wikipedia volunteer community to extend these stubs into full encyclopedic articles.
  • In this way we can give every newspaper its own Wikipedia article, making these WW2 materials much more visible to the Dutch public, over 80% of whom use Wikipedia.
  • At the same time the triple store can serve as a source for alternative applications, like data visualizations. This will enable us to visualize connections and networks between underground newspapers, as they developed over time between 1940 and 1945.

Abstract of associated article

(see Zenodo for separate PDF of abstract)

Using Linked Open Data to crowdsource Dutch WW2 underground newspapers on Wikipedia
Olaf Janssen
Koninklijke Bibliotheek, national library of the Netherlands / Wikimedia Netherlands
Prins Willem-Alexanderhof 5, 2595BE The Hague, The Netherlands
[email protected]

Keywords: Linked Open Data, Data Reuse, Semantic Web, Wikipedia, DBpedia, Crowdsourcing, Community Involvement, World War 2, Illegal Press, Historical Newspapers.

Abstract

During the Second World War hundreds of illegal newspaper titles were issued by the Dutch resistance. In the years after the war the Dutch Institute for War and Holocaust Studies (NIOD) in Amsterdam managed to collect and physically preserve some 1,300 titles from all over the country.

In the 1950s the NIOD composed De Ondergrondse Pers 1940-1945, a hard-copy monograph containing contextual entries for all 1,300 newspapers [01]. This is considered to be the standard work on the Dutch illegal press. The newspapers were also described in formal library card catalogues, which were digitized and made available online in the 1990s. In 2010 this entire national collection of underground newspapers – some 200,000 pages – was digitized into searchable full text and made accessible via Delpher, the national aggregator for historical full texts [02].

With online metadata (library catalogue) and full-text content (Delpher) in place, the third pillar, online context, was still missing for these publications, as it is not included in the first two pillars. The contextual information was still locked up offline in the hard-copy monograph. This made it harder than necessary for people to discover and understand the historical and cultural backgrounds of the Dutch illegal press. An additional problem was data fragmentation: there were no links between the catalogue, Delpher and the monograph.

To tackle these contextual and fragmentation problems, the national library of the Netherlands (KB), the NIOD and Wikimedia Netherlands [03] joined forces in early 2015. The common goal of these public organisations is to make cultural heritage accessible and reusable for as many people as possible. This is why they decided to give Wikipedia a central role in the project; 80% of the Dutch population uses the encyclopedia [04]. The project’s central aim was to systematically and uniformly describe all 1,300 Dutch underground newspapers from World War 2 on Dutch Wikipedia.

The first step was to digitize the hard copy of De Ondergrondse Pers into a full-text PDF file and make it available under an open license (CC-BY-SA). This ensured its contents could be freely reused [05]. Next we extracted the contextual entries from the PDF and combined these with data from the library catalogue and Delpher into a central underground newspaper database. Because we wanted to aim for maximum openness and reusability, we chose a Linked Open Data approach, with a Virtuoso database for storing the RDF triples based on the BIBFRAME data model [06].
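As a rough illustration of what describing a newspaper as RDF triples can look like, the sketch below builds a few triples for one title and serializes them as N-Triples using only the Python standard library. The `http://example.org/` URIs, the record fields and the choice of BIBFRAME and RDFS terms are assumptions made for this example; the project's actual URIs and BIBFRAME mapping are not described in this abstract.

```python
# Placeholder namespaces: the example.org URIs are NOT the project's real URIs.
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
BASE = "http://example.org/newspaper/"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def newspaper_triples(record: dict) -> list[tuple[str, str, str]]:
    """Turn one (made-up) newspaper record into subject-predicate-object triples."""
    subject = f"<{BASE}{record['id']}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{BF}Work>"),
        (subject, f"<{RDFS_LABEL}>", f'"{escape_literal(record["title"])}"@nl'),
    ]
    # Links to other sources (e.g. a catalogue record) become seeAlso triples.
    for url in record.get("see_also", []):
        triples.append((subject, f"<{RDFS_SEE_ALSO}>", f"<{url}>"))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    example = {
        "id": "0001",
        "title": "Voorbeeldkrant",  # invented title
        "see_also": ["http://example.org/catalogue/0001"],
    }
    print(to_ntriples(newspaper_triples(example)))
```

A file in this form could then be loaded into a triple store such as Virtuoso; the loading step itself is not shown here.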

We then created links between historically related newspapers and used Named Entity Recognition (using gazetteers and the SILK Workbench) to identify persons, organisations and places related to the newspapers. We further semantically enriched the data using DBpedia.
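The gazetteer part of this step can be pictured with the minimal sketch below: a lookup list of known names is matched against a text fragment. The gazetteer entries, the sample sentence and the function name are invented for illustration, and the sketch does not reproduce the project's actual pipeline or its use of the SILK Workbench.

```python
import re

# Made-up gazetteer entries, for illustration only.
GAZETTEER = {
    "Amsterdam": "place",
    "Den Haag": "place",
    "Jan Jansen": "person",
}


def find_entities(text: str, gazetteer: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (name, entity type, character offset) for each gazetteer hit."""
    # Try longer names first so multi-word names win over shorter overlaps.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.group(0), gazetteer[m.group(0)], m.start())
            for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Gedrukt in Amsterdam door Jan Jansen."  # invented sentence
    for name, kind, offset in find_entities(sample, GAZETTEER):
        print(name, kind, offset)
```

Exact string matching like this misses spelling variants and ambiguous names, which is one reason a real pipeline would combine it with further matching and linking steps.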

Having created the semantically enriched LOD database, the next step was to build a Wikipedia article template. This enabled us to automatically generate 1,300 uniform Wikipedia article stubs from the database [07]. These stubs not only contain the contextual information from De Ondergrondse Pers, but also links to the library catalogue and Delpher. In this way metadata, full-text content and context have been brought together for the first time, making it easier for people to interpret the Dutch underground press.
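Template-based stub generation of this kind can be sketched as follows, using Python's standard `string.Template`. The stub wording, the field names and the example URLs are hypothetical; the project's actual Wikipedia template was written for Dutch Wikipedia and is not reproduced in this abstract.

```python
from string import Template

# Hypothetical stub layout in wikitext; not the project's real template.
STUB = Template(
    "'''$title''' was an underground newspaper published in $place "
    "during the Second World War.\n\n"
    "== External links ==\n"
    "* [$catalogue_url Library catalogue record]\n"
    "* [$delpher_url Full text in Delpher]\n"
)


def render_stub(record: dict) -> str:
    """Fill the stub template; raises KeyError if a required field is missing."""
    return STUB.substitute(record)


if __name__ == "__main__":
    example = {  # invented record
        "title": "Voorbeeldkrant",
        "place": "Amsterdam",
        "catalogue_url": "http://example.org/catalogue/0001",
        "delpher_url": "http://example.org/delpher/0001",
    }
    print(render_stub(example))
```

Using `substitute` rather than `safe_substitute` makes an incomplete record fail loudly instead of producing a stub with unfilled placeholders, which helps keep generated articles uniform.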

Finally, in the summer of 2016 we sought collaboration with the Dutch Wikipedia volunteer community to extend these stubs into full encyclopedic articles. After three months, a small group of volunteers had created around 100 new articles [08]. In this way, over the next few years we hope to give every illegal newspaper its own Wikipedia article, making these WW2 materials much more discoverable for the Dutch public.

We also expect the LOD triple store to be used as a source for alternative (non-Wikipedia) applications, like data visualizations. This will enable the visualization of connections and networks between underground newspapers, as they developed over space and time between 1940 and 1945.
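One possible first step towards such a visualization is to group newspapers into networks based on the pairwise links between them. The sketch below does this with a breadth-first search over an undirected graph; the link data and the function name are invented, and this is only one of several ways the triple store's relations could be explored.

```python
from collections import defaultdict, deque


def networks(links: list[tuple[str, str]]) -> list[set[str]]:
    """Group newspapers into connected networks from pairwise 'related to' links."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in list(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        group, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    group.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


if __name__ == "__main__":
    # Invented links between placeholder newspaper identifiers.
    example_links = [("A", "B"), ("B", "C"), ("D", "E")]
    for group in networks(example_links):
        print(sorted(group))
```

Each resulting group could then be handed to a graph-drawing tool, optionally filtered by year to show how a network developed between 1940 and 1945.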

References

Conference context

Presentation at, and article for, the DCH (Digital Cultural Heritage) 2017 conference, 30 August - 1 September 2017, Staatsbibliothek Berlin, Germany
Date
Source Own work
Author Olaf Janssen (wikidata:Q66439268)
Alternative names Olaf D. Janssen; Olaf Daniel Janssen; O. D. Janssen
Description Dutch librarian
Date of birth 20th century
Location of birth Dongen
Work location The Hague (2001–)
Other versions https://zenodo.org/records/13125086 and https://doi.org/10.5281/zenodo.13125086

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

Captions

Presentation and article about a project to describe and interlink all 1,300 Dutch resistance newspapers from WW2 on Wikipedia using linked data and crowdsourcing.

Items portrayed in this file

depicts

30 August 2017

File history


Date/Time | Dimensions | User | Comment
current, 16:05, 29 July 2024 | 1,500 × 1,125, 59 pages (7.4 MB) | OlafJanssen | c:User:Rillke/bigChunkedUpload.js:
No pages on the English Wikipedia use this file (pages on other projects are not listed).

Metadata