Croatian Language Corpus
The Croatian Language Corpus (CLC) (
Background
The CLC was initially funded as a sub-project of the research program Riznica (Croatian Language Repository) by the
Goals
One of the main goals of the CLC project is to create a publicly available
Format and Availability
From the outset, the texts collected and digitized for the CLC have been annotated using the Text Encoding Initiative (TEI) P5 XML standard. Approximately 90 million tokens are currently available in TEI P5 XML format. The corpus can be accessed online through the Philologic[2] interface (see The ARTFL Project,[3] Department of Romance Languages and Literatures, The University of Chicago). It is virtualized into various sub-corpora, and custom sub-corpus definitions can be provided on request.
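The following Python sketch roughly illustrates what TEI P5 annotation makes possible. It uses only the standard library to parse a minimal TEI P5 document, read its title from the header, and count tokens. The TEI namespace and the `<w>` (word) and `<pc>` (punctuation) elements come from the TEI P5 Guidelines. The sample document, the assumption that CLC tokens are marked up this way, and the helper names `count_tokens` and `title_of` are hypothetical, and the actual CLC markup may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the TEI P5 Guidelines.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI P5 document; real CLC files may be structured differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Primjer</title></titleStmt>
      <publicationStmt><p>Sample only</p></publicationStmt>
      <sourceDesc><p>Hypothetical example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p><w>Dobar</w> <w>dan</w><pc>,</pc> <w>svijete</w><pc>!</pc></p>
    </body>
  </text>
</TEI>"""


def count_tokens(tei_xml: str) -> dict:
    """Count word (<w>) and punctuation (<pc>) elements anywhere in the document."""
    root = ET.fromstring(tei_xml)
    words = root.findall(".//tei:w", NS)
    punct = root.findall(".//tei:pc", NS)
    return {"words": len(words), "punctuation": len(punct),
            "total": len(words) + len(punct)}


def title_of(tei_xml: str) -> str:
    """Return the <title> from the TEI header, or an empty string if absent."""
    root = ET.fromstring(tei_xml)
    title = root.find("./tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", NS)
    return title.text if title is not None and title.text else ""


if __name__ == "__main__":
    print(title_of(SAMPLE), count_tokens(SAMPLE))
```

For the sample above, this reports three word tokens and two punctuation tokens. Because the element names are namespaced, the lookups use the `tei:` prefix mapping rather than bare tag names.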
Content
The CLC is assembled from selected Croatian texts covering various functional domains and genres. It includes literature and other written sources from the beginning of the final phase of Croatian standardization, i.e. from the second half of the 19th century onward.
The CLC consists of:
- fundamental Croatian literature (e.g. novels, short stories, drama, poetry)
- non-fiction
- scientific publications from various domains and university textbooks
- school books
- literature translated by outstanding Croatian translators
- online journals and newspapers
- books from the pre-standardization period of Croatian, adapted to present-day standard Croatian
Cooperation
The CLC was realized in cooperation with:
- Školska knjiga d.d.
- Croatian Academy of Sciences and Arts (HAZU)
- Stoljeća hrvatske književnosti, Matica hrvatska
References
- Ćavar and Brozović Rončević, 2012
- Philologic
- "The ARTFL Project". Archived from the original on 2009-12-04. Retrieved 2011-05-22.
External links
- Croatian Language Corpus (CLC) website and Philologic interface
- (in Croatian) Croatian National Corpus, another Croatian corpus by the Institute of Linguistics of the Faculty of Humanities and Social Sciences, University of Zagreb