Schema-agnostic databases
Schema-agnostic databases, or vocabulary-independent databases, aim to abstract users from the representation of the data by supporting automatic semantic matching between queries and databases. Schema-agnosticism is the property of a database that automatically maps a query, issued using the user's own terminology and structure, to the dataset vocabulary.
The increase in the size and semantic heterogeneity of database schemas brings new requirements for users querying and searching databases.
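As a rough illustration of the term mapping described above, the following Python sketch picks the dataset term that best matches a user term. The vocabulary (`DATASET_VOCABULARY`) and synonym table (`SYNONYMS`) are hypothetical and hard-coded, and plain string similarity from the standard library's `difflib` stands in for the semantic matching (for example, the distributional semantics approach of Freitas[1]) that research systems use.

```python
from difflib import SequenceMatcher

# Hypothetical slice of a dataset vocabulary (DBpedia-style names, for illustration only).
DATASET_VOCABULARY = ["child", "spouse", "author", "numberOfPages", "birthPlace"]

# Hypothetical synonym table standing in for a real semantic resource
# such as a thesaurus or a distributional model.
SYNONYMS = {
    "hasDaughter": ["daughter", "child"],
    "marriedTo": ["married", "spouse"],
    "by": ["author", "writer"],
    "has_pages": ["pages", "numberOfPages"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_term(user_term: str) -> str:
    """Return the dataset term most similar to the user term or any of its synonyms."""
    candidates = [user_term] + SYNONYMS.get(user_term, [])
    return max(
        DATASET_VOCABULARY,
        key=lambda term: max(similarity(c, term) for c in candidates),
    )
```

For example, `map_term("hasDaughter")` returns `"child"` through the synonym table, while `map_term("birthplace")` returns `"birthPlace"` on string similarity alone. A real system would rank several candidates and combine them into a consistent interpretation of the whole query rather than mapping each term independently.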
Description
The evolution of data environments towards consuming data from multiple sources, together with the growth in the size, complexity, dynamicity and decentralisation (SCoDD) of schemas[1][2][3], increases the complexity of contemporary data management. The SCoDD trend has emerged as a central data management concern in Big Data scenarios, where users and applications demand more complete data produced by independent data sources under different semantic assumptions and contexts of use; this is the typical scenario for Semantic Web data applications.
The evolution of databases towards heterogeneous data environments strongly affects the usability, semiotics and semantic assumptions behind existing data access methods such as structured queries, keyword-based search and visual query systems. With schema-less databases containing potentially millions of dynamically changing attributes, it becomes infeasible for some users to become aware of the 'schema' or vocabulary needed to query the database. At this scale, the effort of understanding the schema well enough to build a structured query can become prohibitive.
Schema-agnostic queries
Schema-agnostic queries can be defined as query approaches over structured databases that allow users to satisfy complex information needs without having to understand the representation (schema) of the database. Similarly, Tran et al.[4] define them as "search approaches, which do not require users to know the schema underlying the data". Approaches such as keyword-based search over databases allow users to query databases without employing structured queries. However, as discussed by Tran et al.: "From these points, users however have to do further navigation and exploration to address complex information needs. Unlike keyword search used on the Web, which focuses on simple needs, the keyword search elaborated here is used to obtain more complex results. Instead of a single set of resources, the goal is to compute complex sets of resources and their relations."
The development of approaches to support
Schema-agnostic structured queries
These are schema-agnostic queries that follow the syntax of a standard structured query language (for example, SQL or SPARQL). The syntax and semantics of the operators are preserved, while the terms used in the query may differ from the dataset vocabulary.
Example 1
SELECT ?y { BillClinton hasDaughter ?x . ?x marriedTo ?y . }
which maps to the following SPARQL query in the dataset vocabulary:
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT ?y {
  :Bill_Clinton dbpedia:child ?x .
  ?x dbpedia2:spouse ?y .
}
Example 2
SELECT ?x {
  ?x isA book .
  ?x by William_Goldman .
  ?x has_pages ?p .
  FILTER (?p > 300)
}
which maps to the following SPARQL query in the dataset vocabulary:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT ?x {
  ?x rdf:type dbpedia:Book .
  ?x dbpedia2:author :William_Goldman .
  ?x dbpedia:numberOfPages ?p .
  FILTER (?p > 300)
}
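The rewriting in Example 1 can be sketched in Python as a term-by-term substitution over triple patterns: variables keep their names and the query structure is left untouched, while vocabulary terms are replaced. The mapping table `TERM_MAP` is hypothetical and hard-coded here; in a schema-agnostic system it would be produced automatically by a semantic matcher. PREFIX declarations and FILTER clauses are omitted for brevity.

```python
# Hypothetical term mapping for Example 1; the prefixed names assume the
# PREFIX declarations shown in the example's SPARQL query.
TERM_MAP = {
    "BillClinton": ":Bill_Clinton",
    "hasDaughter": "dbpedia:child",
    "marriedTo": "dbpedia2:spouse",
}

Triple = tuple[str, str, str]


def rewrite_term(term: str, mapping: dict[str, str]) -> str:
    """Map a vocabulary term; variables (?x) and string literals are kept as-is."""
    if term.startswith("?") or term.startswith('"'):
        return term
    return mapping.get(term, term)


def rewrite_patterns(patterns: list[Triple], mapping: dict[str, str]) -> list[Triple]:
    """Rewrite every triple pattern while preserving its position and variables."""
    return [
        (rewrite_term(s, mapping), rewrite_term(p, mapping), rewrite_term(o, mapping))
        for s, p, o in patterns
    ]


def to_sparql(select_var: str, patterns: list[Triple]) -> str:
    """Serialise triple patterns as a SELECT query body (no PREFIX lines)."""
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in patterns)
    return f"SELECT {select_var} {{\n{body}\n}}"
```

Applying `rewrite_patterns` to `[("BillClinton", "hasDaughter", "?x"), ("?x", "marriedTo", "?y")]` and serialising the result with `to_sparql("?y", ...)` yields the body of the dataset-vocabulary query in Example 1. Keeping substitution separate from serialisation reflects the point above: only the terminology changes, never the operators.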
Schema-agnostic keyword queries
These are schema-agnostic queries expressed as keyword queries. In this case, the syntax and semantics of the operators differ from those of structured query languages.
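Because keyword queries carry no explicit operators, a system has to recover them from the wording. The Python sketch below, under simplified assumptions, splits a keyword query into content keywords and numeric comparison constraints; the phrase table `COMPARATORS` and the `STOPWORDS` list are hypothetical, and the resulting keywords would still need to be mapped to the dataset vocabulary.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mapping from comparison phrases to structured operators.
COMPARATORS = {"more than": ">", "over": ">", "less than": "<", "under": "<"}

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"by", "with", "the", "a", "of", "to"}


@dataclass
class ParsedKeywordQuery:
    keywords: list[str] = field(default_factory=list)
    filters: list[tuple[str, int]] = field(default_factory=list)  # (operator, value)


def parse_keyword_query(text: str) -> ParsedKeywordQuery:
    """Extract numeric comparisons, then keep the remaining non-stopword tokens."""
    result = ParsedKeywordQuery()
    remaining = text.lower()
    for phrase, op in COMPARATORS.items():
        pattern = rf"\b{re.escape(phrase)}\s+(\d+)"
        for match in re.finditer(pattern, remaining):
            result.filters.append((op, int(match.group(1))))
        # Remove the matched comparison so its words are not treated as keywords.
        remaining = re.sub(pattern, " ", remaining)
    result.keywords = [w for w in re.findall(r"[a-z]+", remaining) if w not in STOPWORDS]
    return result
```

For the keyword query "Books by William Goldman with more than 300 pages", this returns the keywords `["books", "william", "goldman", "pages"]` and the filter `[(">", 300)]`, which corresponds to the `FILTER (?p > 300)` clause of the structured version in Example 2.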
Example
"Bill Clinton daughter married to"
"Books by William Goldman with more than 300 pages"
Semantic complexity
As of 2016, the concept of schema-agnostic queries had been developed primarily in academia. Most schema-agnostic query systems have been investigated in the context of
References
- A. Freitas, "Schema-agnostic queries over large-schema databases: a distributional semantics approach", PhD thesis, 2015.
- P. Helland, "If you have too much data, then 'good enough' is good enough", Commun. ACM 54(6): 40–47, 2011.
- M. L. Brodie and J. T. Liu, "The power and limits of relational technology in the age of information ecosystems", keynote, On The Move Federated Conferences, Heraklion, Greece, October 25–29, 2010.
- T. Tran, T. Mathaess, P. Haase, "Usability of Keyword-driven Schema-agnostic Search – A Comparative Study of Keyword Search, Faceted Search, Query Completion and Result Completion", in Proceedings of the 7th Extended Semantic Web Conference (ESWC'10), Heraklion, Greece, June 2010.
- A. Freitas, J. C. Pereira Da Silva, E. Curry, "On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study", Workshop on Natural Language Interfaces for the Web of Data (NLIWoD), 13th International Semantic Web Conference (ISWC), Riva del Garda, 2014.
- S. Bischof, M. Kroetzsch, A. Polleres, S. Rudolph, "Schema-Agnostic Query Rewriting in SPARQL 1.1", in Proceedings of the 13th International Semantic Web Conference (ISWC), Springer, 2014.
- Unger et al., "Introduction to Question Answering over Linked Data", in Proceedings of the 2014 Reasoning Web Summer School, 2014.
- A. Freitas, J. E. Sales, S. Handschuh, E. Curry, "How hard is the Query? Measuring the Semantic Complexity of Schema-Agnostic Queries", in Proceedings of the 11th International Conference on Computational Semantics (IWCS), London, 2015.