User:Steyncham/sandbox
The data model of "property graphs" or "attributed graphs" has emerged since the early 2000s as a common denominator of various models of graph-oriented databases[1]. It can be defined informally as follows:
- In computer science terms, a property graph is a data structure representing entities associated by directed relationships, where the nodes and relationships can both include multiple attributes/properties
- In terms of graph theory, a property graph is a directed multigraph, whose vertices/nodes represent the entities of the corresponding data structure
- Properties take the form of key-value pairs, as used for example in JSON. Keys are character strings; values are either numeric or character strings. These properties fall within the usual definition of attributes as understood in entity-attribute-value or object-oriented modeling, which is why the phrase "attributed graph" is relevant. Unlike in RDF graphs, properties are not arcs of the graph proper. This is another reason why it would be preferable to call them attributed graphs, or graphs with properties, rather than "property graphs", which is misleading.
- Relationships are represented by arcs of the graph. These are often called edges, even though, strictly speaking, edges belong in undirected graphs. Arcs must have an identifier, a source node and a target node, and may have one or more attributes/properties in the previous sense
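The informal description above can be illustrated with a minimal Python sketch. The node identifiers, arc identifiers and property keys (`name`, `age`, `type`, `since`) are hypothetical and chosen only for illustration; the layout with plain dictionaries is one possible encoding, not that of any particular graph database.

```python
# Hypothetical example data: identifiers and keys are illustrative only.
# Each node carries zero or more key-value properties.
nodes = {
    "n1": {"name": "Alice", "age": 34},
    "n2": {"name": "Bob"},
}

# Each arc has its own identifier, a source node, a target node,
# and optional key-value properties.
arcs = {
    "a1": {"source": "n1", "target": "n2",
           "properties": {"type": "knows", "since": 2015}},
    "a2": {"source": "n1", "target": "n2",
           "properties": {"type": "works_with"}},
}


def arcs_between(source, target):
    """Return the sorted identifiers of all arcs going from source to target."""
    return sorted(
        arc_id
        for arc_id, arc in arcs.items()
        if arc["source"] == source and arc["target"] == target
    )


# Because arcs are identified independently of their endpoints, two arcs
# may link the same ordered pair of nodes (directed multigraph).
parallel = arcs_between("n1", "n2")
```

Here the properties are stored on the nodes and arcs themselves rather than as additional arcs, which is the distinction drawn with RDF graphs above.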
Formal definition
Building upon widely adopted definitions[2][3], a property graph/attributed graph can be defined by a 7-tuple (N, A, K, V, α, κ, π), where
- N is the set of nodes/vertices of the graph
- A is the set of arcs (directed edges) of the graph
- K is a set of keys, taken from a countable set, defining the nature of attributes/properties
- V is a set of values, to be associated with these keys in order to define full-fledged attributes
- α: A → N×N is a total function, defining the multigraph proper. For a ∈ A, u ∈ N, v ∈ N, α(a) = (u, v) means that a is an arc of the graph having node u for origin and node v for target
- κ is a binary relation over (A∪N) and K (formally defined as a subset of the Cartesian product (A∪N)×K), associating zero, one or several keys to each arc and node of the graph
- π: κ → V is a partial function, providing values for the properties of the nodes and the arcs which include them. For u ∈ N, a ∈ A and k ∈ K, π(u, k) (respectively π(a, k)) is the value associated with the property key k for the node u (respectively the arc a), if the corresponding property is defined there.
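As a rough illustration, the components of this tuple can be written down directly as Python sets and dictionaries. The sketch below is an assumption-laden encoding, not a reference implementation: the class name `PropertyGraph`, the method `is_well_formed` and the sample data are all hypothetical, and the total function, binary relation and partial function are represented respectively as a dictionary over all arcs, a set of pairs, and a dictionary over some of those pairs.

```python
from dataclasses import dataclass


@dataclass
class PropertyGraph:
    """Hypothetical encoding of a tuple of nodes, arcs, keys, values and maps."""
    N: set       # nodes
    A: set       # arcs
    K: set       # property keys
    V: set       # property values
    alpha: dict  # arc -> (source node, target node), defined for every arc
    kappa: set   # pairs (node or arc, key)
    pi: dict     # (node or arc, key) -> value, defined for some pairs only

    def is_well_formed(self) -> bool:
        elements = self.A | self.N
        # The arc function must be total on A and map into N x N.
        alpha_ok = set(self.alpha) == self.A and all(
            u in self.N and v in self.N for (u, v) in self.alpha.values()
        )
        # The key relation must be a subset of (A ∪ N) x K.
        kappa_ok = all(x in elements and k in self.K for (x, k) in self.kappa)
        # The value function may be partial, but only on declared pairs.
        pi_ok = all(
            pair in self.kappa and value in self.V
            for pair, value in self.pi.items()
        )
        return alpha_ok and kappa_ok and pi_ok


# Illustrative data: two parallel arcs between the same pair of nodes.
g = PropertyGraph(
    N={"u", "v"},
    A={"a", "b"},
    K={"name", "length_km"},
    V={"Paris", "Lyon", 465},
    alpha={"a": ("u", "v"), "b": ("u", "v")},
    kappa={("u", "name"), ("v", "name"), ("a", "length_km"), ("b", "length_km")},
    pi={("u", "name"): "Paris", ("v", "name"): "Lyon", ("a", "length_km"): 465},
)
```

In this sample, arc `b` has the key `length_km` associated with it but no value assigned, which is permitted because the value function is only partial.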
A complementary construct, used in several implementations of property graphs with commercial graph databases, is that of labels, which can be associated both with nodes and arcs of the graph. Labels have a practical rather than theoretical justification, as they were originally intended for users of
Relations with other models
Graph theory and classical graph algorithmics
Attributed graphs, as defined above, are especially useful and relevant in that they provide an "umbrella"
- Labeled graphs associate labels to each vertex and/or edge of a graph. Matched with attributed graphs, these labels would correspond to attributes comprising only a key, taken from a countable set (typically a character string, or an integer)
- Colored graphs, as used in classical graph coloring problems, are but special cases of labeled graphs, whose labels are defined on a finite set of keys, matched to colors.
- Weighted graphs associate a numerical value with arcs/edges and, when relevant, with the vertices of a directed or undirected graph. These weights/valuations correspond to the different values of a set of attributes sharing the same key. For example, in a graph modeling a road network, one set of weights could represent capacities (measured in number of vehicles per unit of time) and another distances; both valuations are associated with each road segment represented by an edge of the graph, and are differentiated by two corresponding keys.
- Flow networks are weighted graphs whose weights are interpreted as capacities. They are used in classical models of transport networks, e.g. with maximum flow algorithms.
- Shortest path problems, as solved by classical algorithms (like Dijkstra's algorithm), operate on weighted graphs whose weights correspond to distances, real or virtual.
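The road-network example can be sketched in Python to show how one attributed graph yields several weighted graphs, one per key. The node names, arc identifiers and numeric values below are invented for illustration, and the helper names `weights` and `shortest_distances` are hypothetical; the shortest-path routine is a standard Dijkstra's algorithm using a binary heap.

```python
import heapq

# Hypothetical road network: arc id -> (source, target, properties).
# Each road segment carries two valuations, told apart by their keys.
arcs = {
    "r1": ("A", "B", {"distance": 4.0, "capacity": 1200}),
    "r2": ("A", "C", {"distance": 1.0, "capacity": 800}),
    "r3": ("C", "B", {"distance": 2.0, "capacity": 600}),
    "r4": ("B", "D", {"distance": 5.0, "capacity": 1000}),
}


def weights(key):
    """Project the attributed graph onto one weighting: arc id -> value of key."""
    return {
        arc_id: props[key]
        for arc_id, (_, _, props) in arcs.items()
        if key in props
    }


def shortest_distances(start, key="distance"):
    """Dijkstra's algorithm over the weighting selected by key."""
    w = weights(key)
    adjacency = {}
    for arc_id, (u, v, _) in arcs.items():
        if arc_id in w:
            adjacency.setdefault(u, []).append((v, w[arc_id]))

    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, cost in adjacency.get(u, []):
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(queue, (candidate, v))
    return dist
```

Calling `weights("capacity")` would give the valuation a maximum flow algorithm needs, while `shortest_distances("A")` uses the `distance` valuation of the same graph.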
Knowledge graphs and RDF graphs
Standardization
NGSI-LD
The NGSI-LD data model specified by ETSI has been the first attempt to standardize property graphs under a de jure standards body. Compared to the basic model defined here, the NGSI-LD meta-model adds a formal definition of basic categories (entity, relation, property) on the basis of
GQL
The
Type graphs and schemas
Graph-oriented databases are, compared to
The notions of "type graphs" and schemas[2] make it possible to meet this need, with types playing a role similar to that of labels in classical graph databases, but with the added possibility of specifying relations between these types and constraining them by keys and properties. The type graph is itself a property graph, linked by a relation of graph homomorphism with the graphs of instances that use the types it defines, playing a role similar to that of a schema in a data definition language.
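The homomorphism condition between an instance graph and its type graph can be sketched as follows. This is a simplified illustration under stated assumptions: it checks only that arcs preserve source and target types, ignores property and key constraints, and uses hypothetical names throughout (`Person`, `City`, `LIVES_IN`, `is_homomorphism`).

```python
# Hypothetical type graph: type arc -> (source type, target type).
type_arcs = {"LIVES_IN": ("Person", "City")}

# Hypothetical instance graph: arc id -> (source node, target node).
instance_arcs = {"a1": ("alice", "paris"), "a2": ("bob", "paris")}

# Typing maps, sending each instance node and arc to an element of the type graph.
node_type = {"alice": "Person", "bob": "Person", "paris": "City"}
arc_type = {"a1": "LIVES_IN", "a2": "LIVES_IN"}


def is_homomorphism(instance_arcs, type_arcs, node_type, arc_type):
    """Check that every instance arc maps to a type arc whose endpoints
    are the types of the instance arc's own endpoints."""
    for arc_id, (u, v) in instance_arcs.items():
        t = arc_type.get(arc_id)
        if t not in type_arcs:
            return False  # untyped arc, or type absent from the type graph
        if (node_type.get(u), node_type.get(v)) != type_arcs[t]:
            return False  # endpoints do not match the declared types
    return True


conforms = is_homomorphism(instance_arcs, type_arcs, node_type, arc_type)
```

An instance arc going from a city to a person while typed `LIVES_IN` would fail this check, which is the sense in which the type graph constrains its instances in the manner of a schema.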
The
References
- ^ Angles, Renzo (2012-04-01). "A comparison of current graph database models". International Conference on Data Engineering. IEEE.
- ^ ISBN 978-3-030-33222-8, retrieved 2021-09-15
- ISBN 978-3-319-63962-8, retrieved 2021-09-15
- ^ Privat, Gilles; Abbas, Abdullah. "Cyber-Physical Graphs" vs. RDF graphs. W3C Workshop on Web Standardization for Graph Data, Berlin, March 2019.