---
title: Introduction to Labeled Property Graphs
subtitle: Introduction
author: Jakob Voß
date: 2024-11-22
---
# Data Modelling with Graphs (and Star Wars)
## "Little boxes with arrows and stuff"
![](Netwerkje.jpg)
:::{.notes}
Ken Thompson (co-creator of Unix and UTF-8) on how he begins a program by scribbling data structures
:::
## Example
```{dot}
graph {
rankdir=LR;
C3PO [fontcolor=white];
R2D2 [fontcolor=white];
Luke [fontcolor=white];
R2D2 -- C3PO [label="friends", fontcolor=white];
R2D2 -- Luke [label="⮜ owns", fontcolor=white];
Luke -- C3PO [label="owns ➤", fontcolor=white];
}
```
## Example
```{dot}
graph {
rankdir=LR;
R2D2 -- C3PO [label="friends"];
R2D2 -- Luke [label="⮜ owns", fontcolor=black];
Luke -- C3PO [label="owns ➤", fontcolor=black];
}
```
## Example
```{dot}
graph {
rankdir=LR;
C3PO [label=Robot];
R2D2 [label=Robot];
Luke [label=Person];
R2D2 -- C3PO [label="friends", fontcolor=white];
R2D2 -- Luke [label="⮜ owns", fontcolor=white];
Luke -- C3PO [label="owns ➤", fontcolor=white];
}
```
## Basic graph elements for data modeling
- **nodes** (aka vertices) representing entities
- **edges** (aka connections, relations...)
- node labels as
  - **node identifiers** and/or
  - **node types** (aka **node labels**, classes...)
- edge labels as **edge types** (aka **edge labels**...)
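These elements map directly onto plain data structures. Below is a minimal Python sketch. It assumes string node identifiers and exactly one type per node and per edge, and it keeps identifiers and types in separate fields. The names `Node`, `Edge`, `Graph`, `add_node` and `add_edge` are hypothetical and do not come from any library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str    # node identifier, e.g. "R2D2"
    type: str  # node type (aka node label), e.g. "robot"


@dataclass(frozen=True)
class Edge:
    source: str            # identifier of the source node
    target: str            # identifier of the target node
    type: str              # edge type (aka edge label), e.g. "owns"
    directed: bool = True  # False for symmetric relations such as "friends"


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)  # keyed by identifier
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.nodes[node_id] = Node(node_id, node_type)

    def add_edge(self, source: str, target: str, edge_type: str,
                 directed: bool = True) -> None:
        # only connect nodes that already exist
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in edge {source!r} -> {target!r}")
        self.edges.append(Edge(source, target, edge_type, directed))


g = Graph()
g.add_node("R2D2", "robot")
g.add_node("C3PO", "robot")
g.add_node("Luke", "person")
g.add_edge("Luke", "R2D2", "owns")
g.add_edge("Luke", "C3PO", "owns")
g.add_edge("R2D2", "C3PO", "friends", directed=False)

# query: which robots does Luke own?
owned_by_luke = [e.target for e in g.edges if e.source == "Luke" and e.type == "owns"]
```

Edges are kept in a list rather than a set of node pairs, so nothing prevents several edges between the same two nodes.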
## Data modeling
![](levels-of-data-modeling.svg)
. . .
- Arbitrary graphs used for models
- Models expressed in data formats
# Some Graph Data Formats
## RDF/Turtle
~~~ttl
# directed edges
<Luke> <owns> <R2D2> ,
              <C3PO> .
# node types (additional edges)
<R2D2> a <robot> .
<C3PO> a <robot> .
<Luke> a <person> .
# no undirected edges!
<R2D2> <friend> <C3PO> .
~~~
. . .
- Requires IRIs
- More limitations later
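"Requires IRIs" applies even to short names like `<Luke>`. These are relative IRIs, and a parser resolves them against the document's base IRI. The keyword `a` abbreviates `rdf:type`. Below is a minimal Python sketch of that expansion using only the standard library. The base IRI `http://example.org/starwars/` is an assumption for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base IRI against which the relative IRIs are resolved.
BASE = "http://example.org/starwars/"
# The full IRI that the Turtle keyword `a` stands for.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The statements of the Turtle example, with the `,` object list expanded.
relative_triples = [
    ("Luke", "owns", "R2D2"),
    ("Luke", "owns", "C3PO"),
    ("R2D2", "a", "robot"),
    ("C3PO", "a", "robot"),
    ("Luke", "a", "person"),
    ("R2D2", "friend", "C3PO"),
]

# An RDF graph is a set of triples of (here) absolute IRIs.
triples = {
    (urljoin(BASE, s), RDF_TYPE if p == "a" else urljoin(BASE, p), urljoin(BASE, o))
    for s, p, o in relative_triples
}

for s, p, o in sorted(triples):
    print(f"<{s}> <{p}> <{o}> .")  # one N-Triples line per triple
```

The friendship remains a single directed triple. To make it symmetric, the data would have to state the reverse triple as well, or declare the predicate an `owl:SymmetricProperty` and rely on a reasoner.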
## CSV
robot ownership
~~~csv
owner,robot
Luke,C3PO
Luke,R2D2
~~~
robot friendship
~~~csv
friend1,friend2
R2D2,C3PO
~~~
. . .
- Requires contextual information
- Least common denominator: resistance is futile!
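The caption lines "robot ownership" and "robot friendship" carry meaning that the CSV data itself lacks. Below is a minimal Python sketch using the standard `csv` module. It makes that context explicit in a hypothetical `CONTEXT` mapping. The edge types, directions, node types per column, and the rule "first column is the source" are all assumptions, not part of the files.

```python
import csv
import io

# The two files from the slide, inlined so the example is self-contained.
FILES = {
    "robot ownership": "owner,robot\nLuke,C3PO\nLuke,R2D2\n",
    "robot friendship": "friend1,friend2\nR2D2,C3PO\n",
}

# Context that is NOT in the CSV data itself (hypothetical mapping):
# edge type, direction, and the node type of each column.
CONTEXT = {
    "robot ownership": {"edge_type": "owns", "directed": True,
                        "columns": {"owner": "person", "robot": "robot"}},
    "robot friendship": {"edge_type": "friends", "directed": False,
                         "columns": {"friend1": "robot", "friend2": "robot"}},
}

nodes: dict[str, str] = {}                    # node id -> node type
edges: list[tuple[str, str, str, bool]] = []  # (source, target, type, directed)

for name, text in FILES.items():
    ctx = CONTEXT[name]
    source_col, target_col = list(ctx["columns"])  # first column = source
    for row in csv.DictReader(io.StringIO(text)):
        for col, node_type in ctx["columns"].items():
            nodes[row[col]] = node_type
        edges.append((row[source_col], row[target_col],
                      ctx["edge_type"], ctx["directed"]))
```

Without `CONTEXT`, the same two files could equally describe robots owning people, or a directed "likes" relation.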
## SQL
~~~sql
-- nodes
INSERT INTO robots VALUES ('R2D2');
INSERT INTO robots VALUES ('C3PO');
INSERT INTO people VALUES ('Luke');
-- edges
INSERT INTO robot_ownership VALUES ('Luke', 'C3PO');
INSERT INTO robot_ownership VALUES ('Luke', 'R2D2');
INSERT INTO robot_friends VALUES ('R2D2', 'C3PO'); -- directed!
~~~
. . .
- Requires a database schema. Pros and cons?
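The `INSERT` statements only work once the tables exist. Below is a minimal sketch using Python's built-in `sqlite3` module. The schema, with one table per node type and one per edge type, is an assumption, as are the column names `name`, `owner`, `robot`, `friend1` and `friend2`.

```python
import sqlite3

con = sqlite3.connect(":memory:")        # throwaway in-memory database
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: one table per node type, one table per edge type.
con.executescript("""
    CREATE TABLE robots (name TEXT PRIMARY KEY);
    CREATE TABLE people (name TEXT PRIMARY KEY);
    CREATE TABLE robot_ownership (
        owner TEXT NOT NULL REFERENCES people(name),
        robot TEXT NOT NULL REFERENCES robots(name)
    );
    CREATE TABLE robot_friends (
        friend1 TEXT NOT NULL REFERENCES robots(name),
        friend2 TEXT NOT NULL REFERENCES robots(name)
    );

    -- the statements from the slide
    INSERT INTO robots VALUES ('R2D2');
    INSERT INTO robots VALUES ('C3PO');
    INSERT INTO people VALUES ('Luke');
    INSERT INTO robot_ownership VALUES ('Luke', 'C3PO');
    INSERT INTO robot_ownership VALUES ('Luke', 'R2D2');
    INSERT INTO robot_friends VALUES ('R2D2', 'C3PO');
""")


def friends_of(robot: str) -> list[str]:
    """Treat the directed friendship rows as undirected by checking both columns."""
    rows = con.execute(
        "SELECT friend2 FROM robot_friends WHERE friend1 = ? "
        "UNION SELECT friend1 FROM robot_friends WHERE friend2 = ?",
        (robot, robot),
    )
    return sorted(name for (name,) in rows)


robots_of_luke = sorted(
    robot for (robot,) in
    con.execute("SELECT robot FROM robot_ownership WHERE owner = ?", ("Luke",))
)
```

With the foreign keys switched on, an ownership row for a robot missing from `robots` is rejected with `sqlite3.IntegrityError`. The flip side is that every new node or edge type needs a schema change first.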
## GraphML
~~~xml
<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="C3PO"/>
    <node id="Luke"/>
    <node id="R2D2"/>
    <edge source="Luke" target="C3PO" directed="true"/>
    <edge source="Luke" target="R2D2" directed="true"/>
    <edge source="C3PO" target="R2D2"/>
  </graph>
</graphml>
~~~
. . .
- Many more graph data formats exist
- Mostly for applications other than metadata\
(e.g. network analysis)
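In GraphML, the direction of an edge comes from the graph's `edgedefault` unless the edge overrides it with its own `directed` attribute. Below is a minimal sketch that reads the example with Python's standard `xml.etree.ElementTree`. The namespace prefix `g` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="C3PO"/>
    <node id="Luke"/>
    <node id="R2D2"/>
    <edge source="Luke" target="C3PO" directed="true"/>
    <edge source="Luke" target="R2D2" directed="true"/>
    <edge source="C3PO" target="R2D2"/>
  </graph>
</graphml>"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}  # prefix for the default namespace

root = ET.fromstring(GRAPHML.encode("utf-8"))
graph = root.find("g:graph", NS)
default_directed = graph.get("edgedefault") == "directed"

nodes = [n.get("id") for n in graph.findall("g:node", NS)]
edges = []
for e in graph.findall("g:edge", NS):
    # a per-edge "directed" attribute overrides the graph's edgedefault
    directed_attr = e.get("directed")
    directed = default_directed if directed_attr is None else directed_attr == "true"
    edges.append((e.get("source"), e.get("target"), directed))
```

This snippet contains no edge types such as `owns`. GraphML would express them as `<data>` elements declared by a `<key>`.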
## Cypher
~~~cypher
CREATE (C3PO:robot)
CREATE (Luke:person)
CREATE (R2D2:robot)
CREATE (Luke)-[:owns]->(C3PO)
CREATE (Luke)-[:owns]->(R2D2)
CREATE (R2D2)-[:friend]->(C3PO) // directed!
~~~
. . .
- Used in property graph databases
- Established standard (more or less)
## Property Graph Exchange Format (PG)
~~~pg
# nodes
R2D2 :robot
C3PO :robot
Luke :person
# edges
Luke -> C3PO :owns # directed
Luke -> R2D2 :owns # directed
C3PO -- R2D2 :friends # undirected
~~~
. . .
- Specified, documented, implemented, but less known
- <https://pg-format.github.io/>
- Command line tool `pgraph` to convert formats
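To show how little syntax the example needs, below is a minimal Python sketch that parses only the subset of PG used on this slide. That subset is one statement per line, `#` comments, nodes with `:labels`, and edges written with `->` or `--`. The sketch ignores properties, quoting and the rest of the real grammar. `parse_pg` is a hypothetical helper, not the API of `pgraphs` or `pgraph`.

```python
import re

PG_TEXT = """\
# nodes
R2D2 :robot
C3PO :robot
Luke :person
# edges
Luke -> C3PO :owns      # directed
Luke -> R2D2 :owns      # directed
C3PO -- R2D2 :friends   # undirected
"""

# edge: source, direction (-> or --), target, zero or more :labels
EDGE = re.compile(r"(\S+)\s+(->|--)\s+(\S+)((?:\s+:\S+)*)")
# node: identifier followed by zero or more :labels
NODE = re.compile(r"(\S+)((?:\s+:\S+)*)")


def labels_of(text: str) -> list[str]:
    """Turn ' :robot :droid' into ['robot', 'droid']."""
    return [label[1:] for label in text.split()]


def parse_pg(text: str) -> tuple[dict[str, list[str]], list[dict]]:
    nodes: dict[str, list[str]] = {}
    edges: list[dict] = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # naive comment removal (no quoted strings)
        if not line:
            continue
        if m := EDGE.fullmatch(line):
            source, arrow, target, labels = m.groups()
            edges.append({"from": source, "to": target,
                          "directed": arrow == "->", "labels": labels_of(labels)})
            # choice of this sketch: nodes used only in edges get no labels
            nodes.setdefault(source, [])
            nodes.setdefault(target, [])
        elif m := NODE.fullmatch(line):
            node_id, labels = m.groups()
            nodes[node_id] = labels_of(labels)
        else:
            raise ValueError(f"unsupported line: {line!r}")
    return nodes, edges


nodes, edges = parse_pg(PG_TEXT)
```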
## Try out PG in your browser!
- <https://pg-format.github.io/pg-formatter/>
- Create a graph of some robots in Star Wars!
# Additional features
## Additional graph features
```{dot}
digraph {
rankdir=LR;
X -> X [label="loops"];
subgraph cluster {
style=filled;
color=lightgrey;
node [style=filled,color=white];
a -> b;
label = "subgraph";
}
X -> a;
X -> Y [label="multi"];
X -> Y [label="edges"];
orphan;
}
```
. . .
- Support depends on the actual format or software!
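Whether a format supports these features often depends on its underlying data structure. Below is a minimal Python sketch of one structure that handles all four: loops, subgraphs, multi-edges and orphan nodes. The representation is an assumption for illustration and is not taken from any particular format.

```python
from collections import Counter

# Directed edges from the diagram as (source, target, label) triples.
# A list (not a set of node pairs) keeps the two parallel edges X -> Y.
edges = [
    ("X", "X", "loops"),
    ("a", "b", None),
    ("X", "a", None),
    ("X", "Y", "multi"),
    ("X", "Y", "edges"),
]
# "orphan" has no edges, so nodes must be stored separately from edges.
nodes = {"X", "Y", "a", "b", "orphan"}
# One possible way to model subgraphs: named subsets of nodes.
subgraphs = {"subgraph": {"a", "b"}}

loops = [e for e in edges if e[0] == e[1]]
pair_counts = Counter((s, t) for s, t, _ in edges)
multi_edges = {pair: n for pair, n in pair_counts.items() if n > 1}
orphans = nodes - {s for s, _, _ in edges} - {t for _, t, _ in edges}
```

A set of `(source, target)` pairs, as used for simple graphs, would silently merge the two edges between X and Y.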
## Properties / Attributes
~~~pg
# node properties
Padmé :person gender: female
Anakin :person gender: male
Luke :person gender: male
C3PO :robot color: golden, silver # multi-value!
R2D2 :robot
# edge properties
Padmé -> R2D2 :owns episode:1
Anakin -> R2D2 :owns episode:2
Anakin -> Luke :child episode:3
Padmé -> Luke :child episode:3
Luke -> R2D2 :owns episode:4
Luke -> C3PO :owns episode:4
~~~
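Below is a minimal Python sketch of the same data as plain dictionaries. It assumes that every property holds a list of values, so that C3PO's multi-valued `color` needs no special case, and that `episode` values are integers. The helper names `owners` and `nodes_with` are hypothetical.

```python
# Node id -> (labels, properties). Every property maps to a list of values,
# so a multi-valued property such as C3PO's color needs no special case.
nodes = {
    "Padmé":  ({"person"}, {"gender": ["female"]}),
    "Anakin": ({"person"}, {"gender": ["male"]}),
    "Luke":   ({"person"}, {"gender": ["male"]}),
    "C3PO":   ({"robot"},  {"color": ["golden", "silver"]}),
    "R2D2":   ({"robot"},  {}),
}

# (source, target, label, properties); episode numbers stored as integers
edges = [
    ("Padmé",  "R2D2", "owns",  {"episode": [1]}),
    ("Anakin", "R2D2", "owns",  {"episode": [2]}),
    ("Anakin", "Luke", "child", {"episode": [3]}),
    ("Padmé",  "Luke", "child", {"episode": [3]}),
    ("Luke",   "R2D2", "owns",  {"episode": [4]}),
    ("Luke",   "C3PO", "owns",  {"episode": [4]}),
]


def owners(robot: str) -> list[str]:
    """Owners of a robot, ordered by the episode on the ownership edge."""
    ownership = [
        (min(props["episode"]), source)
        for source, target, label, props in edges
        if target == robot and label == "owns"
    ]
    return [source for _, source in sorted(ownership)]


def nodes_with(key: str, value) -> list[str]:
    """Nodes whose property `key` includes `value` (works for multi-valued properties)."""
    return [node for node, (_, props) in nodes.items() if value in props.get(key, [])]
```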
## Details depend on format & software
- Special properties (name, id, visual, reserved...)
- Which datatypes are supported (string, number, date...)?
- Can properties have values of mixed type? Empty set? Null?
- What are node/edge ids (internal, numeric, name...)?
- Can nodes/edges have multiple labels/types?
## Wikidata as a property graph
![](marcia-lucas-wikidata-1.png)
![](marcia-lucas-wikidata.png)
## Wikidata as a property graph
```{mermaid}
flowchart LR
Q28193["<u>Academy Award for Best Film Editing (Q28193)</u><br>alias: Oscar for Best Film Editing"]
Q463119["<u>Marcia Lucas (Q463119)</u><br><tt>alias:</tt> Marcia Griffin"]
Q463119 -- "<u>award received (P166)</u><br>for work: Star Wars<br>date: 1978" --> Q28193
```
. . .
- Node identifiers and edge labels (property identifiers)
- Data model and terminology differ from\
  both RDF and common property graph models
  - aliases and descriptions with language tags
  - properties can link to entities
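Below is a minimal Python sketch that maps the statement in the diagram onto a property graph edge. The property identifier becomes the edge label and the qualifiers become edge properties. The dictionary layout is a simplified, hypothetical shape, since the real Wikidata JSON nests labels, aliases and claims more deeply. The `alias@en` key convention is also an assumption. The sketch also shows what gets lost: "for work" points to an entity in Wikidata but becomes a plain string here.

```python
# Simplified, hypothetical shape of the data in the diagram.
items = {
    "Q463119": {"labels": {"en": "Marcia Lucas"},
                "aliases": {"en": ["Marcia Griffin"]}},
    "Q28193": {"labels": {"en": "Academy Award for Best Film Editing"},
               "aliases": {"en": ["Oscar for Best Film Editing"]}},
}
statement = {
    "subject": "Q463119",
    "property": "P166",                        # award received
    "value": "Q28193",
    "qualifiers": {"for work": ["Star Wars"],  # an entity in Wikidata, a string here
                   "date": ["1978"]},
}


def item_to_node(qid: str, item: dict) -> dict:
    """Flatten language-tagged labels and aliases into keys such as 'alias@en'
    (an assumed convention: property graphs have no built-in language tags)."""
    properties = {f"label@{lang}": [text] for lang, text in item["labels"].items()}
    for lang, aliases in item["aliases"].items():
        properties[f"alias@{lang}"] = list(aliases)
    return {"id": qid, "properties": properties}


def statement_to_edge(stmt: dict) -> dict:
    """The property id becomes the edge label, qualifiers become edge properties."""
    return {
        "from": stmt["subject"],
        "to": stmt["value"],
        "label": stmt["property"],
        "properties": {key: list(values) for key, values in stmt["qualifiers"].items()},
    }


nodes = [item_to_node(qid, item) for qid, item in items.items()]
edge = statement_to_edge(statement)
```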
# First Summary
## (Labeled) Property Graphs
- A class of graph structures where
  - nodes and edges have **labels** (aka types)
  - nodes and edges have **properties** (aka attributes)
. . .
- Specific features differ depending on data format and software
- Useful for data modeling and schema-less data management
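The summary above can be made precise. What follows sketches one common formalization from the graph database literature. It assumes directed edges only, finite sets of labels, and finite sets of property values. Concrete formats and databases deviate from it in the ways listed under "Details depend on format & software".

$$
\begin{aligned}
G &= (N, E, \rho, \lambda, \sigma) && N \cap E = \emptyset \\
\rho &: E \to N \times N && \text{source and target of each edge} \\
\lambda &: N \cup E \to \mathcal{P}_{\mathrm{fin}}(L) && \text{finite set of labels} \\
\sigma &: (N \cup E) \times P \rightharpoonup \mathcal{P}_{\mathrm{fin}}(V) && \text{property values (partial function)}
\end{aligned}
$$

Here $L$, $P$ and $V$ are the sets of labels, property keys and values. In the Star Wars example, one ownership edge $e$ has $\rho(e) = (\text{Luke}, \text{C3PO})$ and $\lambda(e) = \{\text{owns}\}$, and $\sigma(\text{C3PO}, \text{color}) = \{\text{golden}, \text{silver}\}$.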
## Two hard things
> There are only two hard things in Computer Science: cache invalidation and naming things. --- Phil Karlton
- "Property"
  - attribute-value pair in a property graph
  - IRI used as predicate (middle part) of an RDF triple
  - attribute, field, ...
- "Label"
  - type, class, ...
  - name, ...
## Some property graph data formats
- CSV
- Cypher
- GraphML
- Property Graph Exchange Format (PG)
- ...
Converter: npm package [pgraphs](https://www.npmjs.com/package/pgraphs)