A collection of resources on the topic of Complex Logical Query Answering accompanying the paper Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases. Feel free to open PRs and issues to add new papers, datasets, and implementations!
This repo follows the Neural Query Engine taxonomy proposed in the paper (Figure 9).
Triple-based KGs (44)
- GQE, NeurIPS 2018
- GQE+hashing, ICDM 2019
- CGA, K-CAP 2019
- TractOR, UAI 2020
- Query2Box, ICLR 2020
- BetaE, NeurIPS 2020
- EmQL, NeurIPS 2020
- MPQE, ICML 2020 Workshop
- RotatE-Box, AKBC 2021
- BiQE, AAAI 2021
- Knowledge Sheaves: A Sheaf-Theoretic Framework for Knowledge Graph Embedding, arxiv 2021
- CQD, ICLR 2021
- HypE, WWW 2021
- NewLook, KDD 2021
- ConE, NeurIPS 2021
- PERM, NeurIPS 2021
- Neural-symbolic Approach for Ontology-mediated Query Answering, arxiv 2021
- LogicE, arxiv 2021
- MLPMix, ICLR 2022
- FuzzQE, AAAI 2022
- GNN-QE, ICML 2022
- SMORE, KDD 2022
- kgTransformer, KDD 2022
- LinE, KDD 2022
- Query2Particles, NAACL 2022
- TAR, arxiv 2022
- TeMP, arxiv 2022
- FLEX, arxiv 2022
- TFLEX, arxiv 2022
- GNNQ, ISWC 2022
- ENeSy, NeurIPS 2022
- NodePiece-QE, NeurIPS 2022
- GammaE, EMNLP 2022
- NMP-QEM, EMNLP 2022
- SignalE, KSEM 2022
- Query2Geom, AICS 2022
- LMPNN, ICLR 2023
- QTO, arxiv 2023
- Var2Vec, AAAI 2023
- CQD$^{A}$, arxiv 2023
- SQE, TMLR 2023
- NRN, KDD 2023
- FIT, arxiv 2023
- WFRE, ACL 2023
Hyper-graphs and Multi-modal graphs (0)
- None as of March 2023
Discrete (Entities only) (45)
- GQE, NeurIPS 2018
- GQE+hashing, ICDM 2019
- CGA, K-CAP 2019
- TractOR, UAI 2020
- Query2Box, ICLR 2020
- BetaE, NeurIPS 2020
- EmQL, NeurIPS 2020
- MPQE, ICML 2020 Workshop
- RotatE-Box, AKBC 2021
- BiQE, AAAI 2021
- Knowledge Sheaves: A Sheaf-Theoretic Framework for Knowledge Graph Embedding, arxiv 2021
- CQD, ICLR 2021
- HypE, WWW 2021
- NewLook, KDD 2021
- ConE, NeurIPS 2021
- PERM, NeurIPS 2021
- Neural-symbolic Approach for Ontology-mediated Query Answering, arxiv 2021
- LogicE, arxiv 2021
- MLPMix, ICLR 2022
- StarQE, ICLR 2022
- FuzzQE, AAAI 2022
- GNN-QE, ICML 2022
- CBR-SUBG, ICML 2022
- SMORE, KDD 2022
- kgTransformer, KDD 2022
- LinE, KDD 2022
- Query2Particles, NAACL 2022
- TAR, arxiv 2022
- TeMP, arxiv 2022
- FLEX, arxiv 2022
- GNNQ, ISWC 2022
- ENeSy, NeurIPS 2022
- NodePiece-QE, NeurIPS 2022
- GammaE, EMNLP 2022
- NMP-QEM, EMNLP 2022
- SignalE, KSEM 2022
- Query2Geom, AICS 2022
- LMPNN, ICLR 2023
- QTO, arxiv 2023
- Var2Vec, AAAI 2023
- NQE, AAAI 2023
- CQD$^{A}$, arxiv 2023
- SQE, TMLR 2023
- FIT, arxiv 2023
- WFRE, ACL 2023
Discrete Temporal (Entities + Dates) (1)
Discrete + Continuous (Entities + string/numerical Literals) (0)
- None as of March 2023
Facts-only (ABOX) (42)
- GQE, NeurIPS 2018
- GQE+hashing, ICDM 2019
- TractOR, UAI 2020
- Query2Box, ICLR 2020
- BetaE, NeurIPS 2020
- EmQL, NeurIPS 2020
- MPQE, ICML 2020 Workshop
- RotatE-Box, AKBC 2021
- BiQE, AAAI 2021
- Knowledge Sheaves: A Sheaf-Theoretic Framework for Knowledge Graph Embedding, arxiv 2021
- CQD, ICLR 2021
- HypE, WWW 2021
- NewLook, KDD 2021
- ConE, NeurIPS 2021
- PERM, NeurIPS 2021
- LogicE, arxiv 2021
- MLPMix, ICLR 2022
- FuzzQE, AAAI 2022
- GNN-QE, ICML 2022
- CBR-SUBG, ICML 2022
- SMORE, KDD 2022
- kgTransformer, KDD 2022
- LinE, KDD 2022
- Query2Particles, NAACL 2022
- FLEX, arxiv 2022
- TFLEX, arxiv 2022
- GNNQ, ISWC 2022
- ENeSy, NeurIPS 2022
- NodePiece-QE, NeurIPS 2022
- GammaE, EMNLP 2022
- NMP-QEM, EMNLP 2022
- SignalE, KSEM 2022
- Query2Geom, AICS 2022
- LMPNN, ICLR 2023
- QTO, arxiv 2023
- Var2Vec, AAAI 2023
- NQE, AAAI 2023
- CQD$^{A}$, arxiv 2023
- SQE, TMLR 2023
- NRN, KDD 2023
- FIT, arxiv 2023
- WFRE, ACL 2023
Complex axioms (TBOX) (1)
Shallow Embedding (32)
- GQE, NeurIPS 2018
- GQE+hashing, ICDM 2019
- CGA, K-CAP 2019
- TractOR, UAI 2020
- Query2Box, ICLR 2020
- BetaE, NeurIPS 2020
- EmQL, NeurIPS 2020
- Knowledge Sheaves: A Sheaf-Theoretic Framework for Knowledge Graph Embedding, arxiv 2021
- RotatE-Box, AKBC 2021
- Neural-symbolic Approach for Ontology-mediated Query Answering, arxiv 2021
- HypE, WWW 2021
- NewLook, KDD 2021
- CQD, ICLR 2021
- ConE, NeurIPS 2021
- PERM, NeurIPS 2021
- LogicE, arxiv 2021
- FuzzQE, AAAI 2022
- SMORE, KDD 2022
- LinE, KDD 2022
- TAR, arxiv 2022
- Query2Particles, NAACL 2022
- FLEX, arxiv 2022
- TFLEX, arxiv 2022
- GammaE, EMNLP 2022
- NMP-QEM, EMNLP 2022
- SignalE, KSEM 2022
- Query2Geom, AICS 2022
- QTO, arxiv 2023
- Var2Vec, AAAI 2023
- CQD$^{A}$, arxiv 2023
- FIT, arxiv 2023
- WFRE, ACL 2023
Transductive Encoder (9)
Inductive Encoder (4)
- (TeMP) Type-aware embeddings for multi-hop reasoning over knowledge graphs, arxiv 2022
- (GNN-QE) Neural-Symbolic Models for Logical Queries on Knowledge Graphs, ICML 2022
- (GNNQ) GNNQ: A Neuro-Symbolic Approach for Query Answering over Incomplete Knowledge Graphs, ISWC 2022
- (NodePiece-QE) Inductive Logical Query Answering in Knowledge Graphs, NeurIPS 2022
Any Processor (2)
- TeMP, arxiv 2022
- NodePiece-QE, NeurIPS 2022
End-to-end Neural (15)
- GQE, NeurIPS 2018
- GQE+hashing, ICDM 2019
- CGA, K-CAP 2019
- MPQE, ICML 2020 Workshop
- BiQE, AAAI 2021
- MLPMix, ICLR 2022
- StarQE, ICLR 2022
- kgTransformer, KDD 2022
- Query2Particles, NAACL 2022
- SMORE, KDD 2022
- GNNQ, ISWC 2022
- SignalE, KSEM 2022
- LMPNN, ICLR 2023
- SQE, TMLR 2023
- WFRE, ACL 2023
Neuro-Symbolic | Geometric (8)
- Query2Box, ICLR 2020
- Neural-symbolic Approach for Ontology-mediated Query Answering, arxiv 2021
- Knowledge Sheaves: A Sheaf-Theoretic Framework for Knowledge Graph Embedding, arxiv 2021
- RotatE-Box, AKBC 2021
- NewLook, KDD 2021
- HypE, WWW 2021
- ConE, NeurIPS 2021
- Query2Geom, AICS 2022
Neuro-Symbolic | Probabilistic (5)
Neuro-Symbolic | Fuzzy Logic (16)
Non-Parametric (all)
All existing models up to March 2023
Parametric (0)
- None as of March 2023
Progressive scale of supported operators. That is, all models listed under the "NOT" category also support JOIN and UNION.
PROJECTION + JOIN (intersection) (10)
- GQE, NeurIPS 2018
- GQE+hashing, ICDM 2019
- CGA, K-CAP 2019
- TractOR, UAI 2020
- MPQE, ICML 2020 Workshop
- Knowledge Sheaves: A Sheaf-Theoretic Framework for Knowledge Graph Embedding, arxiv 2021
- BiQE, AAAI 2021
- StarQE, ICLR 2022
- SMORE, KDD 2022
- GNNQ, ISWC 2022
+ UNION (9)
- Query2Box, ICLR 2020
- EmQL, NeurIPS 2020
- HypE, WWW 2021
- NewLook, KDD 2021
- PERM, NeurIPS 2021
- CQD, ICLR 2021
- Neural-symbolic Approach for Ontology-mediated Query Answering, arxiv 2021
- kgTransformer, KDD 2022
- Query2Geom, AICS 2022
+ NOT (negation) (23)
- BetaE, NeurIPS 2020
- ConE, NeurIPS 2021
- LogicE, arxiv 2021
- MLPMix, ICLR 2022
- FuzzQE, AAAI 2022
- GNN-QE, ICML 2022
- LinE, KDD 2022
- Query2Particles, NAACL 2022
- GammaE, EMNLP 2022
- NMP-QEM, EMNLP 2022
- TAR, arxiv 2022
- FLEX, arxiv 2022
- TFLEX, arxiv 2022
- ENeSy, NeurIPS 2022
- SignalE, KSEM 2022
- QTO, arxiv 2023
- LMPNN, ICLR 2023
- NQE, AAAI 2023
- Var2Vec, AAAI 2023
- CQD$^{A}$, arxiv 2023
- SQE, TMLR 2023
- FIT, arxiv 2023
- WFRE, ACL 2023
Kleene Plus (1)
- RotatE-Box, AKBC 2021
FILTER (0)
- None as of March 2023
AGGREGATIONS (GROUP BY, ORDER BY, etc) (0)
- None as of March 2023
Tree-structured (47)
- All existing processors as of March 2023
Arbitrary DAGs (1)
- FIT, arxiv 2023
Cyclic Queries (1)
- FIT, arxiv 2023
Zero Projected Vars (ASK queries) (0)
- None as of March 2023
One Projected Variable (all)
- All processors as of March 2023
Multiple Projected Variables (1)
- (EFOk-CQA) EFOk-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation, arxiv 2023
- Original metrics: ROC AUC and Average Percentile Rank over 1000 negative samples.
- Proposed by original GQE (NeurIPS 2018), used by GQE+hashing, CGA, and TractOR. Not used after.
- Generalization: predicting hard answers (MRR / Hits@k).
- Introduced by Query2Box (ICLR 2020). Standard metric.
- Generalization: from ranking to binary classification
- Entailment: faithfulness - ability to recover easy answers (no link prediction) (MRR / Hits@k)
- Proposed by EmQL (NeurIPS 2020)
- Estimating the cardinality of answer set size (Spearman's rank correlation, MAPE)
- Predicting easy answers before hard answers (ROC-AUC)
- Used in NodePiece-QE
- Multiple variable queries, (multiply / marginal / joint) x (MRR / Hits@k)
- Proposed in EFOk-CQA, arxiv 2023
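The generalization metric above (MRR / Hits@k on hard answers) can be illustrated with a minimal sketch. It assumes the commonly used filtered setting, where each hard answer is ranked only against non-answer entities; the function names and the dense score list are illustrative assumptions, not the evaluation code of any listed paper.

```python
from typing import Dict, Iterable, Sequence, Set


def filtered_ranks(scores: Sequence[float], easy: Set[int], hard: Set[int]) -> Dict[int, int]:
    """Rank every hard answer against non-answer entities only.

    scores[i] is the (hypothetical) model score of entity i, higher is better.
    Easy answers and the other hard answers are filtered out of the ranking.
    """
    answers = easy | hard
    ranks = {}
    for h in hard:
        # Count non-answer entities scored strictly higher than this hard answer.
        better = sum(
            1 for e, s in enumerate(scores) if e not in answers and s > scores[h]
        )
        ranks[h] = better + 1
    return ranks


def mrr(ranks: Iterable[int]) -> float:
    """Mean reciprocal rank over a non-empty collection of ranks."""
    ranks = list(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Iterable[int], k: int) -> float:
    """Fraction of ranks that are at most k."""
    ranks = list(ranks)
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy query over 5 entities: entity 0 is an easy answer, entities 3 and 4 are hard answers.
scores = [0.9, 0.1, 0.6, 0.7, 0.3]
ranks = filtered_ranks(scores, easy={0}, hard={3, 4})
```

In this toy case entity 3 is ranked first and entity 4 second (only the non-answer entity 2 outscores it), so per-query MRR is 0.75 and Hits@1 is 0.5. Benchmarks typically average these per-query values over all queries of a given type.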
Transductive datasets (15)
- (GQE datasets) GQE, NeurIPS 2018
- (Query2Box datasets) Query2Box, ICLR 2020
- (BetaE datasets) BetaE, NeurIPS 2020
- (Regex datasets) Regex Queries, AKBC 2021
- (BiQE dataset) BiQE, AAAI 2021
- (Query2Onto datasets) Neural-symbolic Approach for Ontology-mediated Query Answering, arxiv 2021
- (EFO-1 dataset) Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs, NeurIPS 2021 (Datasets and Benchmarks)
- (SMORE datasets) SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs, KDD 2022
- (StarQE dataset) Query Embedding on Hyper-relational Knowledge Graphs, ICLR 2022
- (TFLEX dataset) TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph, arxiv 2022
- (WD50K-NFOL dataset) NQE: N-ary Query Embedding for Complex Query Answering over Hyper-relational Knowledge Graphs, AAAI 2023
- (SQE dataset) Sequential Query Encoding For Complex Query Answering on Knowledge Graphs
- (Real EFO-1) On Existential First Order Queries Inference on Knowledge Graphs, arxiv 2023
- (Numerical CQA dataset) Knowledge Graph Reasoning over Entities and Numerical Values, KDD 2023
- (EFOk-CQA) EFOk-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation, arxiv 2023
Inductive datasets (3)
- (TeMP datasets) Type-aware embeddings for multi-hop reasoning over knowledge graphs, arxiv 2022
- (InductiveQE datasets) Inductive Logical Query Answering in Knowledge Graphs, NeurIPS 2022
- (GNNQ dataset) GNNQ: A Neuro-Symbolic Approach for Query Answering over Incomplete Knowledge Graphs, ISWC 2022
Are Bio and Reddit available at all? Introduced in GQE, used in 4 papers overall (GQE, GQE+hashing, CGA, TractOR).
The main difference from the Query2Box datasets: queries in the BetaE datasets have fewer than 100 answers. The BetaE datasets also include queries with negation.
Introduced in Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs, NeurIPS 2020
Graphs
Dataset | Entities | Relations | Training Edges | Validation Edges | Test Edges | Total Edges |
---|---|---|---|---|---|---|
FB15k | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 | 592,213 |
FB15k237 | 14,505 | 237 | 272,115 | 17,526 | 20,438 | 310,079 |
NELL995 | 63,361 | 200 | 114,213 | 14,324 | 14,267 | 142,804 |
Queries
Queries | Training | Training | Validation | Validation | Test | Test |
---|---|---|---|---|---|---|
Dataset | 1p/2p/3p/2i/3i | 2in/3in/inp/pin/pni | 1p | others | 1p | others |
FB15k | 273,710 | 27,371 | 59,097 | 8,000 | 67,016 | 8,000 |
FB15k237 | 149,689 | 14,968 | 20,101 | 5,000 | 22,812 | 5,000 |
NELL995 | 107,982 | 10,798 | 16,927 | 4,000 | 17,034 | 4,000 |
Average Number of Answers
Dataset | 1p | 2p | 3p | 2i | 3i | ip | pi | 2u | up | 2in | 3in | inp | pin | pni |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FB15k | 1.7 | 19.6 | 24.4 | 8.0 | 5.2 | 18.3 | 12.5 | 18.9 | 23.8 | 15.9 | 14.6 | 19.8 | 21.6 | 16.9 |
FB15k237 | 1.7 | 17.3 | 24.3 | 6.9 | 4.5 | 17.7 | 10.4 | 19.6 | 24.3 | 16.3 | 13.4 | 19.5 | 21.7 | 18.2 |
NELL995 | 1.6 | 14.9 | 17.5 | 5.7 | 6.0 | 17.4 | 11.9 | 14.9 | 19.0 | 12.9 | 11.1 | 12.9 | 16.0 | 13.0 |
Introduced in Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings, ICLR 2020.
These EPFO queries are considered easier than those in the BetaE datasets. The datasets do not include queries with negation.
Graphs
Dataset | Entities | Relations | Training Edges | Validation Edges | Test Edges | Total Edges |
---|---|---|---|---|---|---|
FB15k | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 | 592,213 |
FB15k237 | 14,505 | 237 | 272,115 | 17,526 | 20,438 | 310,079 |
NELL995 | 63,361 | 200 | 114,213 | 14,324 | 14,267 | 142,804 |
Queries
Queries | Training | Training | Validation | Validation | Test | Test |
---|---|---|---|---|---|---|
Dataset | 1p | others | 1p | others | 1p | others |
FB15k | 273,710 | 273,710 | 59,097 | 8,000 | 67,016 | 8,000 |
FB15k237 | 149,689 | 149,689 | 20,101 | 5,000 | 22,812 | 5,000 |
NELL995 | 107,982 | 107,982 | 16,927 | 4,000 | 17,034 | 4,000 |
Average Number of Answers
Dataset | 1p | 2p | 3p | 2i | 3i | ip | pi | 2u | up |
---|---|---|---|---|---|---|---|---|---|
FB15k | 10.8 | 255.6 | 250.0 | 90.3 | 64.1 | 593.8 | 190.1 | 27.8 | 227.0 |
FB15k237 | 13.3 | 131.4 | 215.3 | 69.0 | 48.9 | 593.8 | 257.7 | 35.6 | 127.7 |
NELL995 | 8.5 | 56.6 | 65.3 | 30.3 | 15.9 | 310.0 | 144.9 | 14.4 | 62.5 |
GQE-like patterns mined on subsets of DBpedia and Wikidata. The datasets are DB18 and WikiGeo19, introduced in CGA, K-CAP 2019.
As of Sept 2022: not available.
Queries emulating property paths in SPARQL with variable length of relation paths (up to length 5). Queries are EPFO queries, i.e., no negation. New operators over relations resemble those from SPARQL:
- $r_1 / r_2 / \dots$ - relational path, aka classic projection queries
- $r_1 \lor r_2$ - a union of decomposed patterns $(e, r_1, ?) \lor (e, r_2, ?)$
- Kleene plus $r^{+}$ - one or more occurrences of relation $r$, e.g., $r_1 / r_2^{+}$ corresponds to $r_1 / r_2$, $r_1 / r_2 / r_2$, $r_1 / r_2 / r_2 / r_2 / \dots$ up to some final depth. Those can be cyclic patterns.
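To make the Kleene plus semantics concrete, here is a minimal sketch that unrolls a path such as $r_1 / r_2^{+}$ into plain projection paths up to a fixed depth. The `(relation, is_plus)` step encoding and the `max_repeats` bound are assumptions made for illustration; this is not how the Regex datasets or RotatE-Box represent queries.

```python
from typing import List, Tuple

# One step of a path: (relation name, whether the step carries a Kleene plus).
# This encoding is a hypothetical choice for the example.
Step = Tuple[str, bool]


def expand_kleene_plus(path: List[Step], max_repeats: int) -> List[List[str]]:
    """Unroll every r+ step into 1..max_repeats copies of r.

    max_repeats is assumed to be at least 1.
    """
    expansions: List[List[str]] = [[]]
    for relation, is_plus in path:
        # A plain step repeats exactly once; a plus step repeats 1..max_repeats times.
        counts = range(1, max_repeats + 1) if is_plus else range(1, 2)
        expansions = [
            prefix + [relation] * n for prefix in expansions for n in counts
        ]
    return expansions


# r1 / r2+ unrolled up to three repetitions of r2.
paths = expand_kleene_plus([("r1", False), ("r2", True)], max_repeats=3)
```

Here `paths` holds `r1/r2`, `r1/r2/r2`, and `r1/r2/r2/r2`, matching the enumeration in the list above. The answer set of the regex query is then the union of the answers to the unrolled projection queries.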
Two datasets:
- FB15k-Regex is based on Freebase, queries have fewer than 50 answers, 21 query types
- Wiki100-Regex is based on query logs from the official Wikidata SPARQL endpoint, 5 query types.
Introduced in RotatE-Box, AKBC 2021.
Repo: GitHub - no actual data dumps are present :(
Graphs
Dataset | Entities | Relations | Training Edges | Validation Edges | Test Edges | Total Edges |
---|---|---|---|---|---|---|
FB15k | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 | 592,213 |
Wiki100 | 41,291 | 100 | 389,795 | 21,655 | 21,656 | 433,106 |
Queries
FB15k-Regex
| Query type | Train | Valid | Test |
|---|---|---|---|
| | 24,476 | 4,614 | 8,405 |
| | 25,378 | 4,927 | 8,844 |
| | 26,391 | 4,978 | 9,028 |
| | 25,470 | 4,878 | 8,816 |
| | 26,335 | 5,007 | 9,062 |
| | 27,614 | 5,229 | 9,429 |
| | 27,865 | 5,283 | 9,509 |
| | 26,366 | 5,058 | 9,159 |
| | 26,366 | 5,045 | 9,099 |
| | 26,703 | 5,155 | 9,313 |
| | 28,005 | 5,380 | 9,688 |
| | 27,884 | 5,338 | 9,632 |
| | 30,080 | 5,828 | 9,664 |
| | 31,559 | 6,606 | 10,974 |
| | 41,886 | 7,755 | 13,611 |
| | 23,109 | 4,469 | 8,367 |
| | 27,658 | 5,738 | 9,711 |
| | 24,462 | 4,865 | 8,863 |
| | 27,676 | 5,340 | 9,267 |
| | 28,542 | 5,475 | 9,436 |
| | 26,260 | 5,523 | 10,360 |
| Total | 580,085 | 112,491 | 200,237 |
Wiki100-Regex
| Query type | Train | Valid | Test |
|---|---|---|---|
| | 490,562 | 24,878 | 23,443 |
| | 6,945 | 620 | 772 |
| | 85,253 | 10,013 | 8,377 |
| | 274,012 | 14,900 | 14,915 |
| | 348,274 | 15,720 | 15,311 |
| Total | 1,205,046 | 66,131 | 62,818 |
Conjunctive queries (w/o union) not limited to 9 patterns from Query2Box/BetaE datasets. The task is to predict all intermediate entities, not just final leaf nodes. Query depth: 2-5; max 3 intersecting branches.
Introduced in Answering complex queries in knowledge graphs with bidirectional sequence encoders, AAAI 2021.
New FB15K-237-CQ and WN18RR-CQ datasets have two variations:
- CQ (conjunctive queries) - Training on triples + paths + DAGs, Validation/Test on DAGs only;
- Paths - Training on triples + paths, Validation/Test on paths only
Sept 2022: the datasets are not publicly available.
Graphs
Dataset | FB15K-237-CQ | FB15K-237-CQ | FB15K-237-CQ | WN18RR-CQ | WN18RR-CQ | WN18RR-CQ |
---|---|---|---|---|---|---|
Dataset | Train | Validation | Test | Train | Validation | Test |
Entities | 14,505 | - | - | 40,943 | - | - |
Relations | 237 | 237 | 237 | 11 | 11 | 11 |
Triples | 272,115 | - | - | 86,835 | - | - |
Paths | 50,000 | - | - | 10,000 | - | - |
DAGs | 48,865 | 2,785 | 2,599 | 9,465 | 112 | 95 |
Avg Masks | 1.86 | 5.91 | 6.05 | 1.84 | 5.13 | 4.91 |
Avg Query Len (Tokens) | 152 | 460 | 479 | 71 | 198 | 199 |
Queries
No detailed breakdown by query type is available, only the DAGs stats from the main table.
Dataset | FB15K-237-CQ | FB15K-237-CQ | FB15K-237-CQ | WN18RR-CQ | WN18RR-CQ | WN18RR-CQ |
---|---|---|---|---|---|---|
Dataset | Train | Validation | Test | Train | Validation | Test |
Paths | 50,000 | - | - | 10,000 | - | - |
DAGs | 48,865 | 2,785 | 2,599 | 9,465 | 112 | 95 |
Avg Masks | 1.86 | 5.91 | 6.05 | 1.84 | 5.13 | 4.91 |
Avg Query Len (Tokens) | 152 | 460 | 479 | 71 | 198 | 199 |
Existential First-Order queries with Single Free Variable, extended from BetaE. The goal is to evaluate the combinatorial generalizability.
Introduced in Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs, NeurIPS 2021 (Datasets and Benchmarks)
Graphs
Queries | Training | Training | Validation | Validation | Test | Test |
---|---|---|---|---|---|---|
Dataset | 1p/2p/3p/2i/3i | 2in/3in/inp/pin/pni | 1p | others | 1p | others |
FB15k | 273,710 | 27,371 | 59,097 | 8,000 | 67,016 | 8,000 |
FB15k237 | 149,689 | 14,968 | 20,101 | 5,000 | 22,812 | 5,000 |
NELL995 | 107,982 | 10,798 | 16,927 | 4,000 | 17,034 | 4,000 |
Queries
The 301 query types are too many to list here. Details can be found in a summary Excel file here.
Rethinks the EFO-1 formulation by introducing leaf nodes, multi-edges, and cycles.
For standard FB15k, FB15k-237, and NELL - 9 new query types (10 with the reworked `pni` type) including:
- `l` - queries with existentially quantified variables as leaf nodes (2il, 3il)
- `m` - queries with multiple relation projection edges from one variable to another (2m, 2nm, 3mp, 3pm, im)
- `c` - queries with cycles (3c, cm)
All new query types have 5000 instances in three KGs. Introduced in On Existential First Order Queries Inference on Knowledge Graphs, arxiv 2023. The dataset can be downloaded from here.
Based on a variation of the FB15k-237 dataset with entity attributes (12,390 entities, 237 relations, 115 attributes, 29,229 (?) triples). Literals are restricted to numerical values, with three additional filter functions (less than, equal, greater than).
The dataset includes standard 9 EPFO query types and adds 8 more variations of those patterns enriched with literals:
- 5 query types where literals are in queries, but the answer is an entity (ai, 2ai, pai, aip, au)
- 3 query types where literals are in queries, and the answer is a mean of relevant literal values (1ap, 2ap, 3ap)
Introduced in LitCQD: Multi-Hop Reasoning in Incomplete Knowledge Graphs with Numeric Literals, arxiv 2023
Existential First-Order queries aimed at evaluating compositional generalization to OOD query patterns (29 in-distribution types, 29 out-of-distribution). In contrast to BetaE datasets, does not have restrictions on the number of answers per query, long tails are possible.
Introduced in Sequential Query Encoding For Complex Query Answering on Knowledge Graphs
Graphs
Queries | Training | Training | Validation | Test |
---|---|---|---|---|
Dataset | 1p | others | all | all |
FB15k | 273,710 | 821,130 | 8,000 | 8,000 |
FB15k237 | 149,689 | 449,067 | 5,000 | 5,000 |
NELL995 | 107,982 | 323,946 | 4,000 | 4,000 |
Queries
58 query types, refer to Appendix A in the paper for the full list of patterns.
Introduced by (EFOk-CQA) EFOk-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation with 741 query types in total.
Features:
- existential first-order queries with multiple variables
- combinatorial space with multi-edge and cyclic queries
The Numerical CQA queries include both entities and typed numerical attribute values.
Introduced by Knowledge Graph Reasoning over Entities and Numerical Values
Graphs
Graphs | Data Split | 1p | 2p | 2i | 3i | pi | ip | 2u | up | All |
---|---|---|---|---|---|---|---|---|---|---|
FB15K | Training | 304,633 | 138,192 | 226,729 | 288,874 | 260,057 | 233,834 | 284,301 | 284,931 | 2,021,551 |
FB15K | Validation | 8,271 | 15,860 | 23,359 | 28,836 | 25,081 | 22,930 | 29,187 | 29,210 | 182,734 |
FB15K | Testing | 7,969 | 15,431 | 23,346 | 28,865 | 24,810 | 22,232 | 29,212 | 29,274 | 181,139 |
DB15K | Training | 124,851 | 99,698 | 140,427 | 190,413 | 171,353 | 163,687 | 190,364 | 194,244 | 1,275,037 |
DB15K | Validation | 3,529 | 10,388 | 9,792 | 13,817 | 14,594 | 16,651 | 19,512 | 19,792 | 108,075 |
DB15K | Testing | 3,387 | 10,047 | 9,914 | 14,603 | 14,642 | 15,897 | 19,504 | 19,773 | 107,767 |
YAGO15K | Training | 84,014 | 76,238 | 136,282 | 183,850 | 162,712 | 145,994 | 183,963 | 183,459 | 1,156,512 |
YAGO15K | Validation | 2,833 | 7,986 | 10,757 | 16,884 | 13,485 | 13,899 | 18,444 | 19,105 | 103,393 |
YAGO15K | Testing | 2,713 | 7,949 | 10,935 | 17,171 | 13,481 | 13,526 | 18,433 | 18,997 | 103,205 |
In addition to a normal graph of entities (instances) a-la BetaE datasets, the type-aware datasets offer an additional set of classes, a class hierarchy (from a pre-existing ontology), and `instanceOf` links between entities and classes.
Those datasets might include an additional task of predicting types of answer entities (Concept Retrieval).
- LUBM, introduced in Neuro-Symbolic Ontology-Mediated Query Answering, OpenReview 2021
- NELL, introduced in Neuro-Symbolic Ontology-Mediated Query Answering, OpenReview 2021. The base graph is the same as in the BetaE datasets, but a few ontological axioms were added.
- YAGO 4, introduced in TAR: Neural Logical Reasoning across TBox and ABox
- DBpedia, introduced in TAR: Neural Logical Reasoning across TBox and ABox
LUBM and NELL employ ontological axioms of the DL-Lite (R) family of Description Logics.
Graphs
TODO Figure out Concept Retrieval edges in TAR
Dataset | Entities | Relations | Axioms | Base Graph | Materialized Graph |
---|---|---|---|---|---|
LUBM | 55,684 | 28 | 68 | 284k | 565k |
NELL | 63,361 | 400 | 307 | 285k | 497k |
Axioms breakdown in ontologies for LUBM and NELL
| Rules | LUBM | NELL |
|---|---|---|
| | 68 | 307 |
| | 13 | - |
| | 5 | 92 |
| | 28 | 215 |
| | 11 | - |
| | 11 | - |
Dataset | Entities | Relations | Classes | Training Edges | Validation Edges | Test Edges | Entity-Class Edges | Class Hierarchy Edges | Total Edges |
---|---|---|---|---|---|---|---|---|---|
YAGO 4 | 32,465 | 75 | 8,382 | 101,417 | 1,000 | 1,000 | 83,291 | 16,644 | 184,708 |
DBpedia | 28,824 | 327 | 981 | 136,821 | 1,000 | 1,000 | 225,436 | 2,582 | 362,257 |
Queries
Dataset | Train / Test | 1p | 2p | 3p | 2i | 3i | ip | pi | 2u | up |
---|---|---|---|---|---|---|---|---|---|---|
LUBM | Plain (Train) | 110,000 | 110,000 | 110,000 | 110,000 | 110,000 | - | - | - | - |
LUBM | Generalized (Train) | 117,124 | 136,731 | 150,653 | 181,234 | 208,710 | - | - | - | - |
LUBM | Specialized (Train) | 117,780 | 154,851 | 173,678 | 271,532 | 230,085 | - | - | - | - |
LUBM | Ontological (Train) | 116,893 | 166,159 | 333,406 | 212,718 | 491,707 | - | - | - | - |
LUBM | Induction (w/ missing links in queries) (Val/Test) | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 |
LUBM | Deduction (w/o missing link in training) (Val/Test) | 1,241 | 4,701 | 6,472 | 3,829 | 4,746 | 7,393 | 7,557 | 4,986 | 7,122 |
LUBM | Induction + Deduction (Val/Test) | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 7,986 | 8,000 |
NELL | Plain (Train) | 107,982 | 107,982 | 107,982 | 107,982 | 107,982 | - | - | - | - |
NELL | Generalized (Train) | 174,310 | 408,842 | 864,268 | 398,412 | 930,787 | - | - | - | - |
NELL | Specialized (Train) | 174,310 | 419,664 | 906,609 | 401,954 | 936,537 | - | - | - | - |
NELL | Ontological (Train) | 114,614 | 542,923 | 864,268 | 629,144 | 930,787 | - | - | - | - |
NELL | Induction (w/ missing links in queries) (Val/Test) | 15,688 | 3,910 | 3,918 | 3,828 | 3,786 | 3,932 | 3,895 | 3,940 | 3,966 |
NELL | Deduction (w/o missing link in training) (Val/Test) | 346 | 4,461 | 4,294 | 4,842 | 5,996 | 7,295 | 5,862 | 5,646 | 6,894 |
NELL | Induction + Deduction (Val/Test) | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 8,000 | 7,990 | 8,000 |
Queries | Training | Training | Validation | Validation | Test | Test |
---|---|---|---|---|---|---|
Dataset | 1p | others | 1p | others | 1p | others |
YAGO 4 (Concept Retrieval) | 189,338 | 10,000 | 1,000 | 1,000 | 1,000 | 1,000 |
YAGO 4 (Entity Only) | 101,417 | 10,000 | 1,000 | 1,000 | 1,000 | 1,000 |
YAGO 4 (Entity + Instantiations) | 184,708 | 10,000 | 1,000 | 1,000 | 1,000 | 1,000 |
DBpedia (Concept Retrieval) | 473,924 | 10,000 | 1,000 | 1,000 | 1,000 | 1,000 |
DBpedia (Entity Only) | 136,821 | 10,000 | 1,000 | 1,000 | 1,000 | 1,000 |
DBpedia (Entity + Instantiations) | 362,257 | 10,000 | 1,000 | 1,000 | 1,000 | 1,000 |
Introduced in SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs, KDD 2022.
Training queries are sampled on-the-fly during training due to the huge size of underlying graphs.
The underlying graphs are FB400k (400K nodes), WikiKG 2 (2.5M nodes) (from OGB), and full Freebase (86M nodes).
TODO: confirm with Hongyu the number of validation / test queries.
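As a rough illustration of what sampling training queries on-the-fly can look like, the sketch below draws multi-hop path queries by random walks over an adjacency index and computes their answers by traversal. It is a toy under stated assumptions (integer IDs, path queries only, the hypothetical helpers `build_index`, `sample_path_query`, and `answers`) and does not reproduce the actual SMORE sampler.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[int, int, int]  # (head, relation, tail)
Index = Dict[int, List[Tuple[int, int]]]  # head -> [(relation, tail), ...]


def build_index(triples: List[Triple]) -> Index:
    """Group outgoing edges by head entity."""
    index: Index = defaultdict(list)
    for h, r, t in triples:
        index[h].append((r, t))
    return index


def sample_path_query(
    index: Index, hops: int, rng: random.Random
) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Sample (anchor, relation path) by a random walk; None if the walk dead-ends."""
    node = rng.choice(sorted(index))
    anchor, relations = node, []
    for _ in range(hops):
        if node not in index:  # no outgoing edges, give up on this sample
            return None
        r, node = rng.choice(index[node])
        relations.append(r)
    return anchor, tuple(relations)


def answers(index: Index, anchor: int, relations: Tuple[int, ...]) -> Set[int]:
    """Follow the relation path from the anchor and collect all reachable entities."""
    frontier = {anchor}
    for r in relations:
        frontier = {t for n in frontier for (rel, t) in index.get(n, []) if rel == r}
    return frontier


# Tiny graph: 0 -r0-> 1, 1 -r1-> 2, 1 -r1-> 3.
index = build_index([(0, 0, 1), (1, 1, 2), (1, 1, 3)])
two_hop_answers = answers(index, 0, (0, 1))
query = sample_path_query(index, hops=2, rng=random.Random(0))
```

A training loop would call `sample_path_query` once per batch element and retry on `None`, so no query file needs to be materialized for graphs of this size.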
Graphs
Dataset | Entities | Relations | Training Edges | Validation Edges | Test Edges | Total Edges |
---|---|---|---|---|---|---|
FB400k | 409,829 | 918 | 1,075,837 | 537,917 | 537,917 | 2,151,671 |
WikiKG2 | 2,500,604 | 535 | 16,109,182 | 429,456 | 598,543 | 17,137,181 |
Freebase | 86,054,361 | 14,824 | 304,727,650 | 16,929,318 | 16,929,308 | 338,586,276 |
Queries
Queries | Validation | Validation | Test | Test |
---|---|---|---|---|
Dataset | 1p | others | 1p | others |
FB400k | TODO | TODO | TODO | TODO |
WikiKG2 | TODO | TODO | TODO | TODO |
Freebase | TODO | TODO | TODO | TODO |
The main difference of hyper-relational datasets is that edges are no longer plain triples: in `(Albert Einstein, educated at, ETH Zurich, (degree, Bachelor))`, the main triple is `Albert Einstein, educated at, ETH Zurich` and its unique qualifier is `(degree, Bachelor)`.
Qualifiers provide an additional context to the edge - the tail node might change with another qualifier, e.g., `(Albert Einstein, educated at, University of Zurich, (degree, Doctorate))`.
Entities and relations in qualifiers are still legit entities and relations which could be present in main triples. Some entities and relations can be found only in qualifiers.
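A minimal sketch of how such a qualified edge could be represented and queried, using the Einstein example from the text. The `Statement` class and `match` helper are hypothetical names for illustration and are not the data format of WD50K or StarQE.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

Qualifier = Tuple[str, str]  # (qualifier relation, qualifier entity)


@dataclass(frozen=True)
class Statement:
    """A main triple plus zero or more qualifier pairs."""
    head: str
    relation: str
    tail: str
    qualifiers: Tuple[Qualifier, ...] = ()

    def triple(self) -> Tuple[str, str, str]:
        """Drop the qualifiers and return the plain triple."""
        return (self.head, self.relation, self.tail)


def match(
    statements: Iterable[Statement],
    head: str,
    relation: str,
    qualifier: Optional[Qualifier] = None,
) -> Set[str]:
    """Tails of statements matching (head, relation, ?), optionally restricted by a qualifier."""
    return {
        s.tail
        for s in statements
        if s.head == head
        and s.relation == relation
        and (qualifier is None or qualifier in s.qualifiers)
    }


kg = [
    Statement("Albert Einstein", "educated at", "ETH Zurich", (("degree", "Bachelor"),)),
    Statement("Albert Einstein", "educated at", "University of Zurich", (("degree", "Doctorate"),)),
]
all_schools = match(kg, "Albert Einstein", "educated at")
doctorate_school = match(kg, "Albert Einstein", "educated at", ("degree", "Doctorate"))
```

Without a qualifier the query returns both universities; adding `(degree, Doctorate)` narrows the answer to University of Zurich, which is the context effect described above.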
The WD50K dataset has only conjunctive queries (projection + intersection), neither union nor negation.
Introduced in Query Embedding on Hyper-Relational Knowledge Graphs, ICLR 2022
The WD50K-NFOL dataset, introduced in NQE: N-ary Query Embedding for Complex Query Answering over Hyper-relational Knowledge Graphs, adds unions and negations, as well as the possibility of variables at qualifier entity positions. As of Nov 2022, it is not openly available.
Graph
The original WD50K graph from the StarE paper by Galkin et al.
Dataset | Entities | Relations | Qualifier-only Entities | Qualifier-only Relations | Training Edges | Validation Edges | Test Edges | Total Edges |
---|---|---|---|---|---|---|---|---|
WD50K | 47,156 | 532 | 5,460 | 45 | 166,435 | 23,913 | 46,159 | 236,508 |
32,167 edges have at least one key-value (relation:entity) qualifier.
Queries
Split | 1p | 2p | 3p | 2i | 3i | ip | pi |
---|---|---|---|---|---|---|---|
train | 24,819 | 313,088 | 5,950,990 | 48,513 | 318,735 | 306,022 | 1,088,539 |
validation | 4,100 | 100,706 | 2,968,315 | 15,648 | 169,195 | 169,438 | 569,957 |
test | 7,716 | 202,045 | 6,433,476 | 38,207 | 547,272 | 445,007 | 1,267,452 |
WD50K-NFOL stats are not yet available
As of March 2023, there are no existing purely inductive datasets such that the training and validation/test graphs are different (validation and test containing new unseen entities) and predictions should only rely on the graph structure w/o external data.
As a bridge between shallow transductive models and inductive inference, Type-aware Embeddings for Multi-Hop Reasoning over Knowledge Graphs proposes to mine entity types as the invariant that remains the same at training and inference.
As a result, the following datasets assume that a class hierarchy (or a graph of classes) exists and is known in advance. Technically, those can be put in the Type-Aware Datasets category. The query datasets only have EPFO queries (no negation).
Inductive splits have been published, see the GitHub issue
Graphs
The underlying graphs are FB15k-237-V2 and NELL995-V3 from Inductive relation prediction by subgraph reasoning by Teru et al, ICML 2020. The original repo and other datasets are here.
Training is performed on the Train Graph, but at validation/test time the model is fed with a new Inference Graph with completely new nodes. The Inference Graph has missing edges that have to be predicted at validation or test time.
Dataset | Relations | Types | Train Graph | Train Graph | Inference Graph | Inference Graph | Inference Graph | Inference Graph |
---|---|---|---|---|---|---|---|---|
Dataset | Relations | Types | Train Entities | Train Edges | Inference Entities | Inference Edges | Validation Edges | Test Edges |
FB15k-237-V2 | 203 | 3,851 | 3,000 | 4,245 | 2,000 | 4,145 | 469 | 478 |
NELL995-V3 | 142 | 267 | 4,647 | 16,393 | 4,921 | 8,048 | 811 | 809 |
The type hierarchy created for those datasets remains unknown.
Queries
Queries | Training | Validation | Validation | Test | Test |
---|---|---|---|---|---|
Dataset | 1p/2p/3p/2i/3i | 1p | others | 1p | others |
FB15k-237-V2 | 9,964 | 1,738 | 2,000 | 791 | 1,000 |
NELL995-V3 | 12,010 | 2,197 | 2,000 | 1,167 | 1,500 |
The dataset proposed in GNNQ frames query answering as node classification. The dataset has 9 tree-like conjunctive queries (6 synthetic from WatDiv and 3 from FB15k237), with no unions or negations. For each query, there are P KGs with an answer entity satisfying the query and N KGs with negative samples where an answer does not satisfy the query. Test splits have graphs with new entities (but the same query shapes).
Graphs
Many - each WatDiv query has 2K positive GRAPHS and 700K negative GRAPHS (each of about 100K triples); each FB15k237 query has about 1K positive GRAPHS and 1K negative GRAPHS (each of about 10K triples)
Queries
Each query in the table has many associated graphs where one node is an answer (positive graph sample) and where nodes are not answers (negative graph samples)
Query | Relations | Num atoms / tree depth | Train: pos/neg | Test: pos/neg |
---|---|---|---|---|
WatDiv-Q1 | 158 | 8 / 4 | 2114 / 699699 | 1085 / 349877 |
WatDiv-Q2 | 158 | 8 / 3 | 3258 / 698396 | 1769 / 349119 |
WatDiv-Q3 | 158 | 8 / 3 | 1520 / 700276 | 798 / 350165 |
WatDiv-Q4 | 158 | 10 / 4 | 2397 / 698986 | 1226 / 349546 |
WatDiv-Q5 | 158 | 10 / 4 | 6338 / 693988 | 2866 / 347570 |
WatDiv-Q6 | 158 | 10 / 4 | 7545 / 692439 | 3744 / 346290 |
FB15k237-Q1 | 237 | 7 / 4 | 1185 / 1180 | 395 / 395 |
FB15k237-Q2 | 237 | 7 / 4 | 650 / 660 | 220 / 220 |
FB15k237-Q3 | 237 | 5 / 4 | 860 / 870 | 290 / 290 |
Introduced in TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph, arxiv 2022.
Based on FOL operators, the dataset focuses on temporal reasoning, which includes `after`, `before`, and `between` on any timestamp set.
Graphs
Dataset | Entities | Relations | Timestamps | Training Edges | Validation Edges | Test Edges | Total Edges |
---|---|---|---|---|---|---|---|
ICEWS14 | 7,128 | 230 | 365 | 72,826 | 8,941 | 8,963 | 90,730 |
ICEWS05-15 | 10,488 | 251 | 4,017 | 386,962 | 46,275 | 46,092 | 479,329 |
GDELT-500 | 500 | 20 | 366 | 2,735,685 | 341,961 | 341,961 | 3,419,607 |
Queries
Query Name | ICEWS14-Train | Validation | Test | ICEWS05-15-Train | Validation | Test | GDELT-500-Train | Validation | Test |
---|---|---|---|---|---|---|---|---|---|
Pe2 | 72826 | 3482 | 4037 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |
Pe3 | 72826 | 3492 | 4083 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |
Pe_Pt | 7282 | 3385 | 3638 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
e2i | 72826 | 3305 | 3655 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |
e3i | 72826 | 2966 | 3023 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |
e2i_Pe | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
Pe_e2i | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
Pe_t2i | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
e2i_NPe | 7282 | 3061 | 3192 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
e2i_PeN | 7282 | 2971 | 3031 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
Pe_e2i_Pe_NPe | 7282 | 2968 | 3012 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
e2i_N | 7282 | 2949 | 2975 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
e3i_N | 7282 | 2913 | 2914 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
e2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
Pe_e2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
Pt_lPe | 7282 | 4976 | 5608 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
Pt_rPe | 7282 | 3321 | 3621 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
t2i | 72826 | 5112 | 6631 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |
t3i | 72826 | 3094 | 3296 | 368962 | 10000 | 10000 | 2215309 | 10000 | 10000 |
t2i_Pe | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
Pt_le2i | 7282 | 3226 | 3466 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
Pt_re2i | 7282 | 3236 | 3485 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
t2i_NPt | 7282 | 4873 | 5464 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
t2i_PtN | 7282 | 3300 | 3609 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
Pe_t2i_PtPe_NPt | 7282 | 3031 | 3127 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
t2i_N | 7282 | 3135 | 3328 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
t3i_N | 7282 | 2924 | 2944 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
t2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
Pe_t2u | - | 2913 | 2913 | - | 10000 | 10000 | - | 10000 | 10000 |
Pe_aPt | 7282 | 4134 | 4733 | 68262 | 10000 | 10000 | 221530 | 10000 | 10000 |
Pe_bPt | 7282 | 3970 | 4565 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
Pe_at2i | 7282 | 4607 | 5338 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
Pe_bt2i | 7282 | 4583 | 5386 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
between | 7282 | 2913 | 2913 | 36896 | 10000 | 10000 | 221530 | 10000 | 10000 |
- Graph Query Sampler: Not a method, rather a dataset generator
- EFO-1-QA-benchmark: Generating combinatorial tree-formed query types and sampling the data.
- EFOk-CQA: Generating combinatorial existential first order query types with multiple (k) variables and sampling the data.
- KGReasoning: GQE, Query2Box, BetaE
- CQD: GQE, Query2Box, BetaE, CQD
- EFO-1-QA-benchmark: Query2Box, BetaE, LogicE, NewLook, ConE, FuzzQE
- Query2Particles: Query2Particles
- StarQE: StarQE
- SMORE: GQE, Query2Box, BetaE + Very Large Datasets
- GNN-QE: GNN-QE
- InductiveQE: Inductive QE with NodePiece and GNN-QE
- TAR: TAR
- QE-TeMP: TeMP (based on KGReasoning)
- GNNQ: GNNQ
- SE-KGE: GQE, CGA, and geospatial model
- LARK: LARK (uses Huggingface LLMs)
- WFRE: WFRE
- FIT: FIT
- SQE: SQE with Transformer/LSTM/GRU/TCN, Tree-LSTM, Tree-RNN, BetaE, BiQE, ConE, FuzzQE, GQE, HypE, NeuralMLP (Mixer), Query2Box, Query2Particles
- NRN: NRN with GQE, Query2Box, Query2Particles
- EFOk-CQA: EFOk
- (GQE) Embedding Logical Queries on Knowledge Graphs NeurIPS 2018
- (GQE + hashing) Learning to Hash for Efficient Search over Incomplete Knowledge Graphs ICDM 2019
- (CGA) Contextual Graph Attention for Answering Logical Queries over Incomplete Knowledge Graphs K-CAP 2019, GQE + self-attention instead of DeepSet
- (TractOR) Symbolic querying of vector spaces: Probabilistic databases meets relational embeddings, UAI 2020
- (Query2Box) Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings ICLR 2020
- (BetaE) Beta Embeddings for Multi-Hop Logical Reasoning in Knowledge Graphs NeurIPS 2020
- (EmQL) Faithful embeddings for knowledge base queries NeurIPS 2020
- (MPQE) Message Passing Query Embedding ICML’20 Workshop
- (RotatE-Box) Regex Queries over Incomplete Knowledge Bases AKBC’21
- (BiQE) Answering complex queries in knowledge graphs with bidirectional sequence encoders, AAAI’21
- Approximate knowledge graph query answering: from ranking to binary classification
- Knowledge Sheaves: A Sheaf-Theoretic Framework for Knowledge Graph Embedding arxiv, 2021
- (ConE) Cone: Cone embeddings for multi-hop reasoning over knowledge graphs NeurIPS’21
- (PERM) Probabilistic entity representation model for reasoning over knowledge graphs (improv over BetaE) NeurIPS’21
- (CQD) Complex Query Answering with Neural Link Predictors ICLR’21
- (HypE) Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs, WWW 2021
- (NewLook) Neural-Answering Logical Queries on Knowledge Graphs (KDD’21)
- Benchmarking the Combinatorial Generalizability of Complex Query Answering on Knowledge Graphs, NeurIPS 2021 (Datasets and Benchmarks)
- Neuro-Symbolic Ontology-Mediated Query Answering OpenReview 2021
- (LogicE) Logic Embeddings for Complex Query Answering arxiv 2021
- (StarQE) Query Embedding on Hyper-relational Knowledge Graphs ICLR 2022
- (MLPMix) Neural Methods for Logical Reasoning over Knowledge Graphs ICLR 2022
- (FuzzQE) Fuzzy Logic Based Logical Query Answering on Knowledge Graphs, AAAI 2022
- (GNN-QE) Neural-Symbolic Models for Logical Queries on Knowledge Graphs, ICML 2022
- (SMORE) SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs KDD 2022
- (kgTransformer) Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries KDD 2022
- (Query2Particles) Query2Particles: Knowledge Graph Reasoning with Particle Embeddings, Findings NAACL’22
- (TAR) TAR: Neural Logical Reasoning across TBox and ABox (arxiv, 2022)
- (TeMP) Type-aware embeddings for multi-hop reasoning over knowledge graphs (IJCAI-ECAI 2022)
- (FLEX) FLEX: Feature-Logic Embedding Framework for CompleX Knowledge Graph Reasoning (arxiv 2022)
- (TFLEX) TFLEX: Temporal Feature-Logic Embedding Framework for Complex Reasoning over Temporal Knowledge Graph (arxiv, 2022)
- (LinE) LinE: Logical Query Reasoning over Hierarchical Knowledge Graphs KDD 2022
- (GNNQ) GNNQ: A Neuro-Symbolic Approach for Query Answering over Incomplete Knowledge Graphs ISWC 2022
- (ENeSy) Neural-Symbolic Entangled Framework for Complex Query Answering NeurIPS 2022
- (NodePiece-QE, InductiveQE) Inductive Logical Query Answering in Knowledge Graphs NeurIPS 2022
- (RoMA) Reasoning over Multi-view Knowledge Graphs arxiv 2022, some new datasets, but no code/data published
- (LMPNN) Logical Message Passing Networks With One-Hop Inference On Atomic Formulas ICLR'23
- (GammaE) GammaE: Gamma Embeddings for Logical Queries on Knowledge Graphs EMNLP 2022
- (NMP-QEM) Neural-based Mixture Probabilistic Query Embedding for Answering FOL queries on Knowledge Graphs, EMNLP 2022
- (NQE) NQE: N-ary Query Embedding for Complex Query Answering over Hyper-relational Knowledge Graphs AAAI 2023
- (QTO) Answering Complex Logical Queries on Knowledge Graphs via Query Computation Tree Optimization, ICML'23 submission
- (SignalE) Signal Embeddings for Complex Logical Reasoning in Knowledge Graphs, KSEM'22
- (Var2Vec) Efficient Embeddings of Logical Variables for Query Answering over Incomplete Knowledge Graphs, AAAI'23
- (CQD-A) Adapting Neural Link Predictors for Complex Query Answering
- (Query2Geom) Analysis of Attention Mechanisms in Box-Embedding Systems, 2023
- (SQE) Sequential Query Encoding For Complex Query Answering on Knowledge Graphs, TMLR 2023
- (CylE) CylE: Cylinder Embeddings for Multi-hop Reasoning over Knowledge Graphs, EACL 2023
- (RoConE) Modeling Relational Patterns for Logical Query Answering over Knowledge Graphs
- (FIT) On Existential First Order Queries Inference on Knowledge Graphs, arxiv 2023
- (LitCQD) LitCQD: Multi-Hop Reasoning in Incomplete Knowledge Graphs with Numeric Literals, arxiv 2023
- (LARK) Complex Logical Reasoning over Knowledge Graphs using Large Language Models, arxiv 2023
- (WFRE) Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport, arxiv 2023
- (NRN) Knowledge Graph Reasoning over Entities and Numerical Values KDD 2023
- (EFOk-CQA) EFOk-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation arxiv 2023
- (SE-KGE) SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting, Transactions in GIS 2020, GQE with scalar (x,y) coordinate prediction / encoding
- (LEGO) Lego: Latent execution-guided reasoning for multi-hop question answering on knowledge graphs, ICML 2021
- (CBR-SubG) Knowledge base question answering by case-based reasoning over subgraphs ICML 2022, application to Question Answering, entailment only, custom datasets
- (LogiRec) Towards High-Order Complementary Recommendation via Logical Reasoning Network, arxiv 2022, application: BetaE in RecSys
- Context-aware explainable recommendation based on domain knowledge graph, Big Data and Cognitive Computing, 2022
- (PLM4CLQA) Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning, arxiv 2023
- Hybrid Structured and Similarity Queries over Wikidata plus Embeddings with Kypher-V, ISWC 2022
- Combining RDF Graph Data and Embedding Models for an Augmented Knowledge Graph, BigNet 2018 Workshop @ WWW'18
- TrQuery: An Embedding-based Framework for Recommanding SPARQL Queries, 2018
- Towards Empty Answers in SPARQL: Approximating Querying with RDF Embedding, ISWC 2018
If you find this work useful, please cite the original paper:
@article{ren2023ngdb,
title={Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases},
author={Hongyu Ren and Mikhail Galkin and Michael Cochez and Zhaocheng Zhu and Jure Leskovec},
year={2023},
eprint={2303.14617},
archivePrefix={arXiv},
}