- A simple "ground-truth" example of converting text from the report to RDF triples;
- Zero-shot and few-shot (one-shot) examples of converting text to RDF triples with LLMs (OpenAI GPT);
- Defining concepts and matching them with the AGROVOC thesaurus;
  - When a concept matches the "prefLabel" of an AGROVOC concept, the URI defined by AGROVOC is used directly;
  - When a concept matches the "altLabel" of an AGROVOC concept, "closeMatch" is used to interlink it with AGROVOC.
- Importing Agrontology to better express relationships between concepts;
- Visualizing the knowledge graph;
- Evaluating different conversion strategies using the F1 score and exact match;
- Post-processing the knowledge graph to eliminate misuse of ontologies;
- Interlinking to external databases, such as soil-related metadata records harvested from Zenodo, by keyword matching;
- Filling in keywords extracted from the title and description for metadata records that are missing keywords;
  - Introducing a quad store with named graphs to store extracted keywords in the named graph called "augmented";
  - Storing the soil health knowledge graph as the default graph and raw metadata records in the named graph "metadata".
- Validating the (expanded) knowledge graph by question-answering using NLQ.
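
The prefLabel/altLabel rule in the list above can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the `THESAURUS` dictionary is a made-up stand-in for AGROVOC, all `example.org` URIs and the `link_concept` helper are hypothetical, and triples are represented as simple tuples rather than with an RDF library. Only the SKOS `closeMatch` property URI is real.

```python
# SKOS property used to interlink a local concept with an external one.
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Hypothetical namespace for concepts minted by the knowledge graph itself.
LOCAL_NS = "http://example.org/soil-health/"

# Toy stand-in for AGROVOC: placeholder URIs and illustrative labels only.
THESAURUS = {
    "http://example.org/agrovoc/c_0001": {
        "prefLabel": "soil organic matter",
        "altLabel": ["soil humus"],
    },
    "http://example.org/agrovoc/c_0002": {
        "prefLabel": "soil erosion",
        "altLabel": [],
    },
}


def link_concept(label, thesaurus=THESAURUS):
    """Return (concept_uri, extra_triples) for a concept label.

    - prefLabel match: reuse the thesaurus URI directly, no extra triples.
    - altLabel match: mint a local URI and add a closeMatch triple.
    - no match: mint a local URI with no interlinking.
    """
    key = label.strip().lower()

    # Rule 1: an exact prefLabel match reuses the thesaurus URI.
    for uri, labels in thesaurus.items():
        if key == labels["prefLabel"].lower():
            return uri, []

    local_uri = LOCAL_NS + key.replace(" ", "_")

    # Rule 2: an altLabel match keeps the local URI and links via closeMatch.
    for uri, labels in thesaurus.items():
        if key in (alt.lower() for alt in labels["altLabel"]):
            return local_uri, [(local_uri, SKOS_CLOSE_MATCH, uri)]

    # Otherwise the concept stays local and unlinked.
    return local_uri, []


for name in ["Soil organic matter", "soil humus", "earthworm count"]:
    print(name, "->", link_concept(name))
```

Checking prefLabels across the whole thesaurus before any altLabel means a direct URI reuse always wins over a `closeMatch` link when both would apply.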
See Issues.
The concept of soil health lacks a universally agreed-upon definition, and its interpretation can differ between research and policymaking contexts. Nevertheless, the literature cites many factors and indicators as measures of soil health, as highlighted in this European Environment Agency (EEA) report. It is therefore useful to extract the soil health concepts outlined in this report into a knowledge graph, which organizes this knowledge systematically for machine interpretation and allows integration with other knowledge repositories. Such a resource would help users, whether farmers, policymakers, or researchers, to efficiently access relevant information, including the factors that influence soil health, the associated indicators, and their respective normal ranges.
Once we establish this soil health knowledge graph, we aim to enhance it by interlinking a vast array of external data and knowledge to create a soil knowledge repository. A primary method for integrating external data involves keyword matching, where each concept within the graph serves as a keyword. These keywords allow us to search and link metadata from external databases, literature, and other web resources back to the corresponding concepts in the knowledge graph. This process depends on keyword extraction techniques to supplement metadata entries that are missing keywords.
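
The keyword-matching step described above might look like the following minimal sketch. The record layout, the `CONCEPTS` set, and the helpers `extract_keywords` and `link_record` are all hypothetical, and the extraction here is a deliberately naive whole-phrase search for concept labels in the title and description; it stands in for whatever keyword extraction technique is actually used.

```python
import re

# Hypothetical concept labels taken from the knowledge graph.
CONCEPTS = {"soil erosion", "soil organic carbon", "ph"}


def extract_keywords(text, concepts):
    """Naive stand-in for keyword extraction: return the concept labels
    that occur as whole words/phrases in the text (case-insensitive)."""
    lowered = text.lower()
    return sorted(
        c for c in concepts
        if re.search(r"\b" + re.escape(c) + r"\b", lowered)
    )


def link_record(record, concepts=CONCEPTS):
    """Link one metadata record to concepts by keyword matching.

    Declared keywords are used when present; otherwise keywords are
    extracted from the title and description and flagged as "augmented".
    """
    keywords = [k.strip().lower() for k in record.get("keywords") or []]
    source = "declared"
    if not keywords:
        text = record.get("title", "") + " " + record.get("description", "")
        keywords = extract_keywords(text, concepts)
        source = "augmented"
    matched = sorted(set(keywords) & concepts)
    return {"id": record["id"], "concepts": matched, "keyword_source": source}


# Made-up records for illustration only.
records = [
    {"id": "rec-1", "keywords": ["Soil erosion", "hydrology"],
     "title": "Runoff plots", "description": "Field measurements."},
    {"id": "rec-2", "keywords": [],
     "title": "Mapping soil organic carbon in Europe",
     "description": "Topsoil pH and carbon stocks."},
]

for rec in records:
    print(link_record(rec))
```

Recording whether keywords were declared or extracted keeps augmented keywords distinguishable from the original metadata, which is the same separation a dedicated named graph would provide.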
Building this knowledge repository top-down, guided by the structure of the knowledge graph, offers several advantages. Compared with bottom-up methods, the top-down approach gives a more controlled and orderly construction process, which makes the knowledge easier to share and reuse. Furthermore, because the repository is structurally aligned with the knowledge graph and external data are linked directly to its concepts, the repository is more useful for applications. For example, a recommender system that operates over the knowledge repository depends on understanding how the concepts relate to one another.