Library Usage #118

Merged
merged 21 commits into from
Aug 24, 2023
Commits
2df147a
Add validate() function for programmatic access
kchason Jul 25, 2023
f485161
Fix pre-commit formatting
kchason Jul 25, 2023
b509dc5
Fix property reference
kchason Jul 25, 2023
8c40df1
Make type generic to account for multiple return types
kchason Jul 25, 2023
f6d48e2
Fix List vs list for casting
kchason Jul 25, 2023
2d3a65b
Feedback from PR
kchason Jul 26, 2023
bfe9992
Merge branch 'develop' into library-usage
kchason Jul 26, 2023
86ed417
Instantiate properties as instance variables instead of class
kchason Jul 26, 2023
40be311
Fix None vs "none" ontology version specification
ajnelson-nist Aug 15, 2023
ae5f077
Add explicit `-> None` on `__init__`
ajnelson-nist Aug 21, 2023
97d7fbb
Constrain `ValidationResult.graph` type to `pyshacl.validate(...)[1]`…
ajnelson-nist Aug 21, 2023
64bd95c
Merge branch 'develop' into library-usage
kchason Aug 22, 2023
4208a99
Wrap errors and positional arg signature support
kchason Aug 22, 2023
73e4683
Separate types and utils into discrete files
kchason Aug 22, 2023
d41a418
Fix import reference
kchason Aug 22, 2023
8f957dc
Forward arguments with unpacking syntax
ajnelson-nist Aug 23, 2023
6f9b6c9
Merge branch 'develop' into library-usage
ajnelson-nist Aug 23, 2023
00f1360
Default case_validate.validate inference parameter to None rather tha…
ajnelson-nist Aug 23, 2023
15f00c9
Consolidate case_validate CLI validation logic into case_validate.val…
ajnelson-nist Aug 23, 2023
eccaad4
Add new case_validate source files to Make dependencies
ajnelson-nist Aug 23, 2023
90f5c8c
case_validate: Update NIST inlined license text
ajnelson-nist Aug 23, 2023
258 changes: 94 additions & 164 deletions case_utils/case_validate/__init__.py
@@ -1,13 +1,16 @@
#!/usr/bin/env python3

# Portions of this file contributed by NIST are governed by the following
# statement:
#
# This software was developed at the National Institute of Standards
# and Technology by employees of the Federal Government in the course
# of their official duties. Pursuant to title 17 Section 105 of the
# United States Code this software is not subject to copyright
# protection and is in the public domain. NIST assumes no
# responsibility whatsoever for its use by other parties, and makes
# no guarantees, expressed or implied, about its quality,
# reliability, or any other characteristic.
# of their official duties. Pursuant to Title 17 Section 105 of the
# United States Code, this software is not subject to copyright
# protection within the United States. NIST assumes no responsibility
# whatsoever for its use by other parties, and makes no guarantees,
# expressed or implied, about its quality, reliability, or any other
# characteristic.
#
# We would appreciate acknowledgement if the software is used.

@@ -32,141 +35,105 @@
__version__ = "0.3.0"

import argparse
import importlib.resources
import logging
import os
import sys
import warnings
from typing import Dict, Set, Tuple, Union
from typing import Any, Dict, List, Optional, Tuple, Union

import pyshacl # type: ignore
import rdflib
from rdflib import Graph

import case_utils.ontology
from case_utils.case_validate.validate_types import (
NonExistentCDOConceptWarning,
ValidationResult,
)
from case_utils.case_validate.validate_utils import (
get_invalid_cdo_concepts,
get_ontology_graph,
)
from case_utils.ontology.version_info import (
CURRENT_CASE_VERSION,
built_version_choices_list,
)

NS_OWL = rdflib.OWL
NS_RDF = rdflib.RDF
NS_RDFS = rdflib.RDFS
NS_SH = rdflib.SH

_logger = logging.getLogger(os.path.basename(__file__))


class NonExistentCDOConceptWarning(UserWarning):
def validate(
input_file: Union[List[str], str],
*args: Any,
case_version: Optional[str] = None,
supplemental_graphs: Optional[List[str]] = None,
**kwargs: Any,
) -> ValidationResult:
"""
This class is used when a concept is encountered in the data graph that is not part of CDO ontologies, according to the --built-version flags and --ontology-graph flags.
Validate the given data graph against the given CASE ontology version and supplemental graphs.

:param *args: The positional arguments to pass to the underlying pyshacl.validate function.
:param input_file: The path to the file containing the data graph to validate. This can also be a list of paths to files containing data graphs to pool together.
:param case_version: The version of the CASE ontology to use (e.g. 1.2.0). If None, the most recent version will be used.
:param supplemental_graphs: File paths to supplemental graphs to use. If None, no supplemental graphs will be used.
:param allow_warnings: In addition to affecting the conformance of SHACL validation, this will affect conformance based on unrecognized CDO concepts (likely, misspelled or miscapitalized) in the data graph. If allow_warnings is not True, any unrecognized concept using a CDO IRI prefix will cause conformance to be False.
:param inference: The type of inference to use. If "none" (type str), no inference will be used. If None (type NoneType), pyshacl defaults will be used. Note that at the time of this writing (pySHACL 0.23.0), pyshacl defaults are no inferencing for the data graph, and RDFS inferencing for the SHACL graph, which for case_utils.validate includes the SHACL and OWL graphs.
:param **kwargs: The keyword arguments to pass to the underlying pyshacl.validate function.
:return: The validation result object containing the defined properties.
"""
# Convert the data graph string to a rdflib.Graph object.
data_graph = rdflib.Graph()
if isinstance(input_file, str):
data_graph.parse(input_file)
elif isinstance(input_file, list):
for _data_graph_file in input_file:
_logger.debug("_data_graph_file = %r.", _data_graph_file)
if not isinstance(_data_graph_file, str):
raise TypeError("Expected str, received %s." % type(_data_graph_file))
data_graph.parse(_data_graph_file)

# Get the ontology graph from the case_version and supplemental_graphs arguments
ontology_graph: Graph = get_ontology_graph(case_version, supplemental_graphs)

# Get the undefined CDO concepts.
undefined_cdo_concepts = get_invalid_cdo_concepts(data_graph, ontology_graph)

pass
# Warn about typo'd concepts before performing SHACL review.
for undefined_cdo_concept in sorted(undefined_cdo_concepts):
warnings.warn(undefined_cdo_concept, NonExistentCDOConceptWarning)
undefined_cdo_concepts_message = (
"There were %d concepts with CDO IRIs in the data graph that are not in the ontology graph."
% len(undefined_cdo_concepts)
)

# Validate data graph against ontology graph.
validate_result: Tuple[
bool, Union[Exception, bytes, str, rdflib.Graph], str
] = pyshacl.validate(
data_graph,
*args,
ont_graph=ontology_graph,
shacl_graph=ontology_graph,
**kwargs,
)

def concept_is_cdo_concept(n_concept: rdflib.URIRef) -> bool:
concept_iri = str(n_concept)
return concept_iri.startswith(
"https://ontology.unifiedcyberontology.org/"
) or concept_iri.startswith("https://ontology.caseontology.org/")
# Relieve RAM of the data graph after validation has run.
del data_graph

conforms = validate_result[0]

def get_invalid_cdo_concepts(
data_graph: rdflib.Graph, ontology_graph: rdflib.Graph
) -> Set[rdflib.URIRef]:
"""
Get the set of concepts in the data graph that are not part of the CDO ontologies as specified with the ontology_graph argument.

:param data_graph: The data graph to validate.
:param ontology_graph: The ontology graph to use for validation.
:return: The list of concepts in the data graph that are not part of the CDO ontology.

>>> from case_utils.namespace import NS_RDF, NS_OWL, NS_UCO_CORE
>>> from rdflib import Graph, Literal, Namespace, URIRef
>>> # Define a namespace for a knowledge base, and a namespace for custom extensions.
>>> ns_kb = Namespace("http://example.org/kb/")
>>> ns_ex = Namespace("http://example.org/ontology/")
>>> dg = Graph()
>>> og = Graph()
>>> # Use an ontology graph in review that includes only a single class and a single property excerpted from UCO, but also a single custom property.
>>> _ = og.add((NS_UCO_CORE.UcoObject, NS_RDF.type, NS_OWL.Class))
>>> _ = og.add((NS_UCO_CORE.name, NS_RDF.type, NS_OWL.DatatypeProperty))
>>> _ = og.add((ns_ex.ourCustomProperty, NS_RDF.type, NS_OWL.DatatypeProperty))
>>> # Define an individual.
>>> n_uco_object = ns_kb["UcoObject-f494d239-d9fd-48da-bc07-461ba86d8c6c"]
>>> n_uco_object
rdflib.term.URIRef('http://example.org/kb/UcoObject-f494d239-d9fd-48da-bc07-461ba86d8c6c')
>>> # Review a data graph that includes only the single individual, class typo'd (capitalized incorrectly), but property OK.
>>> _ = dg.add((n_uco_object, NS_RDF.type, NS_UCO_CORE.UCOObject))
>>> _ = dg.add((n_uco_object, NS_UCO_CORE.name, Literal("Test")))
>>> _ = dg.add((n_uco_object, ns_ex.customProperty, Literal("Custom Value")))
>>> invalid_cdo_concepts = get_invalid_cdo_concepts(dg, og)
>>> invalid_cdo_concepts
{rdflib.term.URIRef('https://ontology.unifiedcyberontology.org/uco/core/UCOObject')}
>>> # Note that the property "ourCustomProperty" was typo'd in the data graph, but this was not reported.
>>> assert ns_ex.ourCustomProperty not in invalid_cdo_concepts
"""
# Construct set of CDO concepts for data graph concept-existence review.
cdo_concepts: Set[rdflib.URIRef] = set()

for n_structural_class in [
NS_OWL.Class,
NS_OWL.AnnotationProperty,
NS_OWL.DatatypeProperty,
NS_OWL.ObjectProperty,
NS_RDFS.Datatype,
NS_SH.NodeShape,
NS_SH.PropertyShape,
NS_SH.Shape,
]:
for ontology_triple in ontology_graph.triples(
(None, NS_RDF.type, n_structural_class)
):
if not isinstance(ontology_triple[0], rdflib.URIRef):
continue
if concept_is_cdo_concept(ontology_triple[0]):
cdo_concepts.add(ontology_triple[0])
for n_ontology_predicate in [
NS_OWL.backwardCompatibleWith,
NS_OWL.imports,
NS_OWL.incompatibleWith,
NS_OWL.priorVersion,
NS_OWL.versionIRI,
]:
for ontology_triple in ontology_graph.triples(
(None, n_ontology_predicate, None)
):
assert isinstance(ontology_triple[0], rdflib.URIRef)
assert isinstance(ontology_triple[2], rdflib.URIRef)
cdo_concepts.add(ontology_triple[0])
cdo_concepts.add(ontology_triple[2])
for ontology_triple in ontology_graph.triples((None, NS_RDF.type, NS_OWL.Ontology)):
if not isinstance(ontology_triple[0], rdflib.URIRef):
continue
cdo_concepts.add(ontology_triple[0])

# Also load historical ontology and version IRIs.
ontology_and_version_iris_data = importlib.resources.read_text(
case_utils.ontology, "ontology_and_version_iris.txt"
if len(undefined_cdo_concepts) > 0:
warnings.warn(undefined_cdo_concepts_message)
if not kwargs.get("allow_warnings"):
undefined_cdo_concepts_alleviation_message = "The data graph is SHACL-conformant with the CDO ontologies, but nonexistent-concept references raise Warnings with this tool. Please either correct the concept names in the data graph; use the --ontology-graph flag to pass a corrected CDO ontology file, also using --built-version none; or, use the --allow-warnings flag."
warnings.warn(undefined_cdo_concepts_alleviation_message)
conforms = False

return ValidationResult(
conforms,
validate_result[1],
validate_result[2],
undefined_cdo_concepts,
)
for line in ontology_and_version_iris_data.split("\n"):
cleaned_line = line.strip()
if cleaned_line == "":
continue
cdo_concepts.add(rdflib.URIRef(cleaned_line))

data_cdo_concepts: Set[rdflib.URIRef] = set()
for data_triple in data_graph.triples((None, None, None)):
for data_triple_member in data_triple:
if isinstance(data_triple_member, rdflib.URIRef):
if concept_is_cdo_concept(data_triple_member):
data_cdo_concepts.add(data_triple_member)
elif isinstance(data_triple_member, rdflib.Literal):
if isinstance(data_triple_member.datatype, rdflib.URIRef):
if concept_is_cdo_concept(data_triple_member.datatype):
data_cdo_concepts.add(data_triple_member.datatype)

return data_cdo_concepts - cdo_concepts


def main() -> None:
@@ -263,32 +230,6 @@ def main() -> None:

args = parser.parse_args()

data_graph = rdflib.Graph()
for in_graph in args.in_graph:
_logger.debug("in_graph = %r.", in_graph)
data_graph.parse(in_graph)

ontology_graph = rdflib.Graph()
if args.built_version != "none":
ttl_filename = args.built_version + ".ttl"
_logger.debug("ttl_filename = %r.", ttl_filename)
ttl_data = importlib.resources.read_text(case_utils.ontology, ttl_filename)
ontology_graph.parse(data=ttl_data, format="turtle")
if args.ontology_graph:
for arg_ontology_graph in args.ontology_graph:
_logger.debug("arg_ontology_graph = %r.", arg_ontology_graph)
ontology_graph.parse(arg_ontology_graph)

# Get the list of undefined CDO concepts in the graph
undefined_cdo_concepts = get_invalid_cdo_concepts(data_graph, ontology_graph)

for undefined_cdo_concept in sorted(undefined_cdo_concepts):
warnings.warn(undefined_cdo_concept, NonExistentCDOConceptWarning)
undefined_cdo_concepts_message = (
"There were %d concepts with CDO IRIs in the data graph that are not in the ontology graph."
% len(undefined_cdo_concepts)
)

# Determine output format.
# pySHACL's determination of output formatting is handled solely
# through the -f flag. Other CASE CLI tools handle format
@@ -299,27 +240,23 @@ def main() -> None:
if args.format != "human":
validator_kwargs["serialize_report_graph"] = args.format

validate_result: Tuple[bool, Union[Exception, bytes, str, rdflib.Graph], str]
validate_result = pyshacl.validate(
data_graph,
shacl_graph=ontology_graph,
ont_graph=ontology_graph,
inference=args.inference,
meta_shacl=args.metashacl,
validation_result: ValidationResult = validate(
args.in_graph,
abort_on_first=args.abort,
allow_infos=True if args.allow_infos else False,
allow_warnings=True if args.allow_warnings else False,
case_version=args.built_version,
debug=True if args.debug else False,
do_owl_imports=True if args.imports else False,
**validator_kwargs
inference=args.inference,
meta_shacl=args.metashacl,
supplemental_graphs=args.ontology_graph,
**validator_kwargs,
)

# Relieve RAM of the data graph after validation has run.
del data_graph

conforms = validate_result[0]
validation_graph = validate_result[1]
validation_text = validate_result[2]
conforms = validation_result.conforms
validation_graph = validation_result.graph
validation_text = validation_result.text

# NOTE: The output logistics code is adapted from pySHACL's file
# pyshacl/cli.py. This section should be monitored for code drift.
@@ -341,13 +278,6 @@ def main() -> None:
% type(validation_graph)
)

if len(undefined_cdo_concepts) > 0:
warnings.warn(undefined_cdo_concepts_message)
if not args.allow_warnings:
undefined_cdo_concepts_alleviation_message = "The data graph is SHACL-conformant with the CDO ontologies, but nonexistent-concept references raise Warnings with this tool. Please either correct the concept names in the data graph; use the --ontology-graph flag to pass a corrected CDO ontology file, also using --built-version none; or, use the --allow-warnings flag."
warnings.warn(undefined_cdo_concepts_alleviation_message)
conforms = False

sys.exit(0 if conforms else 1)
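
Note on the conformance logic consolidated into `validate()` above: a SHACL pass is downgraded to a failure when the data graph uses CDO-prefixed IRIs that the ontology graph does not define, unless `allow_warnings` is set. The sketch below restates that rule with plain strings in place of `rdflib.URIRef` so it runs without rdflib or pySHACL. The helper names (`is_cdo_concept`, `undefined_cdo_concepts`, `final_conformance`) are hypothetical and are not part of this PR; only the two IRI prefixes and the override rule come from the diff.

```python
from typing import Iterable, Set

# IRI prefixes taken from concept_is_cdo_concept in the diff.
CDO_PREFIXES = (
    "https://ontology.unifiedcyberontology.org/",
    "https://ontology.caseontology.org/",
)


def is_cdo_concept(iri: str) -> bool:
    """Return True when the IRI falls under a CDO namespace prefix."""
    return iri.startswith(CDO_PREFIXES)


def undefined_cdo_concepts(
    data_iris: Iterable[str], ontology_iris: Iterable[str]
) -> Set[str]:
    """Return CDO-prefixed IRIs used in the data but absent from the ontology."""
    data_cdo = {iri for iri in data_iris if is_cdo_concept(iri)}
    return data_cdo - set(ontology_iris)


def final_conformance(
    shacl_conforms: bool, undefined: Set[str], allow_warnings: bool = False
) -> bool:
    """Force non-conformance when undefined concepts exist and warnings are not allowed."""
    if undefined and not allow_warnings:
        return False
    return shacl_conforms


# Mirrors the doctest scenario: the class name is miscapitalized in the data.
UCO_CORE = "https://ontology.unifiedcyberontology.org/uco/core/"
ontology = {UCO_CORE + "UcoObject", UCO_CORE + "name"}
data = [
    UCO_CORE + "UCOObject",
    UCO_CORE + "name",
    "http://example.org/ontology/customProperty",
]
missing = undefined_cdo_concepts(data, ontology)
print(sorted(missing))
print(final_conformance(True, missing))
```

The non-CDO property `customProperty` is ignored by design, which matches the doctest remark that typos outside the CDO namespaces are not reported.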


49 changes: 49 additions & 0 deletions case_utils/case_validate/validate_types.py
@@ -0,0 +1,49 @@
#!/usr/bin/env python3

# Portions of this file contributed by NIST are governed by the following
# statement:
#
# This software was developed at the National Institute of Standards
# and Technology by employees of the Federal Government in the course
# of their official duties. Pursuant to Title 17 Section 105 of the
# United States Code, this software is not subject to copyright
# protection within the United States. NIST assumes no responsibility
# whatsoever for its use by other parties, and makes no guarantees,
# expressed or implied, about its quality, reliability, or any other
# characteristic.
#
# We would appreciate acknowledgement if the software is used.

from typing import Set, Union

import rdflib


class ValidationResult:
def __init__(
self,
conforms: bool,
graph: Union[Exception, bytes, str, rdflib.Graph],
text: str,
undefined_concepts: Set[rdflib.URIRef],
) -> None:
self.conforms = conforms
self.graph = graph
self.text = text
self.undefined_concepts = undefined_concepts


class NonExistentCDOConceptWarning(UserWarning):
"""
This class is used when a concept is encountered in the data graph that is not part of CDO ontologies, according to the --built-version flags and --ontology-graph flags.
"""

pass


class NonExistentCASEVersionError(Exception):
"""
This class is used when an invalid CASE version is requested that is not supported by the library.
"""

pass
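
For readers looking for the library usage this PR enables, here is a minimal sketch of calling `validate()` and reading the `ValidationResult` attributes (`conforms`, `graph`, `text`, `undefined_concepts`) defined above. The wrapper `validate_file` and the reporter `summarize` are hypothetical names introduced for illustration. The import is deferred into the wrapper so the rest of the sketch runs without `case_utils` installed, and the `demo` object is a stand-in built with `SimpleNamespace`, not a real validation result.

```python
from types import SimpleNamespace
from typing import Any, List


def validate_file(path: str, allow_warnings: bool = False) -> Any:
    """Call the validate() function added in this PR (requires case_utils)."""
    # Imported lazily so the rest of this sketch runs without case_utils.
    from case_utils.case_validate import validate

    # allow_warnings is forwarded through **kwargs to pyshacl.validate.
    return validate(path, allow_warnings=allow_warnings)


def summarize(result: Any) -> List[str]:
    """Build report lines from any object shaped like ValidationResult."""
    lines = ["conforms: %s" % result.conforms]
    for concept in sorted(str(c) for c in result.undefined_concepts):
        lines.append("undefined concept: %s" % concept)
    return lines


# Stand-in result (assumption): same attribute names as ValidationResult,
# with a plain string where the library would hold an rdflib.URIRef.
demo = SimpleNamespace(
    conforms=False,
    graph=None,
    text="",
    undefined_concepts={
        "https://ontology.unifiedcyberontology.org/uco/core/UCOObject"
    },
)
print("\n".join(summarize(demo)))
```

With `case_utils` installed, `summarize(validate_file("data.json"))` would produce the same kind of report for a real graph file; the file name here is only a placeholder.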