Merge branch 'develop' into reduce_redundant_analysis
ajnelson-nist committed Aug 24, 2023
2 parents ce8da08 + 24bdcc7 commit 63cb72b
Showing 110 changed files with 44,248 additions and 365 deletions.
10 changes: 10 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,10 @@
# This file lists the contributors responsible for the
# repository content. They will also be automatically
# asked to review any pull request made in this repository.

# Each line is a file pattern followed by one or more owners.
# The sequence matters: later patterns take precedence.

# FILES OWNERS
* @casework/maintainers-global
* @casework/maintainers-case-python-utilities
6 changes: 4 additions & 2 deletions .github/workflows/cicd.yml
@@ -23,6 +23,8 @@ on:
release:
types:
- published
schedule:
- cron: '15 5 * * TUE'

jobs:
build:
@@ -31,8 +33,8 @@ jobs:
strategy:
matrix:
python-version:
- '3.7'
- '3.10'
- '3.8'
- '3.11'

steps:
- uses: actions/checkout@v2
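The new `schedule` trigger added above can be read as follows; this is an annotated sketch (the comments are added for illustration, not part of the committed file):

```yaml
on:
  schedule:
    # Field order: minute hour day-of-month month day-of-week.
    # '15 5 * * TUE' fires at 05:15 UTC every Tuesday.
    - cron: '15 5 * * TUE'
```

GitHub Actions evaluates cron schedules in UTC, so the weekly run is independent of the repository maintainers' local time zones.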
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -1,14 +1,14 @@
repos:
- repo: https://github.com/psf/black
rev: 22.3.0
rev: 23.1.0
hooks:
- id: black
- repo: https://github.com/pycqa/flake8
rev: 4.0.1
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/pycqa/isort
rev: 5.10.1
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
6 changes: 4 additions & 2 deletions CONTRIBUTE.md
@@ -27,10 +27,12 @@ pushd case_utils/ontology
git add case-0.6.0.ttl # Assuming CASE 0.6.0 was just released.
# and/or
git add uco-0.8.0.ttl # Assuming UCO 0.8.0 was adopted in CASE 0.6.0.

git add ontology_and_version_iris.txt
popd
make check
# Assuming `make check` passes:
git commit -m "Build CASE 0.6.0 monolithic .ttl files" case_utils/ontology/case-0.6.0-subclasses.ttl case_utils/ontology/case-0.6.0.ttl
git commit -m "Build CASE 0.6.0 monolithic .ttl files" case_utils/ontology/case-0.6.0-subclasses.ttl case_utils/ontology/case-0.6.0.ttl case_utils/ontology/ontology_and_version_iris.txt
git commit -m "Update CASE ontology pointer to version 0.6.0" dependencies/CASE case_utils/ontology/version_info.py
```

@@ -43,4 +45,4 @@ pre-commit --version
The `pre-commit` tool hooks into Git's commit machinery to run a set of linters and static analyzers over each change. To install `pre-commit` into Git's hooks, run:
```bash
pre-commit install
```
```
20 changes: 18 additions & 2 deletions README.md
@@ -55,12 +55,17 @@
case_validate --format turtle input.json > result.ttl
```

To use one or more supplementary ontology files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:
To use one or more supplementary ontology or shape files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:

```bash
case_validate --ontology-graph internal_ontology.ttl --ontology-graph experimental_shapes.ttl input.json
case_validate \
--ontology-graph internal_ontology.ttl \
--ontology-graph experimental_shapes.ttl \
input.json
```

This tool uses the `--built-version` flag, described [below](#built-versions).

Other flags are reviewable with `case_validate --help`.


@@ -87,6 +92,8 @@ These commands can be used with any RDF files to run arbitrary SPARQL queries.

Note that prefixes used in a SPARQL query do not need to be defined within the query itself. Their mapping is inherited from their first definition in the input graph files. However, input graphs are not required to agree on prefix mappings, so if two input graph files disagree on what a prefix maps to, the order of the input arguments can change the result. If there is concern of ambiguity from inputs, a `PREFIX` statement should be included in the query, such as is shown in [this test query](tests/case_utils/case_sparql_select/subclass.sparql).
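For instance, a query can pin its own prefix mapping up front; in this sketch the `ex:` prefix and its IRI are hypothetical placeholders:

```sparql
# Binding the prefix in the query removes any dependence on which
# input graph happens to define it first.
PREFIX ex: <http://example.org/ontology/>

SELECT ?x
WHERE {
  ?x a ex:Thing .
}
```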

These tools use the `--built-version` flag, described [below](#built-versions).


#### `case_sparql_construct`

@@ -116,6 +123,15 @@ case_sparql_select output.md input.sparql input.json [input-2.json ...]
This [module](case_utils/local_uuid.py) provides a wrapper UUID generator, `local_uuid()`. Its main purpose is to make example data generate consistent identifiers, and it intentionally includes mechanisms that make this mode difficult to activate without the caller's awareness.
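The general idea can be sketched as follows. This is a simplified illustration of the concept only, not the module's actual implementation; the environment-variable name, its required value, and the `uuid5` derivation scheme here are all assumptions:

```python
import os
import uuid

# Hypothetical sketch: return random UUIDs normally, but deterministic
# ones when a demo flag is explicitly set in the environment.
_DEMO_COUNTER = 0


def sketch_local_uuid() -> str:
    """Return a UUID string; deterministic only if the demo flag is set."""
    global _DEMO_COUNTER
    if os.environ.get("CASE_DEMO_NONRANDOM") == "nonrandom":
        _DEMO_COUNTER += 1
        # Derive a stable UUID from a fixed namespace and a call counter,
        # so repeated runs of an example script emit identical node IRIs.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, "demo-%d" % _DEMO_COUNTER))
    return str(uuid.uuid4())
```

Requiring an explicit opt-in (here, an environment variable with a specific value) is what keeps the deterministic mode from being activated accidentally in non-example code.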


### Built versions

Several tools in this package include a flag `--built-version`. This flag tailors the tool's behavior to a certain CASE ontology version; typically, this involves mixing the ontology graph into the data graph to enable the knowledge expansion needed for pattern matching (such as making queries aware of the OWL subclass hierarchy).

If not provided, the tool will assume a default value of the latest ontology version.

If the special value `none` is provided, none of the ontology builds this package ships will be included in the data graph. The `none` value supports use cases that are wholly independent of CASE, such as running a test in a specialized vocabulary; it also supports use cases where a non-released CASE version is meant to be used, such as a locally revised version of CASE where some concept revisions are being reviewed.
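Concretely, the flag composes with the other `case_validate` options shown earlier; the input and ontology filenames below are placeholders:

```bash
# Validate against the packaged default (latest) CASE release:
case_validate input.json

# Validate using only the supplied graphs, with no packaged CASE build mixed in:
case_validate --built-version none --ontology-graph local_revision.ttl input.json
```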


## Development status

This repository follows [CASE community guidance on describing development status](https://caseontology.org/resources/software.html#development_status), by adherence to noted support requirements.
2 changes: 1 addition & 1 deletion case_utils/__init__.py
@@ -11,6 +11,6 @@
#
# We would appreciate acknowledgement if the software is used.

__version__ = "0.7.0"
__version__ = "0.11.0"

from . import local_uuid # noqa: F401
87 changes: 71 additions & 16 deletions case_utils/case_file/__init__.py
@@ -15,7 +15,7 @@
This module creates a graph object that provides a basic UCO characterization of a single file. The gathered metadata is among the more "durable" file characteristics, i.e. characteristics that would remain consistent when transferring a file between locations.
"""

__version__ = "0.4.0"
__version__ = "0.5.0"

import argparse
import datetime
@@ -27,7 +27,7 @@

import rdflib

import case_utils
import case_utils.inherent_uuid
from case_utils.namespace import (
NS_RDF,
NS_UCO_CORE,
@@ -39,6 +39,7 @@

DEFAULT_PREFIX = "http://example.org/kb/"


# Shortcut syntax for defining an immutable named tuple is noted here:
# https://docs.python.org/3/library/typing.html#typing.NamedTuple
# via the "See also" box here: https://docs.python.org/3/library/collections.html#collections.namedtuple
Expand All @@ -48,6 +49,8 @@ class HashDict(typing.NamedTuple):
sha1: str
sha256: str
sha512: str
sha3_256: str
sha3_512: str


def create_file_node(
@@ -57,6 +60,9 @@ def create_file_node(
node_prefix: str = DEFAULT_PREFIX,
disable_hashes: bool = False,
disable_mtime: bool = False,
*args: typing.Any,
use_deterministic_uuids: bool = False,
**kwargs: typing.Any,
) -> rdflib.URIRef:
r"""
This function characterizes the file at filepath.
@@ -67,7 +73,7 @@
:param filepath: The path to the file to characterize. Can be relative or absolute.
:type filepath: str
:param node_iri: The desired full IRI for the node. If absent, will make an IRI of the pattern ``ns_base + 'file-' + uuid4``
:param node_iri: The desired full IRI for the node. If absent, will make an IRI of the pattern ``ns_base + 'File-' + uuid``
:type node_iri: str
:param node_prefix: The base prefix to use if node_iri is not supplied.
@@ -85,7 +91,7 @@
node_namespace = rdflib.Namespace(node_prefix)

if node_iri is None:
node_slug = "file-" + case_utils.local_uuid.local_uuid()
node_slug = "File-" + case_utils.local_uuid.local_uuid()
node_iri = node_namespace[node_slug]
n_file = rdflib.URIRef(node_iri)
graph.add((n_file, NS_RDF.type, NS_UCO_OBSERVABLE.File))
@@ -94,7 +100,15 @@
literal_basename = rdflib.Literal(basename)

file_stat = os.stat(filepath)
n_file_facet = node_namespace["file-facet-" + case_utils.local_uuid.local_uuid()]

n_file_facet: rdflib.URIRef
if use_deterministic_uuids:
n_file_facet = case_utils.inherent_uuid.get_facet_uriref(
n_file, NS_UCO_OBSERVABLE.FileFacet, namespace=node_namespace
)
else:
n_file_facet = node_namespace["FileFacet-" + case_utils.local_uuid.local_uuid()]

graph.add(
(
n_file_facet,
@@ -121,9 +135,16 @@
graph.add((n_file_facet, NS_UCO_OBSERVABLE.modifiedTime, literal_mtime))

if not disable_hashes:
n_contentdata_facet = node_namespace[
"content-data-facet-" + case_utils.local_uuid.local_uuid()
]
n_contentdata_facet: rdflib.URIRef
if use_deterministic_uuids:
n_contentdata_facet = case_utils.inherent_uuid.get_facet_uriref(
n_file, NS_UCO_OBSERVABLE.ContentDataFacet, namespace=node_namespace
)
else:
n_contentdata_facet = node_namespace[
"ContentDataFacet-" + case_utils.local_uuid.local_uuid()
]

graph.add((n_file, NS_UCO_CORE.hasFacet, n_contentdata_facet))
graph.add(
(n_contentdata_facet, NS_RDF.type, NS_UCO_OBSERVABLE.ContentDataFacet)
@@ -140,6 +161,8 @@
sha1obj = hashlib.sha1()
sha256obj = hashlib.sha256()
sha512obj = hashlib.sha512()
sha3_256obj = hashlib.sha3_256()
sha3_512obj = hashlib.sha3_512()
stashed_error = None
byte_tally = 0
with open(filepath, "rb") as in_fh:
@@ -158,6 +181,8 @@
sha1obj.update(buf)
sha256obj.update(buf)
sha512obj.update(buf)
sha3_256obj.update(buf)
sha3_512obj.update(buf)
if stashed_error is not None:
raise stashed_error
current_hashdict = HashDict(
Expand All @@ -166,6 +191,8 @@ def create_file_node(
sha1obj.hexdigest(),
sha256obj.hexdigest(),
sha512obj.hexdigest(),
sha3_256obj.hexdigest(),
sha3_512obj.hexdigest(),
)
if last_hashdict == current_hashdict:
successful_hashdict = current_hashdict
@@ -193,26 +220,48 @@ def create_file_node(

# Add confirmed hashes into graph.
for key in successful_hashdict._fields:
if key not in ("md5", "sha1", "sha256", "sha512"):
if key not in ("md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512"):
continue
n_hash = node_namespace["hash-" + case_utils.local_uuid.local_uuid()]

l_hash_method: rdflib.Literal
if key in ("sha3_256", "sha3_512"):
l_hash_method = rdflib.Literal(
key.replace("_", "-").upper(),
datatype=NS_UCO_VOCABULARY.HashNameVocab,
)
else:
l_hash_method = rdflib.Literal(
key.upper(), datatype=NS_UCO_VOCABULARY.HashNameVocab
)

hash_value: str = getattr(successful_hashdict, key)
l_hash_value = rdflib.Literal(hash_value.upper(), datatype=NS_XSD.hexBinary)

hash_uuid: str
if use_deterministic_uuids:
hash_uuid = str(
case_utils.inherent_uuid.hash_method_value_uuid(
l_hash_method, l_hash_value
)
)
else:
hash_uuid = case_utils.local_uuid.local_uuid()
n_hash = node_namespace["Hash-" + hash_uuid]

graph.add((n_contentdata_facet, NS_UCO_OBSERVABLE.hash, n_hash))
graph.add((n_hash, NS_RDF.type, NS_UCO_TYPES.Hash))
graph.add(
(
n_hash,
NS_UCO_TYPES.hashMethod,
rdflib.Literal(
key.upper(), datatype=NS_UCO_VOCABULARY.HashNameVocab
),
l_hash_method,
)
)
hash_value = getattr(successful_hashdict, key)
graph.add(
(
n_hash,
NS_UCO_TYPES.hashValue,
rdflib.Literal(hash_value.upper(), datatype=NS_XSD.hexBinary),
l_hash_value,
)
)

@@ -225,6 +274,11 @@ def main() -> None:
parser.add_argument("--debug", action="store_true")
parser.add_argument("--disable-hashes", action="store_true")
parser.add_argument("--disable-mtime", action="store_true")
parser.add_argument(
"--use-deterministic-uuids",
action="store_true",
help="Use UUIDs computed using the case_utils.inherent_uuid module.",
)
parser.add_argument(
"--output-format", help="Override extension-based format guesser."
)
@@ -257,14 +311,15 @@ def main() -> None:
context_dictionary = {k: v for (k, v) in graph.namespace_manager.namespaces()}
serialize_kwargs["context"] = context_dictionary

node_iri = NS_BASE["file-" + case_utils.local_uuid.local_uuid()]
node_iri = NS_BASE["File-" + case_utils.local_uuid.local_uuid()]
create_file_node(
graph,
args.in_file,
node_iri=node_iri,
node_prefix=args.base_prefix,
disable_hashes=args.disable_hashes,
disable_mtime=args.disable_mtime,
use_deterministic_uuids=args.use_deterministic_uuids,
)

graph.serialize(args.out_graph, **serialize_kwargs)
9 changes: 5 additions & 4 deletions case_utils/case_sparql_construct/__init__.py
Expand Up @@ -15,7 +15,7 @@
This script executes a SPARQL CONSTRUCT query, returning a graph of the generated triples.
"""

__version__ = "0.2.3"
__version__ = "0.2.5"

import argparse
import logging
@@ -49,7 +49,7 @@ def main() -> None:
"--built-version",
choices=tuple(built_version_choices_list),
default="case-" + CURRENT_CASE_VERSION,
help="Ontology version to use to supplement query, such as for subclass querying. Does not require networking to use. Default is most recent CASE release.",
help="Ontology version to use to supplement query, such as for subclass querying. Does not require networking to use. Default is most recent CASE release. Passing 'none' will mean no pre-built CASE ontology versions accompanying this tool will be included in the analysis.",
)
parser.add_argument(
"--disallow-empty-results",
@@ -98,10 +98,11 @@
construct_query_result = in_graph.query(construct_query_object)
_logger.debug("type(construct_query_result) = %r." % type(construct_query_result))
_logger.debug("len(construct_query_result) = %d." % len(construct_query_result))
for (row_no, row) in enumerate(construct_query_result):
for row_no, row in enumerate(construct_query_result):
assert isinstance(row, tuple)
if row_no == 0:
_logger.debug("row[0] = %r." % (row,))
out_graph.add(row)
out_graph.add((row[0], row[1], row[2]))

output_format = None
if args.output_format is None: