Merge branch 'develop' into reduce_redundant_analysis
ajnelson-nist committed Aug 24, 2023
2 parents ce8da08 + 24bdcc7 commit 63cb72b
Showing 110 changed files with 44,248 additions and 365 deletions.
10 changes: 10 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,10 @@
# This file lists the contributors responsible for the
# repository content. They will also be automatically
# asked to review any pull request made in this repository.

# Each line is a file pattern followed by one or more owners.
# The sequence matters: later patterns take precedence.

# FILES OWNERS
* @casework/maintainers-global
* @casework/maintainers-case-python-utilities
6 changes: 4 additions & 2 deletions .github/workflows/cicd.yml
@@ -23,6 +23,8 @@ on:
release:
types:
- published
schedule:
- cron: '15 5 * * TUE'

jobs:
build:
@@ -31,8 +33,8 @@ jobs:
strategy:
matrix:
python-version:
- '3.7'
- '3.10'
- '3.8'
- '3.11'

steps:
- uses: actions/checkout@v2
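The new `schedule` trigger added above can be read as follows; this is an annotated sketch (the comments are added for illustration, not part of the committed file):

```yaml
on:
  schedule:
    # Field order: minute hour day-of-month month day-of-week.
    # '15 5 * * TUE' fires at 05:15 UTC every Tuesday.
    - cron: '15 5 * * TUE'
```

GitHub Actions evaluates cron schedules in UTC, so the weekly run is independent of the repository maintainers' local time zones.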
6 changes: 3 additions & 3 deletions .pre-commit-config.yaml
@@ -1,14 +1,14 @@
repos:
- repo: https://github.com/psf/black
rev: 22.3.0
rev: 23.1.0
hooks:
- id: black
- repo: https://github.com/pycqa/flake8
rev: 4.0.1
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/pycqa/isort
rev: 5.10.1
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
6 changes: 4 additions & 2 deletions CONTRIBUTE.md
@@ -27,10 +27,12 @@ pushd case_utils/ontology
git add case-0.6.0.ttl # Assuming CASE 0.6.0 was just released.
# and/or
git add uco-0.8.0.ttl # Assuming UCO 0.8.0 was adopted in CASE 0.6.0.

git add ontology_and_version_iris.txt
popd
make check
# Assuming `make check` passes:
git commit -m "Build CASE 0.6.0 monolithic .ttl files" case_utils/ontology/case-0.6.0-subclasses.ttl case_utils/ontology/case-0.6.0.ttl
git commit -m "Build CASE 0.6.0 monolithic .ttl files" case_utils/ontology/case-0.6.0-subclasses.ttl case_utils/ontology/case-0.6.0.ttl case_utils/ontology/ontology_and_version_iris.txt
git commit -m "Update CASE ontology pointer to version 0.6.0" dependencies/CASE case_utils/ontology/version_info.py
```

@@ -43,4 +45,4 @@ pre-commit --version
The `pre-commit` tool hooks into Git's commit machinery to run a set of linters and static analyzers over each change. To install `pre-commit` into Git's hooks, run:
```bash
pre-commit install
```
```
20 changes: 18 additions & 2 deletions README.md
@@ -55,12 +55,17 @@
case_validate --format turtle input.json > result.ttl
```

To use one or more supplementary ontology files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:
To use one or more supplementary ontology or shape files, the `--ontology-graph` flag can be used, more than once if desired, to supplement the selected CASE version:

```bash
case_validate --ontology-graph internal_ontology.ttl --ontology-graph experimental_shapes.ttl input.json
case_validate \
--ontology-graph internal_ontology.ttl \
--ontology-graph experimental_shapes.ttl \
input.json
```

This tool uses the `--built-version` flag, described [below](#built-versions).

Other flags are reviewable with `case_validate --help`.


@@ -87,6 +92,8 @@ These commands can be used with any RDF files to run arbitrary SPARQL queries.

Note that prefixes used in a SPARQL query do not need to be defined within the query itself. Their mapping is inherited from their first definition in the input graph files. However, input graphs are not required to agree on prefix mappings, so if two input graph files disagree on what a prefix maps to, the order of the input arguments can change the result. If there is concern of ambiguity from inputs, a `PREFIX` statement should be included in the query, such as is shown in [this test query](tests/case_utils/case_sparql_select/subclass.sparql).
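For instance, a query can pin its own prefix mapping up front; in this sketch the `ex:` prefix and its IRI are hypothetical placeholders:

```sparql
# Binding the prefix in the query removes any dependence on which
# input graph happens to define it first.
PREFIX ex: <http://example.org/ontology/>

SELECT ?x
WHERE {
  ?x a ex:Thing .
}
```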

These tools use the `--built-version` flag, described [below](#built-versions).


#### `case_sparql_construct`

@@ -116,6 +123,15 @@ case_sparql_select output.md input.sparql input.json [input-2.json ...]
This [module](case_utils/local_uuid.py) provides a wrapper UUID generator, `local_uuid()`. Its main purpose is to make example data generate consistent identifiers, and it intentionally includes mechanisms that make this mode difficult to activate without the caller's awareness.
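The general idea can be sketched as follows. This is a simplified illustration of the concept only, not the module's actual implementation; the environment-variable name, its required value, and the `uuid5` derivation scheme here are all assumptions:

```python
import os
import uuid

# Hypothetical sketch: return random UUIDs normally, but deterministic
# ones when a demo flag is explicitly set in the environment.
_DEMO_COUNTER = 0


def sketch_local_uuid() -> str:
    """Return a UUID string; deterministic only if the demo flag is set."""
    global _DEMO_COUNTER
    if os.environ.get("CASE_DEMO_NONRANDOM") == "nonrandom":
        _DEMO_COUNTER += 1
        # Derive a stable UUID from a fixed namespace and a call counter,
        # so repeated runs of an example script emit identical node IRIs.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, "demo-%d" % _DEMO_COUNTER))
    return str(uuid.uuid4())
```

Requiring an explicit opt-in (here, an environment variable with a specific value) is what keeps the deterministic mode from being activated accidentally in non-example code.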


### Built versions

Several tools in this package include a flag `--built-version`. This flag tailors the tool's behavior to a certain CASE ontology version; typically, this involves mixing the ontology graph into the data graph to enable the knowledge expansion needed for pattern matching (such as making queries aware of the OWL subclass hierarchy).

If not provided, the tool will assume a default value of the latest ontology version.

If the special value `none` is provided, none of the ontology builds this package ships will be included in the data graph. The `none` value supports use cases that are wholly independent of CASE, such as running a test in a specialized vocabulary; it also supports use cases where a non-released CASE version is meant to be used, such as a locally revised version of CASE where some concept revisions are being reviewed.
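Concretely, the flag composes with the other `case_validate` options shown earlier; the input and ontology filenames below are placeholders:

```bash
# Validate against the packaged default (latest) CASE release:
case_validate input.json

# Validate using only the supplied graphs, with no packaged CASE build mixed in:
case_validate --built-version none --ontology-graph local_revision.ttl input.json
```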


## Development status

This repository follows [CASE community guidance on describing development status](https://caseontology.org/resources/software.html#development_status), by adherence to noted support requirements.
2 changes: 1 addition & 1 deletion case_utils/__init__.py
@@ -11,6 +11,6 @@
#
# We would appreciate acknowledgement if the software is used.

__version__ = "0.7.0"
__version__ = "0.11.0"

from . import local_uuid # noqa: F401
87 changes: 71 additions & 16 deletions case_utils/case_file/__init__.py
@@ -15,7 +15,7 @@
This module creates a graph object that provides a basic UCO characterization of a single file. The gathered metadata is among the more "durable" file characteristics, i.e. characteristics that would remain consistent when transferring a file between locations.
"""

__version__ = "0.4.0"
__version__ = "0.5.0"

import argparse
import datetime
@@ -27,7 +27,7 @@

import rdflib

import case_utils
import case_utils.inherent_uuid
from case_utils.namespace import (
NS_RDF,
NS_UCO_CORE,
@@ -39,6 +39,7 @@

DEFAULT_PREFIX = "http://example.org/kb/"


# Shortcut syntax for defining an immutable named tuple is noted here:
# https://docs.python.org/3/library/typing.html#typing.NamedTuple
# via the "See also" box here: https://docs.python.org/3/library/collections.html#collections.namedtuple
Expand All @@ -48,6 +49,8 @@ class HashDict(typing.NamedTuple):
sha1: str
sha256: str
sha512: str
sha3_256: str
sha3_512: str


def create_file_node(
@@ -57,6 +60,9 @@ def create_file_node(
node_prefix: str = DEFAULT_PREFIX,
disable_hashes: bool = False,
disable_mtime: bool = False,
*args: typing.Any,
use_deterministic_uuids: bool = False,
**kwargs: typing.Any,
) -> rdflib.URIRef:
r"""
This function characterizes the file at filepath.
@@ -67,7 +73,7 @@
:param filepath: The path to the file to characterize. Can be relative or absolute.
:type filepath: str
:param node_iri: The desired full IRI for the node. If absent, will make an IRI of the pattern ``ns_base + 'file-' + uuid4``
:param node_iri: The desired full IRI for the node. If absent, will make an IRI of the pattern ``ns_base + 'File-' + uuid``
:type node_iri: str
:param node_prefix: The base prefix to use if node_iri is not supplied.
@@ -85,7 +91,7 @@
node_namespace = rdflib.Namespace(node_prefix)

if node_iri is None:
node_slug = "file-" + case_utils.local_uuid.local_uuid()
node_slug = "File-" + case_utils.local_uuid.local_uuid()
node_iri = node_namespace[node_slug]
n_file = rdflib.URIRef(node_iri)
graph.add((n_file, NS_RDF.type, NS_UCO_OBSERVABLE.File))
@@ -94,7 +100,15 @@
literal_basename = rdflib.Literal(basename)

file_stat = os.stat(filepath)
n_file_facet = node_namespace["file-facet-" + case_utils.local_uuid.local_uuid()]

n_file_facet: rdflib.URIRef
if use_deterministic_uuids:
n_file_facet = case_utils.inherent_uuid.get_facet_uriref(
n_file, NS_UCO_OBSERVABLE.FileFacet, namespace=node_namespace
)
else:
n_file_facet = node_namespace["FileFacet-" + case_utils.local_uuid.local_uuid()]

graph.add(
(
n_file_facet,
@@ -121,9 +135,16 @@
graph.add((n_file_facet, NS_UCO_OBSERVABLE.modifiedTime, literal_mtime))

if not disable_hashes:
n_contentdata_facet = node_namespace[
"content-data-facet-" + case_utils.local_uuid.local_uuid()
]
n_contentdata_facet: rdflib.URIRef
if use_deterministic_uuids:
n_contentdata_facet = case_utils.inherent_uuid.get_facet_uriref(
n_file, NS_UCO_OBSERVABLE.ContentDataFacet, namespace=node_namespace
)
else:
n_contentdata_facet = node_namespace[
"ContentDataFacet-" + case_utils.local_uuid.local_uuid()
]

graph.add((n_file, NS_UCO_CORE.hasFacet, n_contentdata_facet))
graph.add(
(n_contentdata_facet, NS_RDF.type, NS_UCO_OBSERVABLE.ContentDataFacet)
@@ -140,6 +161,8 @@
sha1obj = hashlib.sha1()
sha256obj = hashlib.sha256()
sha512obj = hashlib.sha512()
sha3_256obj = hashlib.sha3_256()
sha3_512obj = hashlib.sha3_512()
stashed_error = None
byte_tally = 0
with open(filepath, "rb") as in_fh:
@@ -158,6 +181,8 @@
sha1obj.update(buf)
sha256obj.update(buf)
sha512obj.update(buf)
sha3_256obj.update(buf)
sha3_512obj.update(buf)
if stashed_error is not None:
raise stashed_error
current_hashdict = HashDict(
Expand All @@ -166,6 +191,8 @@ def create_file_node(
sha1obj.hexdigest(),
sha256obj.hexdigest(),
sha512obj.hexdigest(),
sha3_256obj.hexdigest(),
sha3_512obj.hexdigest(),
)
if last_hashdict == current_hashdict:
successful_hashdict = current_hashdict
@@ -193,26 +220,48 @@ def create_file_node(

# Add confirmed hashes into graph.
for key in successful_hashdict._fields:
if key not in ("md5", "sha1", "sha256", "sha512"):
if key not in ("md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512"):
continue
n_hash = node_namespace["hash-" + case_utils.local_uuid.local_uuid()]

l_hash_method: rdflib.Literal
if key in ("sha3_256", "sha3_512"):
l_hash_method = rdflib.Literal(
key.replace("_", "-").upper(),
datatype=NS_UCO_VOCABULARY.HashNameVocab,
)
else:
l_hash_method = rdflib.Literal(
key.upper(), datatype=NS_UCO_VOCABULARY.HashNameVocab
)

hash_value: str = getattr(successful_hashdict, key)
l_hash_value = rdflib.Literal(hash_value.upper(), datatype=NS_XSD.hexBinary)

hash_uuid: str
if use_deterministic_uuids:
hash_uuid = str(
case_utils.inherent_uuid.hash_method_value_uuid(
l_hash_method, l_hash_value
)
)
else:
hash_uuid = case_utils.local_uuid.local_uuid()
n_hash = node_namespace["Hash-" + hash_uuid]

graph.add((n_contentdata_facet, NS_UCO_OBSERVABLE.hash, n_hash))
graph.add((n_hash, NS_RDF.type, NS_UCO_TYPES.Hash))
graph.add(
(
n_hash,
NS_UCO_TYPES.hashMethod,
rdflib.Literal(
key.upper(), datatype=NS_UCO_VOCABULARY.HashNameVocab
),
l_hash_method,
)
)
hash_value = getattr(successful_hashdict, key)
graph.add(
(
n_hash,
NS_UCO_TYPES.hashValue,
rdflib.Literal(hash_value.upper(), datatype=NS_XSD.hexBinary),
l_hash_value,
)
)

@@ -225,6 +274,11 @@ def main() -> None:
parser.add_argument("--debug", action="store_true")
parser.add_argument("--disable-hashes", action="store_true")
parser.add_argument("--disable-mtime", action="store_true")
parser.add_argument(
"--use-deterministic-uuids",
action="store_true",
help="Use UUIDs computed using the case_utils.inherent_uuid module.",
)
parser.add_argument(
"--output-format", help="Override extension-based format guesser."
)
@@ -257,14 +311,15 @@ def main() -> None:
context_dictionary = {k: v for (k, v) in graph.namespace_manager.namespaces()}
serialize_kwargs["context"] = context_dictionary

node_iri = NS_BASE["file-" + case_utils.local_uuid.local_uuid()]
node_iri = NS_BASE["File-" + case_utils.local_uuid.local_uuid()]
create_file_node(
graph,
args.in_file,
node_iri=node_iri,
node_prefix=args.base_prefix,
disable_hashes=args.disable_hashes,
disable_mtime=args.disable_mtime,
use_deterministic_uuids=args.use_deterministic_uuids,
)

graph.serialize(args.out_graph, **serialize_kwargs)
9 changes: 5 additions & 4 deletions case_utils/case_sparql_construct/__init__.py
Expand Up @@ -15,7 +15,7 @@
This script executes a SPARQL CONSTRUCT query, returning a graph of the generated triples.
"""

__version__ = "0.2.3"
__version__ = "0.2.5"

import argparse
import logging
@@ -49,7 +49,7 @@ def main() -> None:
"--built-version",
choices=tuple(built_version_choices_list),
default="case-" + CURRENT_CASE_VERSION,
help="Ontology version to use to supplement query, such as for subclass querying. Does not require networking to use. Default is most recent CASE release.",
help="Ontology version to use to supplement query, such as for subclass querying. Does not require networking to use. Default is most recent CASE release. Passing 'none' will mean no pre-built CASE ontology versions accompanying this tool will be included in the analysis.",
)
parser.add_argument(
"--disallow-empty-results",
@@ -98,10 +98,11 @@
construct_query_result = in_graph.query(construct_query_object)
_logger.debug("type(construct_query_result) = %r." % type(construct_query_result))
_logger.debug("len(construct_query_result) = %d." % len(construct_query_result))
for (row_no, row) in enumerate(construct_query_result):
for row_no, row in enumerate(construct_query_result):
assert isinstance(row, tuple)
if row_no == 0:
_logger.debug("row[0] = %r." % (row,))
out_graph.add(row)
out_graph.add((row[0], row[1], row[2]))

output_format = None
if args.output_format is None: