Add JSON-LD context for serialized UCO content #423

Open
4 of 11 tasks
sbarnum opened this issue Aug 2, 2022 · 10 comments · May be fixed by #453

@sbarnum
Contributor

sbarnum commented Aug 2, 2022

Background

The chosen default serialization for UCO content is json-ld (see the json-ld spec).

json-ld is an officially supported serialization within the RDF ecosystem and can be losslessly transformed by RDF tools into other RDF serializations.

The fully expanded form of json-ld is referred to as "expanded" and contains full IRIs for all class types and properties.
Throughout this CP, the "Device" example from the CASE examples repo (CASE Device example) is used for illustration.

In expanded json-ld form this example would look like this:

{
    "@graph": [
        {
            "@id": "http://example.org/kb/organization-c240cf37-0556-439b-9a51-1ca41732010d",
            "@type": "https://ontology.unifiedcyberontology.org/uco/identity/Organization",
            "https://ontology.unifiedcyberontology.org/uco/core/name": "Dell"
        },
        {
            "@id": "http://example.org/kb/organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26",
            "@type": "https://ontology.unifiedcyberontology.org/uco/identity/Organization",
            "https://ontology.unifiedcyberontology.org/uco/core/name": "Microsoft"
        },
        {
            "@id": "http://example.org/kb/forensic_lab_computer1-uuid",
            "@type": "https://ontology.unifiedcyberontology.org/uco/observable/Device",
            "http://example.org/local#location": {
                "@id": "http://example.org/kb/forensic_lab1-uuid"
            },
            "https://ontology.unifiedcyberontology.org/uco/core/hasFacet": [
                {
                    "@type": "https://ontology.unifiedcyberontology.org/uco/observable/DeviceFacet",
                    "https://ontology.unifiedcyberontology.org/uco/observable/deviceType": "Computer",
                    "https://ontology.unifiedcyberontology.org/uco/observable/manufacturer": {
                        "@id": "http://example.org/kb/organization-c240cf37-0556-439b-9a51-1ca41732010d"
                    },
                    "https://ontology.unifiedcyberontology.org/uco/observable/model": "Inspiron 5000",
                    "https://ontology.unifiedcyberontology.org/uco/observable/serialNumber": "D1234567"
                },
                {
                    "@type": "https://ontology.unifiedcyberontology.org/uco/observable/OperatingSystemFacet",
                    "https://ontology.unifiedcyberontology.org/uco/core/name": "Windows 7 Ultimate Edition",
                    "https://ontology.unifiedcyberontology.org/uco/observable/manufacturer": {
                        "@id": "http://example.org/kb/organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26"
                    },
                    "https://ontology.unifiedcyberontology.org/uco/observable/version": "6.1.7601 Service Pack 1 Build 7601",
                    "https://ontology.unifiedcyberontology.org/uco/observable/installDate": {
                        "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
                        "@value": "2019-07-10T16:33:42Z"
                    }
                },
                {
                    "@type": "https://ontology.unifiedcyberontology.org/uco/observable/ComputerSpecificationFacet",
                    "https://ontology.unifiedcyberontology.org/uco/observable/biosVersion": "E1762IMS.10M",
                    "https://ontology.unifiedcyberontology.org/uco/observable/cpuFamily": "Intel Pentium i7",
                    "https://ontology.unifiedcyberontology.org/uco/observable/totalRam": 4294967296
                },
                {
                    "@type": "https://ontology.unifiedcyberontology.org/uco/observable/DomainNameFacet",
                    "https://ontology.unifiedcyberontology.org/uco/observable/value": "dfl.local",
                    "https://ontology.unifiedcyberontology.org/uco/observable/isTLD": false
                },
                {
                    "@type": "https://ontology.unifiedcyberontology.org/uco/observable/IPv4AddressFacet",
                    "https://ontology.unifiedcyberontology.org/uco/observable/addressValue": "192.168.1.145"
                },
                {
                    "@type": [
                        "http://custompb.acme.org/core#InventoryComputerFacet",
                        "https://ontology.unifiedcyberontology.org/uco/core/Facet"
                    ],
                    "http://custompb.acme.org/core#name": "DFL-03",
                    "http://custompb.acme.org/core#inventoryNumber": "10503"
                }
            ]
        }
    ]
}

JSON-LD provides a mechanism called a "context": a set of definitions (prefixes, term aliases, type information) that lets a body of json-ld content be compacted to a more concise form. Compaction and expansion are lossless and can be applied any number of times.

The Device example as it exists in the CASE examples repo currently applies a simple json-ld context to avoid having to repeatedly express IRI path detail for every object in the content.
Using this simple level of compaction yields the example in this form:

{
    "@context": {
        "@vocab": "http://example.org/local#",
        "kb": "http://example.org/kb/",
        "acme": "http://custompb.acme.org/core#",
        "draft": "http://example.org/draft#",
        "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
        "uco-identity": "https://ontology.unifiedcyberontology.org/uco/identity/",
        "uco-location": "https://ontology.unifiedcyberontology.org/uco/location/",
        "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
        "xsd": "http://www.w3.org/2001/XMLSchema#"
    },
    "@graph": [
        {
            "@id": "kb:organization-c240cf37-0556-439b-9a51-1ca41732010d",
            "@type": "uco-identity:Organization",
            "uco-core:name": "Dell"
        },
        {
            "@id": "kb:organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26",
            "@type": "uco-identity:Organization",
            "uco-core:name": "Microsoft"
        },
        {
            "@id": "kb:forensic_lab_computer1-uuid",
            "@type": "uco-observable:Device",
            "location": {
                "@id": "kb:forensic_lab1-uuid"
            },
            "uco-core:hasFacet": [
                {
                    "@type": "uco-observable:DeviceFacet",
                    "uco-observable:deviceType": "Computer",
                    "uco-observable:manufacturer": {
                        "@id": "kb:organization-c240cf37-0556-439b-9a51-1ca41732010d"
                    },
                    "uco-observable:model": "Inspiron 5000",
                    "uco-observable:serialNumber": "D1234567"
                },
                {
                    "@type": "uco-observable:OperatingSystemFacet",
                    "uco-core:name": "Windows 7 Ultimate Edition",
                    "uco-observable:manufacturer": {
                        "@id": "kb:organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26"
                    },
                    "uco-observable:version": "6.1.7601 Service Pack 1 Build 7601",
                    "uco-observable:installDate": {
                        "@type": "xsd:dateTime",
                        "@value": "2019-07-10T16:33:42Z"
                    }
                },
                {
                    "@type": "uco-observable:ComputerSpecificationFacet",
                    "uco-observable:biosVersion": "E1762IMS.10M",
                    "uco-observable:cpuFamily": "Intel Pentium i7",
                    "uco-observable:totalRam": 4294967296
                },
                {
                    "@type": "uco-observable:DomainNameFacet",
                    "uco-observable:value": "dfl.local",
                    "uco-observable:isTLD": false
                },
                {
                    "@type": "uco-observable:IPv4AddressFacet",
                    "uco-observable:addressValue": "192.168.1.145"
                },
                {
                    "@type": [
                        "acme:InventoryComputerFacet",
                        "uco-core:Facet"
                    ],
                    "acme:name": "DFL-03",
                    "acme:inventoryNumber": "10503"
                }
            ]
        }
    ]
}
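The prefix mechanism used in the compacted example can be sketched in a few lines of Python. This is a simplified illustration of how a prefix or `@vocab` entry turns a compact name into a full IRI; it is not the full JSON-LD expansion algorithm, and the function name `expand_iri` is hypothetical. It applies to property keys and type values only (in JSON-LD, `@vocab` does not apply to `@id` values).

```python
# Subset of the inline context from the Device example.
CONTEXT = {
    "@vocab": "http://example.org/local#",
    "kb": "http://example.org/kb/",
    "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
    "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_iri(value, context):
    """Expand a property key or type name using prefix and @vocab entries."""
    if value.startswith("@"):
        return value  # JSON-LD keyword, left untouched
    prefix, sep, suffix = value.partition(":")
    # "prefix:suffix" form; skip absolute IRIs such as "http://..."
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    # Bare names fall back to the default vocabulary, if one is declared.
    if not sep and "@vocab" in context:
        return context["@vocab"] + value
    return value
```

In practice a JSON-LD processor performs this step (and much more) during expansion; the sketch only shows why `uco-core:name` and the bare key `location` resolve to different namespaces.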

JSON-LD contexts can also support things like

  • compaction of type assertions for properties, so that a property can appear in the body of content with just its value rather than a duplicated type assertion on each use
  • specification that particular properties should be represented as containers (sets, ordered lists, etc)
  • compaction of strings within the body of content

This example specifies the context inline with the other json-ld body of content in the file and is limited to only the prefixes used for the content in the file.

JSON-LD supports the specification of context inline, as a separate file referenced from the json-ld content file, or potentially a combination of both.

For consistent and more concise use of serialized UCO json-ld content by the adopting community, each version of UCO needs a full json-ld context that json-ld serialized content can reference, either remotely online or from a local deployment.

Requirements

Requirement 1

json-ld context to support compaction of all IRI base paths through defined prefixes

Requirement 2

json-ld context to support compaction of all property type assertions

Requirement 3

json-ld context to support assertion of properties with potential cardinalities >1 as set arrays
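As a rough illustration of Requirement 3, JSON-LD term definitions can carry `"@container": "@set"`, which asks a processor to always serialize that property's values as an array, even when there is only one. The sketch below mimics that effect on a single node. The helper name `as_set_arrays` and the choice of `uco-core:hasFacet` as the example term are assumptions for illustration, not part of the proposal.

```python
# Hypothetical term definition marking hasFacet as a set container.
SET_CONTEXT = {
    "uco-core:hasFacet": {"@container": "@set"},
}


def as_set_arrays(node, context):
    """Return a copy of node where @set-container properties are always lists."""
    out = {}
    for key, value in node.items():
        definition = context.get(key)
        is_set = isinstance(definition, dict) and definition.get("@container") == "@set"
        if is_set and not isinstance(value, list):
            value = [value]  # wrap a lone value so consumers can always iterate
        out[key] = value
    return out
```

The benefit for consumers is a stable shape: code reading `uco-core:hasFacet` never has to branch on "single object versus array".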

Requirement 4

json-ld context to support compaction of json-ld specific key strings @id, @type, @value and @graph to simple json key strings id, type, value, and graph such that the body of content can be viewed as simple json and the context can be utilized to expand it into fully codified json-ld

Requirement 5

json-ld context to support compaction of class type names and property names to prefixless names where possible (where the base names without prefixes are uniquely defined in UCO). For any base name defined in UCO that is non-unique when prefixes are removed or for any custom (not defined in UCO) class types or properties defined by the content producer, the prefixed name would be used.

This requirement is only necessary for the "concise" version of the json-ld context.

Requirement 6

Ability to autogenerate full json-ld context for each UCO release

Requirement 7

Ability to autogenerate full json-ld context for any interim UCO version

Requirement 8

Ability to publish json-ld context for each UCO release online such that produced UCO content can effectively reference and use it for json-ld processing

Requirement 9

Ability for a producer to pull down an online published json-ld context and utilize it locally with their defined content

Requirement 10

Ability for UCO content producer to specify both a reference to a remote official json-ld context for UCO and a local inline json-ld context for any custom (not defined in UCO) class types or properties defined by the content producer in their content

Risk / Benefit analysis

Benefits

Consistency of serialized content produced, exchanged and consumed by UCO community adopters.

Smaller and more concise serialized UCO content.

Ability for producers to treat UCO content as simple JSON while yielding the significant benefits of JSON-LD.

Risks

All existing UCO and CASE examples should be updated to utilize the new context and compact form.

Competencies demonstrated

Competency 1

Compaction and expansion of json-ld serialized content

Competency Question 1.1

What is the fully compacted form of a given body of json-ld serialized UCO content?

Result 1.1

The fully compacted and concise form of the json-ld serialized UCO content

Competency Question 1.2

What is the fully expanded form of a given body of json-ld serialized UCO content?

Result 1.2

The fully expanded and verbose form of the json-ld serialized UCO content

Solution suggestion

Implement code to autogenerate two different (minimal and concise) json-ld contexts for any given version of UCO.
Persistently publish online the two json-ld contexts for each official release of UCO.
Temporarily publish somewhere online the two json-ld contexts for each interim version of UCO.

The "minimal" json-ld context would be considered the default and would support Requirements 1 - 4.
Using the scope of the Device example to illustrate what such a context would look like (the actual full context would contain details for ALL prefixes and properties in UCO), the context would look something like:

{
    "@context": {
        "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
        "uco-identity": "https://ontology.unifiedcyberontology.org/uco/identity/",
        "uco-location": "https://ontology.unifiedcyberontology.org/uco/location/",
        "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "uco-core:hasFacet": {
          "@type": "@id"
        },
        "uco-observable:manufacturer": {
          "@type": "@id"
        },
        ...
        "uco-core:name": {
          "@type": "xsd:string"
        },
        "uco-observable:deviceType": {
          "@type": "xsd:string"
        },
        "uco-observable:model": {
          "@type": "xsd:string"
        },
        "uco-observable:serialNumber": {
          "@type": "xsd:string"
        },
        "uco-observable:version": {
          "@type": "xsd:string"
        },
        "uco-observable:installDate": {
          "@type": "xsd:dateTime"
        },
        "uco-observable:biosVersion": {
          "@type": "xsd:string"
        },
        "uco-observable:cpuFamily": {
          "@type": "xsd:string"
        },
        "uco-observable:totalRam": {
          "@type": "xsd:integer"
        },
        "uco-observable:value": {
          "@type": "xsd:string"
        },
        "uco-observable:isTLD": {
          "@type": "xsd:boolean"
        },
        "uco-observable:addressValue": {
          "@type": "xsd:string"
        },
        "id": "@id",
        "type": "@type",
        "graph": "@graph"
    }
}

Utilizing this context combined with a local inline context for the custom (non-UCO-defined) content in the body, the Device example content would look like this:

{
    "@context": [
      "https://ontology.unifiedcyberontology.org/uco/uco-ld-context-minimal.json",
      {
        "@vocab": "http://example.org/local#",
        "kb": "http://example.org/kb/",
        "acme": "http://custompb.acme.org/core#",
        "draft": "http://example.org/draft#"
      }
    ],
    "graph": [
        {
            "id": "kb:organization-c240cf37-0556-439b-9a51-1ca41732010d",
            "type": "uco-identity:Organization",
            "uco-core:name": "Dell"
        },
        {
            "id": "kb:organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26",
            "type": "uco-identity:Organization",
            "uco-core:name": "Microsoft"
        },
        {
            "id": "kb:forensic_lab_computer1-uuid",
            "type": "uco-observable:Device",
            "uco-core:hasFacet": [
                {
                    "type": "uco-observable:DeviceFacet",
                    "uco-observable:deviceType": "Computer",
                    "uco-observable:manufacturer": "kb:organization-c240cf37-0556-439b-9a51-1ca41732010d",
                    "uco-observable:model": "Inspiron 5000",
                    "uco-observable:serialNumber": "D1234567"
                },
                {
                    "type": "uco-observable:OperatingSystemFacet",
                    "uco-core:name": "Windows 7 Ultimate Edition",
                    "uco-observable:manufacturer": "kb:organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26",
                    "uco-observable:version": "6.1.7601 Service Pack 1 Build 7601",
                    "uco-observable:installDate": "2019-07-10T16:33:42Z"
                },
                {
                    "type": "uco-observable:ComputerSpecificationFacet",
                    "uco-observable:biosVersion": "E1762IMS.10M",
                    "uco-observable:cpuFamily": "Intel Pentium i7",
                    "uco-observable:totalRam": 4294967296
                },
                {
                    "type": "uco-observable:DomainNameFacet",
                    "uco-observable:value": "dfl.local",
                    "uco-observable:isTLD": false
                },
                {
                    "type": "uco-observable:IPv4AddressFacet",
                    "uco-observable:addressValue": "192.168.1.145"
                },
                {
                    "type": [
                        "acme:InventoryComputerFacet",
                        "uco-core:Facet"
                    ],
                    "acme:name": "DFL-03",
                    "acme:inventoryNumber": "10503"
                }
            ]
        }
    ]
}

The "minimal" json-ld context could be created by a coding implementation of the following pseudo-code:

[image: pseudo-code for generating the "minimal" json-ld context]
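The pseudo-code itself is an image and is not reproduced in this text. As a separate, minimal sketch of what a "minimal" context generator could look like, the Python below assembles prefixes, per-property type coercions, and the keyword aliases from the example context. The function name and the input shape (a mapping from prefixed property name to either `"@id"` for object properties or a datatype such as `"xsd:string"`) are assumptions; a real implementation would derive these from the ontology files.

```python
import json


def build_minimal_context(prefixes, property_ranges):
    """Assemble a 'minimal'-style context dictionary from simple inputs."""
    context = dict(prefixes)  # Requirement 1: prefix definitions
    for prop, range_ in sorted(property_ranges.items()):
        # Requirement 2: type coercion ("@id" for links, a datatype otherwise)
        context[prop] = {"@type": range_}
    # Requirement 4: keyword aliases, as listed in the example context
    context.update({"id": "@id", "type": "@type", "graph": "@graph"})
    return {"@context": context}


EXAMPLE_PREFIXES = {
    "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
    "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
EXAMPLE_RANGES = {
    "uco-core:hasFacet": "@id",
    "uco-core:name": "xsd:string",
    "uco-observable:installDate": "xsd:dateTime",
}

if __name__ == "__main__":
    print(json.dumps(build_minimal_context(EXAMPLE_PREFIXES, EXAMPLE_RANGES), indent=4))
```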

The "concise" json-ld context would be considered optional for those who desire a very concise form and would support Requirements 1 - 5.
Using the scope of the Device example to illustrate what such a context would look like (the actual full context would contain details for ALL prefixes and properties in UCO), the context would look something like:

{
    "@context": {
        "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
        "uco-identity": "https://ontology.unifiedcyberontology.org/uco/identity/",
        "uco-location": "https://ontology.unifiedcyberontology.org/uco/location/",
        "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        ...
        "ComputerSpecificationFacet": "uco-observable:ComputerSpecificationFacet",
        "Device": "uco-observable:Device",
        "DeviceFacet": "uco-observable:DeviceFacet",
        "DomainNameFacet": "uco-observable:DomainNameFacet",
        "Facet": "uco-core:Facet",
        "IPv4AddressFacet": "uco-observable:IPv4AddressFacet",
        "OperatingSystemFacet": "uco-observable:OperatingSystemFacet",
        "Organization": "uco-identity:Organization",
        ...
        "hasFacet": {
          "@id": "uco-core:hasFacet",
          "@type": "@id"
        },
        "manufacturer": {
          "@id": "uco-observable:manufacturer",
          "@type": "@id"
        },
        ...
        "name": {
          "@id": "uco-core:name",
          "@type": "xsd:string"
        },
        "deviceType": {
          "@id": "uco-observable:deviceType",
          "@type": "xsd:string"
        },
        "model": {
          "@id": "uco-observable:model",
          "@type": "xsd:string"
        },
        "serialNumber": {
          "@id": "uco-observable:serialNumber",
          "@type": "xsd:string"
        },
        "version": {
          "@id": "uco-observable:version",
          "@type": "xsd:string"
        },
        "installDate": {
          "@id": "uco-observable:installDate",
          "@type": "xsd:dateTime"
        },
        "biosVersion": {
          "@id": "uco-observable:biosVersion",
          "@type": "xsd:string"
        },
        "cpuFamily": {
          "@id": "uco-observable:cpuFamily",
          "@type": "xsd:string"
        },
        "totalRam": {
          "@id": "uco-observable:totalRam",
          "@type": "xsd:integer"
        },
        "uco-observable:value": {
          "@type": "xsd:string"
        },
        "isTLD": {
          "@id": "uco-observable:isTLD",
          "@type": "xsd:boolean"
        },
        "addressValue": {
          "@id": "uco-observable:addressValue",
          "@type": "xsd:string"
        },
        "id": "@id",
        "type": "@type",
        "graph": "@graph"
    }
}

Utilizing this context combined with a local inline context for the custom (non-UCO-defined) content in the body, the Device example content would look like this:

{
    "@context": [
      "https://ontology.unifiedcyberontology.org/uco/uco-ld-context-concise.json",
      {
        "@vocab": "http://example.org/local#",
        "kb": "http://example.org/kb/",
        "acme": "http://custompb.acme.org/core#",
        "draft": "http://example.org/draft#"
      }
    ],
    "graph": [
        {
            "id": "kb:organization-c240cf37-0556-439b-9a51-1ca41732010d",
            "type": "Organization",
            "name": "Dell"
        },
        {
            "id": "kb:organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26",
            "type": "Organization",
            "name": "Microsoft"
        },
        {
            "id": "kb:forensic_lab_computer1-uuid",
            "type": "Device",
            "hasFacet": [
                {
                    "type": "DeviceFacet",
                    "deviceType": "Computer",
                    "manufacturer": "kb:organization-c240cf37-0556-439b-9a51-1ca41732010d",
                    "model": "Inspiron 5000",
                    "serialNumber": "D1234567"
                },
                {
                    "type": "OperatingSystemFacet",
                    "name": "Windows 7 Ultimate Edition",
                    "manufacturer": "kb:organization-cc0e0667-eadf-4b2e-9618-3f62b1bdae26",
                    "version": "6.1.7601 Service Pack 1 Build 7601",
                    "installDate": "2019-07-10T16:33:42Z"
                },
                {
                    "type": "ComputerSpecificationFacet",
                    "biosVersion": "E1762IMS.10M",
                    "cpuFamily": "Intel Pentium i7",
                    "totalRam": 4294967296
                },
                {
                    "type": "DomainNameFacet",
                    "uco-observable:value": "dfl.local",
                    "isTLD": false
                },
                {
                    "type": "IPv4AddressFacet",
                    "addressValue": "192.168.1.145"
                },
                {
                    "type": [
                        "acme:InventoryComputerFacet",
                        "Facet"
                    ],
                    "acme:name": "DFL-03",
                    "acme:inventoryNumber": "10503"
                }
            ]
        }
    ]
}

The "concise" json-ld context could be created by a coding implementation of the following pseudo-code:
[image: pseudo-code for generating the "concise" json-ld context]
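The pseudo-code is an image and is not reproduced in this text. As an independent sketch of the Requirement 5 rule, the Python below aliases a name to its prefixless form only when that local name is unique across all supplied classes and properties, and keeps the prefixed name otherwise. The function name, the input shapes, the `uco-types` prefix, and the range given for `uco-types:value` are illustrative assumptions.

```python
from collections import Counter


def local_name(prefixed):
    """Return the part after the prefix, e.g. 'uco-core:name' -> 'name'."""
    return prefixed.split(":", 1)[1]


def build_concise_terms(class_names, property_ranges):
    """Build term definitions, dropping prefixes only where unambiguous."""
    counts = Counter(local_name(n) for n in list(class_names) + list(property_ranges))
    terms = {}
    for name in class_names:
        if counts[local_name(name)] == 1:
            terms[local_name(name)] = name  # e.g. "Facet": "uco-core:Facet"
    for name, range_ in property_ranges.items():
        if counts[local_name(name)] == 1:
            terms[local_name(name)] = {"@id": name, "@type": range_}
        else:
            terms[name] = {"@type": range_}  # ambiguous: keep the prefix
    return terms


EXAMPLE_CLASSES = ["uco-core:Facet", "uco-observable:DeviceFacet"]
EXAMPLE_PROPERTIES = {
    "uco-core:name": "xsd:string",
    "uco-observable:value": "xsd:string",
    "uco-types:value": "xsd:string",  # placeholder range, for illustration only
}
```

With these inputs, `name` is aliased without a prefix while both `value` properties stay prefixed, which matches how `uco-observable:value` is left prefixed in the example context.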

Coordination

  • Tracking in Jira ticket ONT-306
  • Administrative review completed, proposal announced to Ontology Committees (OCs) on 2022-08-03
  • Requirements to be discussed in OC meeting, 2022-08-09
  • Requirements Review vote occurred, passing, on 2022-08-09
  • Requirements development phase completed.
  • Solution announced to OCs on (TODO-date)
  • Solutions Approval to be discussed in OC meeting, date TBD
  • Solutions Approval vote has not occurred
  • Solutions development phase completed.
  • Implementation has not been merged into develop
  • Milestone linked
  • Documentation logged in pending release page
@ajnelson-nist
Contributor

Sectioned response.

Requirement 2

I don't understand this requirement. Can a snippet please be provided to illustrate what you mean?

Requirement 4

Should @value also be aliased?

Requirement 5

I oppose Requirement 5. Quoting:

json-ld context to support compaction of class type names and property names to prefixless names where possible (where the base names without prefixes are uniquely defined in UCO). For any base name defined in UCO that is non-unique when prefixes are removed or for any custom (not defined in UCO) class types or properties defined by the content producer, the prefixed name would be used.

I believe the risk of typos causing lost graph data is too high.

Example: If we compact observable:fileSize to the string fileSize, and likewise with FileFacet (which is possible with JSON-LD contexts), JSON-LD would look like this and be consumable by a graph engine:

...
{
  "type": "FileFacet",
  "fileSize": 1234
}
...

That would read fine in a graph engine, and could be re-serialized by translation to, e.g., Turtle:

[
   a observable:FileFacet ;
   observable:fileSize 1234 ;
]

However, if there was a typo in the JSON-LD ...

...
{
  "type": "FileFacet",
  "fileSiz": 1234
}
...

...the string fileSiz would be treated as the predicate of a triple, which is not valid RDF 1.1. (This is apparently valid N-Triples, but UCO does not use N-Triples.)

Error handling is inconsistent across graph-consuming engines. RDFLib would silently drop the triple, meaning this is all an engine would be able to see:

[
   a observable:FileFacet ;
]

Some engines would raise a runtime error or somehow squawk otherwise because of a string in a place that RDF (1.1) and OWL do not permit.

I believe the risk of removing prefixes is unacceptable from typo effects alone. This doesn't even get into the conflict from us possibly desiring @value being aliased as value (a tack-on to requirement 4), which would conflict with both UCO's observable:value and types:value.
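To make the typo scenario concrete, here is a small sketch of a pre-parse check that reports JSON keys a context cannot resolve. It is not part of the proposal and not an existing UCO or RDFLib feature; the function name `unknown_keys` and the tiny context are hypothetical. Note that it only helps when the context declares no `@vocab`, since a default vocabulary would make every bare key resolvable.

```python
def unknown_keys(node, context, path="$"):
    """Return (path, key) pairs for keys that the context does not define."""
    found = []
    if isinstance(node, list):
        for index, item in enumerate(node):
            found += unknown_keys(item, context, f"{path}[{index}]")
    elif isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue  # context definitions are not data keys
            prefix, sep, _ = key.partition(":")
            known = key.startswith("@") or key in context or (sep and prefix in context)
            if not known:
                found.append((path, key))
            found += unknown_keys(value, context, f"{path}.{key}")
    return found


# Hypothetical concise-style context covering only the FileFacet example.
FILE_CONTEXT = {
    "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
    "type": "@type",
    "FileFacet": "uco-observable:FileFacet",
    "fileSize": {"@id": "uco-observable:fileSize"},
}
```

Running `unknown_keys({"type": "FileFacet", "fileSiz": 1234}, FILE_CONTEXT)` flags `fileSiz`, whereas the correctly spelled document yields an empty list. A check like this does not cover misspelled type values, which would need a separate pass.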

For those unaware, in UCO's and CASE's prototype days, all of the example JSON-LD tried to assume the pattern of fully compacted strings like requirement 5 imposes. None of it was consumable as RDF, so attempts on graph-based analysis failed. None of it was consumable due to not having the context dictionary, but also due to inlining new concept definitions, which were outside of the context dictionary's known prefix-set and thus would be silently dropped regardless.

I strongly suggest requirement 5 be dropped.

Requirement 7

A comment for my clarification: My understanding of "interim" is what I've called "pre-release" elsewhere - e.g. the CASE website serves the current development state and an "Unstable" state.

@ajnelson-nist
Contributor

Requirement 2 has been clarified for me. An example would be an xsd:dateTime value, which currently needs to be represented like this:

{
    "@type": "xsd:dateTime",
    "@value": "2020-01-02T03:04:05Z"
}

It would be compacted like this:

"2020-01-02T03:04:05Z"
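The compaction described here relies on the context declaring a datatype for the property. A minimal sketch of that rule follows: a typed literal collapses to its bare value only when the term definition's `@type` matches the literal's `@type`. The function name is hypothetical, and the logic is a simplification of what a JSON-LD processor does during compaction.

```python
# Term definition taken from the example "minimal" context.
COERCION_CONTEXT = {
    "uco-observable:installDate": {"@type": "xsd:dateTime"},
}


def compact_typed_literal(term, value, context):
    """Collapse {"@type": T, "@value": V} to V when the context coerces term to T."""
    definition = context.get(term)
    if (
        isinstance(definition, dict)
        and isinstance(value, dict)
        and set(value) == {"@type", "@value"}
        and definition.get("@type") == value["@type"]
    ):
        return value["@value"]
    return value  # no matching coercion: keep the explicit typed form
```

A property whose context definition names a different datatype, or none at all, keeps the nested dictionary form, which is the case raised for properties that may hold several literal types.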

@ajnelson-nist
Contributor

Further on Requirement 4: Sean is adding the @value aliasing to value, after discussion on @kchason 's recent PR on CASE-Examples adding a database record.

The databaseFieldValue (under proposal) could potentially be a string, integer, or binary (however we decide to represent binary). So, we have a need in some cases to keep with the nesting JSON-dictionary practice, when we need to be specific among literal-type options.

@ajnelson-nist
Contributor

@kfairbanks has picked up the Python steps.

kfairbanks added a commit to kfairbanks/UCO that referenced this issue Aug 5, 2022
kfairbanks added a commit to kfairbanks/UCO that referenced this issue Aug 5, 2022
kfairbanks added a commit to kfairbanks/UCO that referenced this issue Aug 8, 2022
kfairbanks added a commit to kfairbanks/UCO that referenced this issue Aug 9, 2022
ajnelson-nist added a commit to ajnelson-nist/UCO that referenced this issue Aug 11, 2022
@kfairbanks kfairbanks linked a pull request Aug 16, 2022 that will close this issue
11 tasks
@ajnelson-nist ajnelson-nist added this to the UCO 1.0.0 milestone Aug 17, 2022
@ajnelson-nist ajnelson-nist linked a pull request Aug 17, 2022 that will close this issue
11 tasks
ajnelson-nist added a commit to kfairbanks/UCO that referenced this issue Aug 18, 2022
This is the result of working through a few rounds of `mypy --strict`,
which now passes when rdflib >= 6.2.0 is installed in the virtual
environment.

Some minor logic errors were caught.  At least one significant error was
found and flagged.

References:
* ucoProject#423

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to kfairbanks/UCO that referenced this issue Aug 18, 2022
ajnelson-nist added a commit to kfairbanks/UCO that referenced this issue Aug 18, 2022
ajnelson-nist added a commit to kfairbanks/UCO that referenced this issue Aug 18, 2022
Test known to fail currently.

References:
* ucoProject#423

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to kfairbanks/UCO that referenced this issue Aug 18, 2022
This is another necessary proof of functionality for the testing.

References:
* ucoProject#423

Signed-off-by: Alex Nelson <[email protected]>
ajnelson-nist added a commit to kfairbanks/UCO that referenced this issue Aug 18, 2022
@ajnelson-nist
Contributor

The implementation has reached a point where it needs some committee feedback on two matters.

UX

First is a user-experience question. This directory has a few .json files that will demonstrate functionality of the context dictionary:

https://github.com/kfairbanks/UCO/tree/issue_423/tests/context_builder

hash_expanded.json intentionally eschews all context niceties and is needed for a full parse test, so please consider it out of scope of this request. The question is: Do the (other) .json files in that directory fit our expected user needs?

I realized we may have a trickier bit of testing to do in order to show how a user would have no @context dictionary, period. Each file currently inlines a @context dictionary to define the kb: prefix, but otherwise relies wholly on a context file that is not referenced in the .json file and is instead loaded at parse time by any consuming program.

Do we require, as part of the initial release of these dictionaries, a demonstration of truly having no @ JSON object keys? If so, the demonstration needs to include a second, organization-specific context dictionary in order to handle the kb: prefix that would be inappropriate for UCO to specify.

Authoritative ontology prefixes

...As I was drafting this, I hit a piece of the SHACL specification that turns this into a somewhat risky proposal. Filing separately.

@kfairbanks , I suggest you keep on doing what you're doing to declare the uco-* namespace prefixes.

@ajnelson-nist
Contributor

Let's anchor the authoritative prefixes question on Issue 457.

@sbarnum
Contributor Author

sbarnum commented Aug 18, 2022

It is standard practice for content files to reference the ontology context as a remote reference.
It is unclear why the examples are not doing that. Is there a reason I am unaware of? It is very simple and enables full expansion/compaction of the serialized content. Then if there is need for a local context (e.g., dealing with the kb prefix) it can simply be added as an additional context in the file.
Basically something like this:

"@context": [
      "https://ontology.unifiedcyberontology.org/uco/uco-ld-context-concise.json",
      {
        "kb": "http://example.org/kb/"
      }
    ],

Of course, if the user is operating offline the remote reference could be to a copy of the context file stored in the local filesystem as well.

I would propose that this is the approach we should strongly suggest and the form our examples should take.

If a user truly wanted a json serialization with NO @ in it, meaning no @context, then they would either need to apply the context themselves outside of the content or, if served via the web, the reference to the remote context can be conveyed in the HTTP header leaving the json content "unpolluted". I do not believe we should worry about this case for now.

I am unclear on why the action_result* examples specify the full class hierarchy for the type on each object when this is unnecessary. All you need to do is specify the most appropriate class/type. This seems like adding unnecessary verbosity to the serialization and is likely to be interpreted by users as the way they also need to do it.

Otherwise, I think the forms here look good. I am a fan of the concise version.

@ajnelson-nist
Contributor

Re:

It is standard practice for content files to reference the ontology context as a remote reference.
It is unclear why the examples are not doing that. Is there a reason I am unaware of?

The reason is that the ontology's CI testing must use the version of the ontology at a specific, not-necessarily-publicly-posted Git commit. With offline Git development, that can't be done with an assumption of a web resource. Well, it might be possible with monkeypatching, but that would be going deep into rdflib internals with UCO's current CI infrastructure, and that feels too fragile to think about further. Hence, these snippets will be working with local files.

This is also going to be a non-trivial change for CASE-Examples and the CASE website, accommodating prerelease versions / "Nightlies" of the ontology when the context dictionary reference is hard-coded in the JSON-LD file. I suppose sed can do a URL find/replace and swap in file://../../.... But, we haven't gotten to testing that yet.
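The find/replace idea can also be sketched in Python instead of sed. The function below swaps a hard-coded remote context URL for a `file://` URI pointing at a local copy, leaving any inline context entries alone. This is an untested-in-CI illustration only: the function name and the local file name are assumptions, and whether a given JSON-LD parser will load `file://` contexts depends on that parser.

```python
from pathlib import Path

REMOTE = "https://ontology.unifiedcyberontology.org/uco/uco-ld-context-concise.json"


def localize_context(document, remote_url, local_path):
    """Return a copy of document whose remote context reference is a file:// URI."""
    if "@context" not in document:
        return dict(document)
    local_uri = Path(local_path).resolve().as_uri()
    ctx = document["@context"]
    entries = ctx if isinstance(ctx, list) else [ctx]
    # Replace only exact matches of the remote URL; inline dicts pass through.
    swapped = [local_uri if entry == remote_url else entry for entry in entries]
    result = dict(document)
    result["@context"] = swapped if isinstance(ctx, list) else swapped[0]
    return result


EXAMPLE_DOC = {
    "@context": [REMOTE, {"kb": "http://example.org/kb/"}],
    "graph": [],
}
```

Working on the parsed JSON rather than raw text avoids accidentally rewriting the same URL if it appears elsewhere in the data.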

@ajnelson-nist
Contributor

Re:

I am unclear on why the action_result* examples specify the full class hierarchy for the type on each object when this is unnecessary.

That's historic. The original version of that test was to trigger a weird issue with ...I think it was pyshacl, where a hard-coded assumption was in place that class hierarchies would never be more than 3 subclasses deep. The issue was identified as we were releasing 0.7.0, and fixed upstream soon after.

@kfairbanks copied and adapted that test; I dunno why, but sort order might've been it. We could probably swap another example in, in order to do testing of other context dictionary features. I chose the hash example to see how well datatypes work.

ajnelson-nist added a commit to kfairbanks/UCO that referenced this issue Aug 23, 2022
…dictionary

I had previously realized, and forgotten, that datatyped literals do not
appear to be supported as a feature within context dictionaries.

References:
* ucoProject#423

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist ajnelson-nist modified the milestones: UCO 1.0.0, UCO 1.x.0 Aug 25, 2022
@ajnelson-nist
Contributor

Further on the "Concise" discussion point:

The IANA registry entry for application/ld+json includes a "profile" parameter, whose values include http://www.w3.org/ns/json-ld#compacted, #expanded, and #flattened. Those even turn out to be IRIs, a feature apparently posted in 2020 that we'd missed in our original planning from before 2020.

This should inform us well of what JSON-LD files to generate. We don't have to guess at standard practices - the forms to support have already progressed to IANA registration.
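For reference, a client could name one of these forms through the media type's profile parameter. The sketch below only builds the header value; the helper name is hypothetical, and whether any UCO-hosting server would honor the parameter is an assumption, not something established in this thread.

```python
# Profile IRIs for the three forms named above.
JSONLD_PROFILES = {
    "compacted": "http://www.w3.org/ns/json-ld#compacted",
    "expanded": "http://www.w3.org/ns/json-ld#expanded",
    "flattened": "http://www.w3.org/ns/json-ld#flattened",
}


def jsonld_media_type(form):
    """Build an application/ld+json media type string with a profile parameter."""
    # The IRI is quoted because it contains characters not allowed in a bare token.
    return f'application/ld+json;profile="{JSONLD_PROFILES[form]}"'
```

The resulting string could be used as an HTTP `Accept` header value when requesting content, or as a `Content-Type` when serving it.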
