Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

request JSON-LD from Link rel=alternate #129

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

alpha-beta-soup
Copy link

@alpha-beta-soup alpha-beta-soup commented Jun 24, 2020

Ref #128

With the following test cases defined:

context.jsonld

{
  "@context": {
    "@vocab":   "https://w3c.github.io/json-ld-api/tests/vocab#",
    "dcterms":       "http://purl.org/dc/terms/",
    "jld":      "https://w3c.github.io/json-ld-api/tests/vocab#",
    "mf":       "http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#",
    "rdfs":     "http://www.w3.org/2000/01/rdf-schema#",
    "xsd":      "http://www.w3.org/2001/XMLSchema#",

    "context":         { "@type": "@id" },
    "expect":          { "@id": "mf:result", "@type": "@id" },
    "expectErrorCode": { "@id": "mf:result" },
    "frame":           { "@type": "@id" },
    "input":           { "@id": "mf:action", "@type": "@id" },
    "option":          { "@type": "@id"},
    "sequence":        { "@id": "mf:entries", "@type": "@id", "@container": "@list" },
    "redirectTo":      { "@type": "@id"},

    "name":                 "mf:name",
    "purpose":              "rdfs:comment",
    "description":          "rdfs:comment",
    "base":                 { "@type": "@id" },
    "compactArrays":        { "@type": "xsd:boolean" },
    "compactToRelative":    { "@type": "xsd:boolean" },
    "contentType":          { "@type": "xsd:string" },
    "expandContext":        { "@type": "@id" },
    "extractAllScripts":    { "@type": "xsd:boolean" },
    "httpLink":             { "@type": "xsd:string", "@container": "@set" },
    "httpStatus":           { "@type": "xsd:integer" },
    "normative":            { "@type": "xsd:boolean" },
    "processingMode":       { "@type": "xsd:string" },
    "processorFeature":     { "@type": "xsd:string" },
    "produceGeneralizedRdf":{ "@type": "xsd:boolean" },
    "specVersion":          { "@type": "xsd:string" },
    "useNativeTypes":       { "@type": "xsd:boolean" }
  }
}

manifest.jsonld

{
  "@context": ["context.jsonld", {"@base": "manifest"}],
  "@id": "",
  "@type": "mf:Manifest",
  "name": "JSON-LD Test Suite",
  "description": "This manifest loads some tests for resolving https://github.com/digitalbazaar/pyld/issues/128",
  "sequence": [{
  	"@id": "#t1",
  	"@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
  	"name": "Test for JSON-LD via Link header",
  	"purpose": "Tests for correct retrieval of remote JSON-LD when it is present as a Link HTTP header",
  	"input": "/full/path/to/pyld/tests/sample.jsonld",
  	"expect": "/full/path/to/pyld/tests/output.jsonld"
  }, {
    "@id": "#t2",
    "@type": ["jld:PositiveEvaluationTest", "jld:ExpandTest"],
    "name": "Test for JSON-LD via direct JSON-LD URL",
    "purpose": "Tests for correct retrieval of remote JSON-LD when it is given as a direct link within a context",
    "input": "/full/path/to/pyld/tests/sample2.jsonld",
    "expect": "/full/path/to/pyld/tests/output.jsonld"
  }]
}

sample.jsonld

{
	"@context": "https://schema.org",
	"@type":"Dataset",
	"@id":"http://localhost:5000/collections/obs",
	"url":"http://localhost:5000/collections/obs"
}

sample2.jsonld

{
	"@context": "https://schema.org/docs/jsonldcontext.jsonld",
	"@type":"Dataset",
	"@id":"http://localhost:5000/collections/obs",
	"url":"http://localhost:5000/collections/obs"
}

output.jsonld

[
  {
    "@id": "http://localhost:5000/collections/obs",
    "@type": [
      "http://schema.org/Dataset"
    ],
    "http://schema.org/url": [
      {
        "@id": "http://localhost:5000/collections/obs"
      }
    ]
  }
]

The tests both fail before the changes. The tests both passs after the changes. Since all the test cases of this repository are remote, I am not sure whether or where to contribute these test cases.

However, there is one regression, which I do not currently know how to resolve. This test does not have an error before the changes, and does have an error after the changes. The report is:

======================================================================
ERROR: Remote document: https://w3c.github.io/json-ld-api/tests/remote-doc-manifest#t0002: Document loader loads a JSON document.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/runtests.py", line 372, in runTest
    raise e
  File "tests/runtests.py", line 319, in runTest
    result = getattr(jsonld, fn)(*params)
  File "tests/../lib/pyld/jsonld.py", line 163, in expand
    return JsonLdProcessor().expand(input_, options)
  File "tests/../lib/pyld/jsonld.py", line 820, in expand
    remote_doc = load_document(input_, options)
  File "tests/../lib/pyld/jsonld.py", line 6591, in load_document
    code='loading document failed')
pyld.jsonld.JsonLdError: ('No remote document found at the given URL.',)
Type: jsonld.NullRemoteDocument
Code: loading document failed

All other tests are unaffected. However, as mentioned in #128, running the tests as described with no changes elicits 5 failures, 2 errors, and 34 skipped tests.

@datadavev datadavev mentioned this pull request Aug 3, 2020
datadavev added a commit to datadavev/pyld that referenced this pull request Aug 3, 2020
@JulienPalard
Copy link

This PR does not fixes the following test:

from pyld import jsonld

jsonld.expand(
    {
        "@context": "http://schema.org/",
        "@type": "Person",
        "name": "Jane Doe",
        "jobTitle": "Professor",
        "telephone": "(425) 123-4567",
        "url": "http://www.janedoe.com",
    }
)

because https://schema.org gives Content-Type: text/html, which is in headers["Accept"]:

(Pdb) p headers["Accept"]
'application/ld+json;profile=http://www.w3.org/ns/json-ld#context, application/ld+json, application/json;q=0.5, text/html;q=0.8, application/xhtml+xml;q=0.8'

it's here due to pyld/jsonld.py adding it:

6573 	    # FIXME: only if html5lib loaded?
6574 	    headers['Accept'] = headers['Accept'] + ', text/html;q=0.8, application/xhtml+xml;q=0.8'

According to json-ld:

A processor seeing a non-JSON result will note the presence of the link header and load that document instead.

So even if we accept HTML, if it's HTML and there's a json-ld alternative, let's use it.

To do this, I'll move doc['document'] = response.json() right before returning, and just drop the if content_type not in headers['Accept']: you added, it works for me.

In other words (and with a correct indentation) I mean:

--- a/lib/pyld/documentloader/requests.py
+++ b/lib/pyld/documentloader/requests.py
@@ -69,7 +69,6 @@ def requests_document_loader(secure=False, **kwargs):
                 'contentType': content_type,
                 'contextUrl': None,
                 'documentUrl': response.url,
-                'document': response.json() if content_type in headers['Accept'] else None
             }
             link_header = response.headers.get('link')
             if link_header:
@@ -77,15 +76,15 @@ def requests_document_loader(secure=False, **kwargs):
                     LINK_HEADER_REL)
                 # only 1 related link header permitted
                 if linked_context and content_type != 'application/ld+json':
-                  if isinstance(linked_context, list):
-                      raise JsonLdError(
-                          'URL could not be dereferenced, '
-                          'it has more than one '
-                          'associated HTTP Link Header.',
-                          'jsonld.LoadDocumentError',
-                          {'url': url},
-                          code='multiple context link headers')
-                  doc['contextUrl'] = linked_context['target']
+                    if isinstance(linked_context, list):
+                        raise JsonLdError(
+                            'URL could not be dereferenced, '
+                            'it has more than one '
+                            'associated HTTP Link Header.',
+                            'jsonld.LoadDocumentError',
+                            {'url': url},
+                            code='multiple context link headers')
+                    doc['contextUrl'] = linked_context['target']
                 linked_alternate = parse_link_header(link_header).get('alternate')
                 # if not JSON-LD, alternate may point there
                 if (linked_alternate and
@@ -93,9 +92,8 @@ def requests_document_loader(secure=False, **kwargs):
                         not re.match(r'^application\/(\w*\+)?json$', content_type)):
                     doc['contentType'] = 'application/ld+json'
                     doc['documentUrl'] = prepend_base(url, linked_alternate['target'])
-                    if content_type not in headers['Accept']:
-                        # Original was not JSON/JSON-LD; fetch linked JSON-LD
-                        return loader(doc['documentUrl'], options=options)
+                    return loader(doc['documentUrl'], options=options)
+            doc['document'] = response.json()
             return doc
         except JsonLdError as e:
             raise e

@alpha-beta-soup
Copy link
Author

Hmm, that's a bit odd. Yes, Pyld adds text/html (in the line you pointed out, i.e. headers['Accept'] = headers['Accept'] + ', text/html;q=0.8, application/xhtml+xml;q=0.8') but that's a default header value and it shouldn't affect the server response as long as application/json+ld is included in the Accept header value with a higher precedence. That check

if content_type not in headers['Accept']:
    # Original was not JSON/JSON-LD; fetch linked JSON-LD
    return loader(doc['documentUrl'], options=options)

is important, it checks whether the server responded with application/json+ld, and if not, attempts to fetch the linked resource. There's no value doing this if the response is already JSON-LD.

@JulienPalard
Copy link

it shouldn't affect the server response as long as application/json+ld is included in the Accept header value with a higher precedence.

Totally agree. But looks like https://schema.org/ does not have a application/json-ld variant at all, so it always reply with text/html, independently of the Accept header. But this text/html response links to the ld+json, see:

$ curl -I -H "Accept: application/ld+json" https://schema.org
HTTP/2 200 
[...]
link: </docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"
[...]
content-type: text/html
[...]

Which looks legit to me, even though I still didn't read and understood rfc8288 entierly.

That check

if content_type not in headers['Accept']:
    # Original was not JSON/JSON-LD; fetch linked JSON-LD
    return loader(doc['documentUrl'], options=options)

is important, it checks whether the server responded with application/json+ld, and if not, attempts to fetch the linked resource. There's no value doing this if the response is already JSON-LD.

IIRC It looks already covered by the:

not re.match(r'^application/(\w*+)?json$', content_type)):

check a few line before.

@alpha-beta-soup
Copy link
Author

Ah I see. Sorry it's been a while since I've looked at this. I think your version of the PR makes sense, and as long as both our tests pass, it gets my vote.

@JulienPalard
Copy link

and as long as both our tests pass, it gets my vote.

I'd like to add both tests to the test suite, but I don't understand how to do so. If someone can explain before merging I'd gladly do so.

@@ -12,8 +12,7 @@
import string
import urllib.parse as urllib_parse

from pyld.jsonld import (JsonLdError, parse_link_header, LINK_HEADER_REL)

from pyld.jsonld import (JsonLdError, parse_link_header, prepend_base, LINK_HEADER_REL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from pyld.jsonld import (JsonLdError, parse_link_header, prepend_base, LINK_HEADER_REL)
from pyld.jsonld import JsonLdError, parse_link_header, prepend_base, LINK_HEADER_REL

I believe the braces are redundant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants