google expects plain schema-org url #576

Merged
merged 3 commits into from
Dec 1, 2020


pvgenuchten
Contributor

@pvgenuchten pvgenuchten commented Nov 18, 2020

This PR started out as a 'simple' namespace redefinition, but it turns out to be related to a wider issue in the schema.org-as-JSON-LD domain. https://schema.org itself does not advertise an ld-context, but crawlers still require this URL to be used as the namespace. pyld, which is used to validate the JSON-LD, however requires a valid ld-context.

That's why the initial commit was reverted and a hacky workaround is suggested instead: within the ajax response, the schema.org URL is rewritten to fit the needs of search engine crawlers.

Resolves #574
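
The rewrite described above can be illustrated with a minimal Python sketch. The PR applies the rewrite in the ajax response; this only shows the idea. The long context URL is an assumption (it is not stated in this thread), and `shorten_schema_org_context` is a hypothetical helper name.

```python
import copy

# Assumption: the long context URL that pyld can dereference (not stated in this thread).
LONG_CONTEXT = "https://schema.org/docs/jsonldcontext.jsonld"
# The plain URL that search engine crawlers expect.
SHORT_CONTEXT = "https://schema.org"


def shorten_schema_org_context(doc):
    """Return a copy of a JSON-LD document with the long schema.org
    context URL replaced by the plain one (hypothetical helper)."""
    out = copy.deepcopy(doc)
    context = out.get("@context")
    if context == LONG_CONTEXT:
        # @context given as a single URL string.
        out["@context"] = SHORT_CONTEXT
    elif isinstance(context, list):
        # @context given as a list of URLs and/or inline context objects.
        out["@context"] = [
            SHORT_CONTEXT if c == LONG_CONTEXT else c for c in context
        ]
    # Any other form (e.g. an inline object) is left untouched.
    return out
```

The document served directly as JSON-LD keeps the long URL so pyld can still resolve it; only the copy embedded in the HTML page would be rewritten.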

@tomkralidis
Member

+1 from me, thanks @pvgenuchten

@alpha-beta-soup
Contributor

This implies reverting to the original, which had to be changed for the reasons in #444; those reasons still apply (that's why the tests failed). It's still an upstream problem with pyld.

Contributor

@alpha-beta-soup alpha-beta-soup left a comment

This change causes tests to fail, since they depend on pyld to parse the JSON-LD metadata. Either we fix pyld upstream, or use a different Python JSON-LD parsing library (or perhaps don't parse it at all).
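
As a rough illustration of the "don't parse it at all" option, a test could check only the shape of the JSON-LD with the standard library, without dereferencing the context. This is a sketch under assumptions: `check_jsonld_shape` is a hypothetical helper, not part of pygeoapi or pyld, and it validates far less than a real JSON-LD processor would.

```python
import json


def check_jsonld_shape(text, expected_type="Dataset"):
    """Parse text as plain JSON and check the top-level JSON-LD keys.

    No context is fetched or expanded (hypothetical helper).
    """
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("top-level JSON-LD value should be an object")
    if "@context" not in doc:
        raise ValueError("missing @context")
    if doc.get("@type") != expected_type:
        raise ValueError(
            f"expected @type {expected_type!r}, got {doc.get('@type')!r}"
        )
    return doc
```

Such a check would pass for either context URL, which is the point: the tests would no longer depend on which URL pyld can resolve.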

@pvgenuchten
Contributor Author

pvgenuchten commented Nov 19, 2020

I was afraid of this...
Or fix the schema.org validator... (let me try other schema.org validators)
As a workaround, maybe we can shorten the context URL as part of the ajax request that introduces the snippet?

@pvgenuchten
Contributor Author

Just tested with the Google webmaster console.
That tool seems to trigger only on the microdata and not on the embedded JSON-LD (in contrast to the structured data testing tool).
It could be related to the schema.org context URL, but I'm a bit puzzled/worried about the impact of this.

@dblodgett-usgs

@ksonda -- do you have any recollection of the nuances here?

@ksonda
Contributor

ksonda commented Nov 19, 2020

I'm not familiar with pyld or why that would be affected, but Google will be retiring the structured data test tool, which didn't seem to pull in remote contexts other than schema.org. You could specify predicates manually though. The new tool, Google Rich Results, will only parse microdata for resources of the following schema.org types:

image

@pvgenuchten
Contributor Author

hi @dblodgett-usgs,

Nice to see the demo based on master has been redeployed with the microdata removed by the last merged PR... it helps testing.

The json-ld is now only injected using ajax.

I noticed structured-data-testing tool picks up the injected script code.

image

However rich results test does not find any embedded structured data.

image

Yandex is also not able to retrieve any structured data.

image

I guess the ajax approach is not optimal, at least for some clients. Do we have an option to embed the json-ld using python?

some reading:
https://webmasters.stackexchange.com/questions/100935/why-wouldnt-google-be-able-to-read-ajax-generated-json-ld-schema-org-markup-par
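
One way the "embed the json-ld using python" option could look is to render the script element on the server, so that clients which do not execute JavaScript still see it. This is a minimal sketch using only the standard library; `jsonld_script_tag` is a hypothetical helper, and how it would be wired into the templates is left open.

```python
import json


def jsonld_script_tag(doc):
    """Render a JSON-LD document as an inline script element
    for an HTML page (hypothetical helper)."""
    payload = json.dumps(doc, indent=2)
    # Escape "</" so a string value cannot close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The returned string would be placed in the page as raw HTML by the template, rather than being injected after page load.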

@pvgenuchten
Contributor Author

Note that some of the problems may also be caused by #577,

indeed:

image

but still curious why it finds a button and not a dataset

@pvgenuchten
Contributor Author

pvgenuchten commented Nov 23, 2020

I implemented the following test scenario:

  • I made a basic HTML page available at http://pygeoapi.genuchten.net
  • the page injects schema.org markup using ajax, similar to pygeoapi
  • I tested the page in Google Search Console
  • I updated the context URL to be plain schema.org
    result: schema.org is parsed

image

then google complains:

image

Google rich results (probably using similar technology) also parses the ajax-loaded schema.org correctly

image

@alpha-beta-soup
Contributor

I still think the correct option is to attempt a PR to pyld so that it is able to use the appropriate schema.org URL; this would benefit other projects, and avoids the inconsistency between the direct JSON-LD response and the embedded data.

@alpha-beta-soup
Contributor

digitalbazaar/pyld#129

@pvgenuchten
Contributor Author

@alpha-beta-soup I agree with the direction. However, schema.org support is currently broken since the removal of the schema.org microdata, so I see this as a workaround until pyld is fixed. Are you OK with adopting it as a temporary workaround? The alternative would be to revert the microdata removal.

@alpha-beta-soup
Contributor

Not sure it's my call. My preference would be to keep the microdata, but I'm OK with the "temporary workaround" too, as long as there's a corresponding issue so it doesn't drop off the radar.

@pvgenuchten
Contributor Author

pvgenuchten commented Dec 1, 2020

I expect the microdata to be an additional maintenance effort, at risk of being neglected since it is spread out all over the templates; embedded JSON-LD is more central to manage. There is also the risk that the embedded JSON-LD contradicts the microdata. So I am in favour of adopting the temporary workaround.

An alternative strategy would be to use the DCAT ontology in JSON-LD; many crawlers also support DCAT these days.

I suggest merging this and creating a new issue for the pyld schema.org URL discussion.

@tomkralidis tomkralidis merged commit f4859d6 into geopython:master Dec 1, 2020
@pvgenuchten
Contributor Author

thanx Tom

pvgenuchten pushed a commit to pvgenuchten/pygeoapi that referenced this pull request Jan 18, 2021
* google expects plain schema-org url
https://yoast.com/json-ld/
resolves geopython#574

* Revert "google expects plain schema-org url"

This reverts commit 7f09d4c.

* hack to replace full path for short path, because pyld requires full, search engine expects short
francbartoli pushed a commit to francbartoli/pygeoapi that referenced this pull request Jul 8, 2021
* google expects plain schema-org url
https://yoast.com/json-ld/
resolves geopython#574

* Revert "google expects plain schema-org url"

This reverts commit 7f09d4c.

* hack to replace full path for short path, because pyld requires full, search engine expects short
Development

Successfully merging this pull request may close these issues.

duplicated schema-org annotation & parsing failure
5 participants