Both #1357 and #1363 concern individual recipe sites whose page data contains slightly non-standard metadata (schema.org in both cases, although the same problem could equally apply to OpenGraph metadata).
The `AbstractScraper` instantiates metadata parsers for both OpenGraph and schema.org in its default initializer.
In theory, individual recipe scrapers could override the `__init__` method and assign their own parsers, or call `super().__init__(...)` and then either pre-process the page data or adjust the schema parse results.
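A minimal sketch of the override-`__init__` workaround follows. It assumes heavily simplified, hypothetical stand-ins for `AbstractScraper`, `SchemaOrg` and `OpenGraph` that wrap an already-extracted dict rather than HTML (the real recipe-scrapers classes differ), and a hypothetical site that stores its title under `"headline"` instead of `"name"`.

```python
# Hypothetical, simplified stand-ins; NOT the real recipe-scrapers classes.


class SchemaOrg:
    """Toy schema.org parser wrapping an already-extracted dict."""

    def __init__(self, page_data: dict):
        self.data = dict(page_data)

    def title(self) -> str:
        return self.data.get("name", "")


class OpenGraph:
    """Toy OpenGraph parser wrapping an already-extracted dict."""

    def __init__(self, page_data: dict):
        self.data = dict(page_data)


class AbstractScraper:
    def __init__(self, page_data: dict):
        # The base initializer hard-codes which parser classes are used.
        self.schema = SchemaOrg(page_data)
        self.opengraph = OpenGraph(page_data)

    def title(self) -> str:
        return self.schema.title()


class ExampleSite(AbstractScraper):
    """Hypothetical site that puts the recipe title under "headline"."""

    def __init__(self, page_data: dict):
        super().__init__(page_data)
        # Patch the schema parse results after the base initializer has run.
        if "name" not in self.schema.data and "headline" in self.schema.data:
            self.schema.data["name"] = self.schema.data["headline"]


print(ExampleSite({"headline": "Pancakes"}).title())  # Pancakes
```

This works, but each site-specific fix ends up inside `__init__` and depends on the base class's internal attribute names (`self.schema`, `.data`).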
I think a more elegant approach would be to add class attributes to `AbstractScraper` that define which metadata parsers to use, referencing `OpenGraph` and `SchemaOrg` by default. Recipe sites with slightly divergent metadata could then reassign those attributes to module-local custom parser implementations (which would often inherit from the original parser classes).
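A minimal sketch of the class-attribute idea, again using hypothetical, simplified stand-ins for the library classes (dict input instead of HTML). The attribute names `schema_parser_class` and `opengraph_parser_class`, and the `"headline"` quirk, are illustrative assumptions, not existing recipe-scrapers API.

```python
# Hypothetical, simplified stand-ins; NOT the real recipe-scrapers classes.


class SchemaOrg:
    """Toy schema.org parser wrapping an already-extracted dict."""

    def __init__(self, page_data: dict):
        self.data = dict(page_data)

    def title(self) -> str:
        return self.data.get("name", "")


class OpenGraph:
    """Toy OpenGraph parser wrapping an already-extracted dict."""

    def __init__(self, page_data: dict):
        self.data = dict(page_data)


class AbstractScraper:
    # Hypothetical class attributes naming the parsers to instantiate;
    # subclasses override these instead of overriding __init__.
    schema_parser_class = SchemaOrg
    opengraph_parser_class = OpenGraph

    def __init__(self, page_data: dict):
        self.schema = self.schema_parser_class(page_data)
        self.opengraph = self.opengraph_parser_class(page_data)

    def title(self) -> str:
        return self.schema.title()


# Module-local parser for a hypothetical site that uses "headline".
class HeadlineSchemaOrg(SchemaOrg):
    def title(self) -> str:
        return self.data.get("name") or self.data.get("headline", "")


class ExampleSite(AbstractScraper):
    schema_parser_class = HeadlineSchemaOrg


print(ExampleSite({"headline": "Pancakes"}).title())  # Pancakes
print(repr(AbstractScraper({"headline": "Pancakes"}).title()))  # ''
```

The customization is declarative: the subclass only swaps a class reference, and every other scraper keeps using the unmodified `SchemaOrg` parser.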
The main benefit is that recipe scraper developers would gain a way to customize metadata parsing to handle small deviations from the standards on particular recipe websites, without affecting the schema parsing applied to other sites.