Feature suggestion: provide a convenient way to override metadata parsing for individual recipe sites #1364

jayaddison · 2024-11-07T14:49:03Z

Both #1357 and #1363 relate to individual recipe sites that have slightly non-standard metadata representations in their page data (schema.org for both of those, although this could equally apply to OpenGraph metadata).

The AbstractScraper instantiates metadata parsers for both OpenGraph and schema.org in the default initializer.

In theory, an individual recipe scraper could override the __init__ method and assign its own parsers -- or call super().__init__(...) and either pre-process the page data beforehand or adjust the schema parse results afterwards.
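To illustrate the __init__ override route, here is a minimal sketch. The SchemaOrg and AbstractScraper classes below are simplified stand-ins written for this example, not the library's real implementations, and ExampleSiteScraper, the dict-based page data and the "headline" quirk are all hypothetical.

```python
class SchemaOrg:
    """Simplified stand-in for the schema.org parser (assumption, not the real class)."""

    def __init__(self, page_data):
        self.data = dict(page_data)

    def title(self):
        return self.data.get("name")


class AbstractScraper:
    """Simplified stand-in: instantiates its parser in the default initializer."""

    def __init__(self, page_data):
        self.page_data = page_data
        self.schema = SchemaOrg(page_data)


class ExampleSiteScraper(AbstractScraper):
    """Hypothetical site that publishes the recipe title under 'headline'."""

    def __init__(self, page_data):
        # Pre-process the page data before the default parser sees it.
        fixed = dict(page_data)
        if "name" not in fixed and "headline" in fixed:
            fixed["name"] = fixed["headline"]
        super().__init__(fixed)


scraper = ExampleSiteScraper({"headline": "Lentil soup"})
print(scraper.schema.title())
```

This works, but every site with a quirk has to repeat the initializer plumbing, which is the motivation for something more declarative.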

I think a more elegant approach would be to add class attributes to AbstractScraper that define the metadata parsers to use -- e.g. references to OpenGraph and SchemaOrg by default. Then, for recipe sites with slightly divergent metadata content, we could reassign those attributes to module-local custom parser implementations (probably often inheriting from the original parser classes).
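A rough sketch of what I have in mind follows. The attribute names schema_parser and opengraph_parser are placeholders I made up for illustration, the parser and base classes are simplified stand-ins rather than the real ones, and HeadlineSchemaOrg and ExampleSiteScraper are hypothetical.

```python
class SchemaOrg:
    """Simplified stand-in for the schema.org parser (assumption)."""

    def __init__(self, page_data):
        self.data = dict(page_data)

    def title(self):
        return self.data.get("name")


class OpenGraph:
    """Simplified stand-in for the OpenGraph parser (assumption)."""

    def __init__(self, page_data):
        self.data = dict(page_data)

    def site_name(self):
        return self.data.get("og:site_name")


class AbstractScraper:
    # Hypothetical attribute names: the parser classes to instantiate.
    schema_parser = SchemaOrg
    opengraph_parser = OpenGraph

    def __init__(self, page_data):
        # Looked up via self, so a subclass can swap in its own classes.
        self.schema = self.schema_parser(page_data)
        self.opengraph = self.opengraph_parser(page_data)


class HeadlineSchemaOrg(SchemaOrg):
    """Module-local parser for a site that uses 'headline' instead of 'name'."""

    def title(self):
        return super().title() or self.data.get("headline")


class ExampleSiteScraper(AbstractScraper):
    # Only this site gets the custom parser; other scrapers are unaffected.
    schema_parser = HeadlineSchemaOrg


scraper = ExampleSiteScraper({"headline": "Lentil soup"})
print(scraper.schema.title())
```

The per-site scraper then only declares which parser class it wants, and the initializer stays untouched.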

The main benefit is that recipe scraper developers could customize metadata parsing to handle small deviations from the standards on particular recipe websites, without affecting the schema parsing applied to other sites.


jayaddison added enhancement core-team-job labels Nov 7, 2024

jayaddison mentioned this issue Nov 7, 2024

AbstractScraper: provide attributes to override default metadata parsers #1365

Merged
