
Feature suggestion: provide a convenient way to override metadata parsing for individual recipe sites #1364

Open
jayaddison opened this issue Nov 7, 2024 · 0 comments


@jayaddison
Collaborator

Both #1357 and #1363 relate to individual recipe sites whose page data contains slightly non-standard metadata representations (schema.org in both of those cases, although the same problem could equally apply to OpenGraph metadata).

`AbstractScraper` instantiates metadata parsers for both OpenGraph and schema.org in its default initializer.

In theory, individual recipe scrapers could override the `__init__` method and assign their own parsers. Alternatively, they could call `super().__init__(...)` and either pre-process the page data beforehand or adjust the schema parse results afterwards.
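A minimal sketch of the `__init__` override approach follows. It assumes simplified stand-ins: `SchemaOrg`, `OpenGraph` and `AbstractScraper` here are hypothetical placeholders, not the library's real classes or constructor signatures, and the stand-in `SchemaOrg` only reads the first JSON-LD block with a regex. `ExampleSiteScraper` and its deviation (using `"title"` where schema.org expects `"name"`) are invented for illustration.

```python
import json
import re


class SchemaOrg:
    """Hypothetical stand-in for the schema.org parser (not the real API)."""

    def __init__(self, page_data):
        # Simplified: parse only the first JSON-LD script block, if any.
        match = re.search(
            r'<script type="application/ld\+json">(.*?)</script>', page_data, re.S
        )
        self.data = json.loads(match.group(1)) if match else {}

    def title(self):
        return self.data.get("name")


class OpenGraph:
    """Hypothetical stand-in for the OpenGraph parser (not the real API)."""

    def __init__(self, page_data):
        self.page_data = page_data


class AbstractScraper:
    def __init__(self, page_data, url):
        self.url = url
        # Parser classes are hard-wired in the default initializer.
        self.opengraph = OpenGraph(page_data)
        self.schema = SchemaOrg(page_data)


class ExampleSiteScraper(AbstractScraper):
    """Hypothetical site whose JSON-LD uses "title" instead of "name"."""

    def __init__(self, page_data, url):
        super().__init__(page_data, url)
        # Adjust the schema parse results after the default parsing has run.
        if "name" not in self.schema.data and "title" in self.schema.data:
            self.schema.data["name"] = self.schema.data.pop("title")


html = '<script type="application/ld+json">{"@type": "Recipe", "title": "Soup"}</script>'
print(ExampleSiteScraper(html, "https://example.com/soup").schema.title())  # Soup
```

This works, but the override has to reach into the parser's internal `data` attribute after parsing has already happened, which couples each site scraper to parser internals.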

I think a more elegant approach would be to add class attributes to `AbstractScraper` that define which metadata parsers to use, e.g. references to `OpenGraph` and `SchemaOrg` by default. Then, for recipe sites with slightly divergent metadata content, we could reassign those attributes to module-local custom parser implementations (probably often inheriting from the original parser classes).
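A minimal sketch of the class-attribute approach, again using hypothetical stand-in parsers rather than the project's real classes. The attribute names `opengraph_parser` and `schema_parser`, the `ExampleSiteSchemaOrg` subclass and the two site scrapers are illustrative assumptions, not existing names in the codebase.

```python
import json
import re


class SchemaOrg:
    """Hypothetical stand-in for the schema.org parser (not the real API)."""

    def __init__(self, page_data):
        # Simplified: parse only the first JSON-LD script block, if any.
        match = re.search(
            r'<script type="application/ld\+json">(.*?)</script>', page_data, re.S
        )
        self.data = json.loads(match.group(1)) if match else {}

    def title(self):
        return self.data.get("name")


class OpenGraph:
    """Hypothetical stand-in for the OpenGraph parser (not the real API)."""

    def __init__(self, page_data):
        self.page_data = page_data


class AbstractScraper:
    # Proposed (hypothetical) class attributes naming the parsers to use.
    opengraph_parser = OpenGraph
    schema_parser = SchemaOrg

    def __init__(self, page_data, url):
        self.url = url
        # Look the parser classes up via self so subclasses can swap them.
        self.opengraph = self.opengraph_parser(page_data)
        self.schema = self.schema_parser(page_data)


class ExampleSiteSchemaOrg(SchemaOrg):
    """Module-local parser tolerating a site that uses "title" for "name"."""

    def title(self):
        return self.data.get("name") or self.data.get("title")


class ExampleSiteScraper(AbstractScraper):
    schema_parser = ExampleSiteSchemaOrg


class OtherSiteScraper(AbstractScraper):
    pass  # keeps the default parsers


html = '<script type="application/ld+json">{"@type": "Recipe", "title": "Soup"}</script>'
print(ExampleSiteScraper(html, "https://example.com/a").schema.title())  # Soup
print(OtherSiteScraper(html, "https://example.com/b").schema.title())  # None
```

Because the initializer resolves the parser classes through `self`, a site scraper only needs to reassign one class attribute, and `OtherSiteScraper` keeps the default `SchemaOrg` behavior unchanged.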

The main benefit is that recipe scraper developers would gain a way to customize metadata parsing to handle small deviations from the standards on particular recipe websites, without affecting the schema parsing applied to other sites.
