schema.org author retrieval: author from WebPage not returned #1380
Comments
This part of the code seems intended to handle this case: recipe-scrapers/recipe_scrapers/_schemaorg.py, lines 100 to 106 in 43093df
Does that code ever run, though?
For this webpage, weirdly, no: because there are two Approximately:
Our |
Two options that I can think of:
I'm going to take a break for a while here and will look at this again soon (next day or two probably).
That's a pretty good find! My gut feeling tells me we should go with the second option that you've proposed, where this is handled in I feel like instead of setting we can pass these "findings" through a cleverer
Sounds good to me. Let's also try to group those updates by the
Pre-filing checks
The URL of the recipe(s) that are not being scraped correctly
Discovered during discussion at #1378 (comment)
The results you expect to see
The author name should be available from schema.org metadata on the recipe page, by accessing the author field - in particular the WebPage item.
The results (including any Python error messages) that you are seeing
The author field returns a null/empty value.
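To make the expected behaviour concrete, here is a minimal sketch of an author lookup that falls back to the WebPage item when the Recipe item carries no author. It is not the recipe-scrapers implementation: the function names `find_author` and `_author_name` are hypothetical, the sample items are invented for illustration, and it assumes the schema.org metadata has already been parsed into a flat list of dictionaries.

```python
from typing import Any, Dict, List, Optional


def _author_name(author: Any) -> Optional[str]:
    """Normalise an author value (string, dict, or list) to a plain name."""
    if isinstance(author, list):
        # Assumption: when several authors are listed, use the first one.
        author = author[0] if author else None
    if isinstance(author, dict):
        author = author.get("name")
    if isinstance(author, str) and author.strip():
        return author.strip()
    return None


def find_author(items: List[Dict[str, Any]]) -> Optional[str]:
    """Return an author name, preferring Recipe items over WebPage items."""
    for wanted_type in ("Recipe", "WebPage"):
        for item in items:
            types = item.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if wanted_type not in types:
                continue
            name = _author_name(item.get("author"))
            if name:
                return name
    return None


# Hypothetical metadata: the Recipe has no author, the WebPage does.
items = [
    {"@type": "Recipe", "name": "Example soup"},
    {"@type": "WebPage", "author": {"@type": "Person", "name": "Jane Doe"}},
]
print(find_author(items))  # Jane Doe
```

Checking Recipe items first keeps any author declared on the recipe itself authoritative, and the WebPage author is only used when nothing better is found.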