TheKitchn website scraper still does not work in Mealie latest with latest scraper #1362

Open
calm108 opened this issue Nov 4, 2024 · 4 comments
Labels
bots-protection A form of bot protection is preventing the fetching of the recipe's HTML

Comments

@calm108

calm108 commented Nov 4, 2024

Pre-filing checks

  • [ YES] I have searched for open issues that report the same problem
  • [ YES] I have checked that the bug affects the latest version of the library

The URL of the recipe(s) that are not being scraped correctly

...

The results you expect to see

...the recipe

The results (including any Python error messages) that you are seeing

...
Mealie reports a "Cannot find anything" error.

@calm108 calm108 added the bug label Nov 4, 2024
@calm108
Author

calm108 commented Nov 4, 2024

I have filed this bug before and it was closed. I get no 403 error and can visit the URL with no problem. I tested other recipe sites (allrecipes works fine), but EVERY recipe I try on TheKitchn fails.

@calm108 calm108 changed the title TheKitchn webstie scraper still does not work in Mealie latest with latest scraper TheKitchn website scraper still does not work in Mealie latest with latest scraper Nov 4, 2024
@jayaddison
Collaborator

Ok, thanks @calm108 - let's take another look at this. Based on what you've explained so far, could you let me know if the following is accurate:

  • You're able to open the mentioned recipe(s) in the browser absolutely fine.
  • The recipes cannot be loaded from Mealie.
    • Previously this appeared as some kind of HTTP 403.
    • Now it is... ? (but either way, it works when opening a normal web browser).

If so: first of all, I can replicate those locally, so I believe you. Secondly: it certainly seems to indicate that something is blocking particular HTTP clients.

@calm108
Author

calm108 commented Nov 5, 2024

You have it exactly correct 👍 I have tried multiple recipe URLs on thekitchn, all with the same result.

@jayaddison
Collaborator

Ok; I should also note that when I replicated the problem, that was using recipe-scrapers directly, without the involvement of the Mealie application.

I'm going to add the 'bots protection' label to this bug, since it seems to be a case where a recipe site is rejecting some network traffic.
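
One way to check whether a rejection like this depends on the HTTP client is to request the same URL with different `User-Agent` headers and compare the status codes. The following is only a minimal sketch using the Python standard library. The helper names `fetch_status` and `compare_clients` are hypothetical, and it is an assumption that the `User-Agent` header is what the site keys on; bot protection often looks at other signals too, so identical results here would not rule out client-based blocking.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, opener=urllib.request.urlopen):
    """Return the HTTP status code received when requesting `url` as `user_agent`."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is what we want here.
        return error.code


def compare_clients(url, user_agents, opener=urllib.request.urlopen):
    """Map each label in `user_agents` to the status code that client receives."""
    return {
        label: fetch_status(url, user_agent, opener)
        for label, user_agent in user_agents.items()
    }
```

Calling `compare_clients(recipe_url, {"browser": "<a browser User-Agent string>", "script": "<a script User-Agent string>"})` with a real recipe URL would show whether the two clients are treated differently. The `opener` parameter exists only so the network call can be swapped out when testing.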

@jayaddison jayaddison added bots-protection A form of bot protection is preventing the fetching of the recipe's HTML and removed bug labels Nov 5, 2024