Scraper request: bergamot.app #986

josefhelie · 2024-01-15T11:44:42Z

I'm currently using the free app Bergamot (which is closed source) to store my recipes, but I'd like to move to Mealie. I've encountered an error message that says, 'recipe_scrapers was unable to scrape this URL.' Is it possible to get a scraper, please? 😇
Thanks for your help.
A link to a shared recipe: https://dashboard.bergamot.app/shared/T8IJLjbtHdh2pj

jayaddison · 2024-01-15T15:27:19Z

Hi @josefhelie - thanks for the question / feature request.

In theory, yes this is possible - the webpage is public and represents a recipe. However, there are some potentially important items of information absent on the page: in particular, its origin (from another website? self-authored?) and the instructions.

Do you know whether those details can be included when sharing a recipe like this from the app? It's difficult to develop and test without a few complete samples.

josefhelie · 2024-01-16T16:17:55Z

I'm sorry, I shared a recipe that doesn't include all the requested fields. Here is a better example: https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97
Is it better?

jayaddison · 2024-01-20T14:32:13Z

Yep, that initially looks good to me @josefhelie - it's difficult to say for certain without coding it up, but it seems to have most/all of the information we'd need. Thanks!

josefhelie · 2024-01-22T09:01:45Z

Thanks a lot @jayaddison :)

josefhelie · 2024-04-03T14:22:12Z

May I ask whether there is any update on this request, @jayaddison?
thanks :)

jayaddison · 2024-04-14T16:43:14Z

Hi @josefhelie - apologies for my delayed reply. No further updates on this at the moment I'm afraid. Do you have any interest in learning some Python coding?

mlduff · 2024-04-15T09:29:35Z

@jayaddison I took a look; it seems fairly easy to call the API endpoint, which can be derived from the URL of the recipe.
For https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97 the associated API endpoint is https://api.bergamot.app/recipes/shared?r=mIB4jYQtZU1A97.
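A minimal sketch of that derivation, assuming the recipe identifier is always the final path segment of a `/shared/<id>` URL and that the endpoint shape matches the single example above. The helper name `bergamot_api_url` is hypothetical and not part of recipe_scrapers.

```python
from urllib.parse import urlencode, urlsplit

# Endpoint taken from the single example in this thread.
API_ENDPOINT = "https://api.bergamot.app/recipes/shared"


def bergamot_api_url(shared_url: str) -> str:
    """Derive the API URL for a shared Bergamot recipe page (hypothetical helper)."""
    path = urlsplit(shared_url).path
    # Assumption: the path looks like "/shared/<recipe id>".
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) != 2 or segments[0] != "shared":
        raise ValueError(f"not a shared recipe URL: {shared_url!r}")
    return f"{API_ENDPOINT}?{urlencode({'r': segments[1]})}"


print(bergamot_api_url("https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97"))
```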

I'm not sure how the library normally supports the case of recipes being loaded via an API call after the original page load - I can see a few examples (goustojson.py, monsieurcuisine.py) that seem to do this - I would be happy to tackle this if you are happy for me to do so?

jayaddison · 2024-04-15T10:31:46Z

Thanks @mlduff!

> I'm not sure how the library normally supports the case of recipes being loaded via an API call after the original page load - I can see a few examples (goustojson.py, monsieurcuisine.py) that seem to do this - I would be happy to tackle this if you are happy for me to do so?

About the handling of APIs: yep, well discovered - we do currently have a few scrapers that retrieve data using APIs. A potential design/architecture problem is that this tightly couples the scraper to an HTTP client (requests, at the moment). That's nearly a de-facto standard client for Python, but even so, it may not be ideal to depend entirely on it.

Meanwhile we have a v15 development branch that can optionally use requests, but that otherwise requires callers to retrieve the HTML and pass it to the scraper themselves. Marginally less convenient, but allowing callers to use whatever HTTP client(s) they prefer (anything from built-in urlopen, low-level urllib3, requests, httpx, etc).

A long explanation, but the short answer is: yep, please go ahead, but be aware that this would currently only be supported in the v14 / mainline branch.
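To illustrate the coupling concern, here is a sketch in which the code that needs the second (API) request receives a `fetch` callable instead of importing an HTTP client itself. All names here (`load_recipe_json`, `urlopen_fetch`, `stub_fetch`) are hypothetical and not part of recipe_scrapers, and the stub payload is a placeholder rather than a real Bergamot response.

```python
import json
from typing import Callable
from urllib.request import urlopen


def load_recipe_json(api_url: str, fetch: Callable[[str], str]) -> dict:
    """Retrieve and decode recipe JSON using whichever client the caller supplies."""
    return json.loads(fetch(api_url))


def urlopen_fetch(url: str) -> str:
    """One possible fetcher, built on the standard library's urlopen."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def stub_fetch(url: str) -> str:
    """Offline stand-in for tests; the payload is a made-up placeholder."""
    return json.dumps({"requested": url})


# No network access happens here: the stub simply echoes the requested URL.
data = load_recipe_json(
    "https://api.bergamot.app/recipes/shared?r=mIB4jYQtZU1A97", stub_fetch
)
```

A caller who prefers requests or httpx could pass a one-line wrapper around that client in place of `urlopen_fetch`, which is the flexibility described above.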

jayaddison · 2024-04-15T10:33:53Z

@mlduff also a design / implementation question for your consideration: those recipes sometimes contain a link to the original source of the recipe. Should we return that as the canonical URL for recipes when possible?
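One way to express that fallback, as a sketch only: prefer the original source link when a recipe provides one, and otherwise keep the Bergamot page URL. How the source link is obtained from Bergamot's data is not covered in this thread, so it is passed in as a plain argument; the function name and the `example.com` URL are hypothetical.

```python
from typing import Optional


def canonical_url(page_url: str, source_url: Optional[str]) -> str:
    """Prefer the original source link when the recipe provides one."""
    if source_url and source_url.strip():
        return source_url.strip()
    # No usable source link: fall back to the shared Bergamot page.
    return page_url


shared_page = "https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97"
print(canonical_url(shared_page, "https://example.com/original-recipe"))
print(canonical_url(shared_page, None))
```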

mlduff · 2024-04-15T10:37:05Z

> Meanwhile we have a v15 development branch that can optionally use requests, but that otherwise requires callers to retrieve the HTML and pass it to the scraper themselves. Marginally less convenient, but allowing callers to use whatever HTTP client(s) they prefer (anything from built-in urlopen, low-level urllib3, requests, httpx, etc).

@jayaddison would you prefer that I develop this on the v15 branch? If I implement it for v14 (which seems easier), will it need rewriting at some point? And will the existing API-based scrapers I found also need similar rewriting?

mlduff · 2024-04-15T10:37:36Z

> @mlduff also a design / implementation question for your consideration: those recipes sometimes contain a link to the original source of the recipe. Should we return that as the canonical URL for recipes when possible?

Good point, will try to do that.

jayaddison · 2024-04-15T11:34:04Z

> > Meanwhile we have a v15 development branch that can optionally use requests, but that otherwise requires callers to retrieve the HTML and pass it to the scraper themselves. Marginally less convenient, but allowing callers to use whatever HTTP client(s) they prefer (anything from built-in urlopen, low-level urllib3, requests, httpx, etc).
>
> @jayaddison would you prefer that I develop this on the v15 branch? If I implement it for v14 (which seems easier), will it need rewriting at some point? And will the existing API-based scrapers I found also need similar rewriting?

I'd recommend implementing it for v14, yep.

josefhelie · 2024-04-15T12:42:03Z

> Hi @josefhelie - apologies for my delayed reply. No further updates on this at the moment I'm afraid. Do you have any interest in learning some Python coding?

Thanks @jayaddison, but I don't have enough free time to do that, even though I would like to! 😢
Thanks @mlduff too :)

mlduff · 2024-04-15T13:24:29Z

@jayaddison I noticed that the tests for the two scrapers I mentioned above are located under the legacy section - do I add my tests under there as well?

mlduff · 2024-04-15T13:37:10Z

@josefhelie are you able to provide a couple more recipe URLs please so I can test?

jayaddison · 2024-04-15T15:02:15Z

> @jayaddison I noticed that the tests for the two scrapers I mentioned above are located under the legacy section - do I add my tests under there as well?

@mlduff yep, that's the correct place for those; thanks for checking 👍 You should be able to configure the expected_requests property in the tests to return example results for both the initial HTML HTTP GET response, and also the subsequent (probably also HTTP GET) API request.
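As a rough, standalone illustration of that idea: a property that yields one entry per expected request, first for the shared page and then for the API call. The tuple shape (method, URL, fixture path), the class name, and the fixture file names are assumptions made for this sketch; the project's real test base class is not shown in this thread and may differ.

```python
GET = "GET"


class BergamotTestSketch:
    """Stand-in for a legacy scraper test; not the project's real base class."""

    @property
    def expected_requests(self):
        # 1. The initial HTML page for the shared recipe.
        yield (
            GET,
            "https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97",
            "tests/test_data/bergamot.testhtml",  # hypothetical fixture path
        )
        # 2. The follow-up API request derived from the page URL.
        yield (
            GET,
            "https://api.bergamot.app/recipes/shared?r=mIB4jYQtZU1A97",
            "tests/test_data/bergamot.testjson",  # hypothetical fixture path
        )


requests_to_mock = list(BergamotTestSketch().expected_requests)
```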

jayaddison · 2024-04-17T16:25:21Z

@josefhelie have you found any pages shared on Bergamot where the original author is credited? I've seen a few pages that show the domain name of the source URL, and I'm wondering whether there are any that list names/usernames.

josefhelie · 2024-04-17T16:32:50Z

@jayaddison I'm not sure I have. Would it help if you gave me a recipe that I could import into Bergamot, so that I could then send you the link to the imported recipe?

mlduff · 2024-04-17T22:25:14Z

@josefhelie Here is one that has an author: https://www.bestrecipes.com.au/recipes/peanut-butter-cookies-recipe/fowk6kuy

josefhelie · 2024-04-18T07:16:12Z

I imported it into my Bergamot; here it is: https://dashboard.bergamot.app/shared/REbGkQaNoVJ5kM

jayaddison · 2024-04-19T11:29:19Z

Thanks @josefhelie - so roughly speaking, it seems like some source recipes may include author info, and the Bergamot page includes a link back to the original, but our scraper can't directly retrieve the author details at the moment (they're not in the Bergamot page, so it seems like we'd have to ask Bergamot to add those, or to retrieve them ourselves from the original URL).

I'm not completely sure what to do here; I personally place quite a lot of importance on retaining the author name/info (even though it's challenging sometimes) because my assumption is that a lot of recipe authors themselves would want that to be included when people view their recipes.

I haven't contacted Bergamot to ask whether they'd consider attempting to include that info themselves, so that's one option I'm considering. Is there a support/feedback option in the app itself?

jayaddison added the enhancement label Jan 20, 2024

jayaddison changed the title ~~Is it possible to design a scrapper for Bergamot?~~ Scraper request: bergamot.app Jan 20, 2024

josefhelie closed this as completed Apr 15, 2024

josefhelie reopened this Apr 15, 2024

mlduff linked a pull request Apr 16, 2024 that will close this issue

Add bergamot #1064

Open
