Share a project #9

za3k · 2017-07-08T19:56:23Z

I thought I'd share what I made with this: https://archive.org/details/recipes-en-201706
A full version of allrecipes, epicurious, cookstr, and bbc.co.uk, parsed into nice JSON with photos.

Sorry to abuse 'issues'; as far as I know, there's no option to send a private message on GitHub.

justinmklam · 2020-03-26T23:23:29Z

To piggyback off this sharing post, I made a web app to convert recipes from volumetric to metric units (mainly for the purpose of baking). See gif below for demo usage.

Repo: https://github.com/justinmklam/recipe-converter
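For readers wondering what a volumetric-to-metric conversion involves, here is a minimal sketch. It is not taken from the recipe-converter repo: the function name `to_grams` is hypothetical, and the density values are rough, illustrative assumptions (real densities vary with how an ingredient is packed).

```python
# US customary volumes expressed in millilitres.
ML_PER_UNIT = {"cup": 236.588, "tablespoon": 14.787, "teaspoon": 4.929}

# Approximate densities in grams per millilitre.
# These are illustrative assumptions, not authoritative values.
GRAMS_PER_ML = {
    "water": 1.0,
    "all-purpose flour": 0.53,
    "granulated sugar": 0.85,
}


def to_grams(quantity: float, unit: str, ingredient: str) -> float:
    """Convert a volume of a known ingredient to an approximate weight in grams."""
    millilitres = quantity * ML_PER_UNIT[unit]
    return round(millilitres * GRAMS_PER_ML[ingredient], 1)


print(to_grams(2, "cup", "all-purpose flour"))
```

The conversion needs a per-ingredient density because a cup of flour weighs far less than a cup of water, which is why weighing is preferred for baking.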

Thanks again for creating this great library! It really opens up opportunities to create new projects with this as leverage.

boonepeter · 2020-06-29T15:11:22Z

Thanks for this package, @hhursev! I set it up as an API (source, live) and am using it in my simple recipe website here (source).

I'll add support for some websites when I come across them.

bfcarpio · 2021-01-11T00:32:48Z

I suppose I should contribute my quick script too!

I'm more of a terminal guy, so I wrote a quick Python script to convert a recipe into Markdown that can be cat'd.

source
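As a rough illustration of the idea (not the linked script itself), a recipe can be rendered to Markdown with a few lines of standard-library Python. The function name `recipe_to_markdown` is hypothetical. It assumes the title, ingredient list, and newline-separated instructions have already been obtained, for example from a recipe-scrapers scraper's `title()`, `ingredients()`, and `instructions()` methods.

```python
def recipe_to_markdown(title, ingredients, instructions):
    """Render a recipe as Markdown text suitable for printing in a terminal."""
    lines = [f"# {title}", "", "## Ingredients", ""]
    lines += [f"- {item}" for item in ingredients]
    lines += ["", "## Instructions", ""]
    # One numbered step per non-empty line of the instruction text.
    steps = [s.strip() for s in instructions.splitlines() if s.strip()]
    lines += [f"{n}. {step}" for n, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


print(recipe_to_markdown(
    "Toast",
    ["2 slices bread", "butter"],
    "Toast the bread.\nSpread the butter.",
))
```

Writing the result to a `.md` file gives something that can be cat'd later.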

jayaddison · 2021-02-26T11:58:51Z

Re-importing and re-indexing recipe content into https://www.reciperadar.com was a breeze yesterday, largely thanks to recipe-scrapers, and the quality of the recipe content (although not yet quantified) feels and looks pretty good to me.

I'd like to add a big thanks to @hhursev and @bfcarpio in particular (although to everyone who has contributed to recipe-scrapers, really) for developing and maintaining the library. Glad to be a part of this community :)

micahcochran · 2021-07-29T21:29:17Z

I've created recipe-crawler, which is a configurable web crawler for recipes. It uses recipe-scrapers for a couple of websites that don't have data structured in the schema.org/Recipe format.

Please crawl responsibly.

jayaddison · 2022-11-22T13:44:15Z

This seems as good a place as any to celebrate that recipe-scrapers has reached the 1000-stars milestone on GitHub 😄 🍾 🎉

Here's hoping for the continuation and development of many useful recipe projects (current and future) thanks to this library.

jlucaspains · 2023-01-29T01:00:02Z

I've worked on a recipe book app for the last 3 years. Until recently, I had built my own massively overcomplicated recipe scraper, so it was a great day when I found the recipe-scrapers project.

Anyway, the installable web version of the app is nearly ready for 1.0 and folks can start using it at https://app.sharpcooking.net. The project is open source and available at GitHub sharpcooking-web.

Thanks for this great project!

jayaddison · 2023-10-01T09:02:33Z

Since we don't have a mailing list for users of the library, I'm going to share this here, because hopefully people with related projects will find it useful:

We now have a developer documentation section that should help to make it easier to develop and maintain scrapers. Many thanks to @strangetom for writing this up!

mkayeterry · 2024-04-17T16:04:07Z

First off, I love this repo so thanks to @hhursev and all the contributors!

That being said, the first question I had when I found it was "so, where do I get the recipes?". So I made a quick tool, recipe-urls, to compile recipe-specific URLs from any given base URL, to then be fed into recipe-scrapers.

Check it out if you'd like... or don't! Still requires some brute force url compiling, but increased my output considerably.

jlucaspains · 2024-04-17T17:19:55Z

Very interesting! I've had people ask similar things about my own recipe book app. Question for @mkayeterry: could you improve the URL listing by leveraging the site's sitemap.xml? Virtually every site has one because of SEO, and it should list all URLs directly. Your current filtering would work well with that too.

In any case, this is a cool and useful project!

mkayeterry · 2024-04-17T17:43:09Z

@jlucaspains Oh that's interesting! I'm pretty new to anything front end (over here frantically trying to figure out what a sitemap.xml is), so I'll definitely look into it more. Sounds promising and I'm very open to making the current setup a little more robust!
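For anyone else new to sitemaps: a sitemap.xml is an XML file in which a site lists its page URLs inside `<loc>` elements. A minimal sketch of the suggestion above, using only the standard library, might look like the following. The sample XML, the `recipe_urls` helper, and the `/recipes/` marker are all hypothetical; a real sitemap would be downloaded from the site, and the filter would depend on that site's URL layout.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content standing in for a downloaded file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/recipes/pancakes</loc></url>
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/recipes/chili</loc></url>
</urlset>
"""


def recipe_urls(sitemap_xml: str, marker: str = "/recipes/") -> list[str]:
    """Return the sitemap URLs that contain the given path marker."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    return [url for url in urls if marker in url]


for url in recipe_urls(SAMPLE_SITEMAP):
    print(url)
```

Larger sites often publish a sitemap index (a `<sitemapindex>` root that points to several child sitemaps), which this sketch does not follow.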

anguswg-ucsb · 2024-04-19T15:08:44Z

I've put together an ingredient parsing Python package, ingredient-slicer, which will parse ingredient strings (e.g. "2 1/2 cups of tomato sauce") and do a best-effort extraction of the unit, quantity, food, gram_weight, and other extraneous details (prep, size_modifiers, etc.).

I made ingredient-slicer because I needed a lightweight ingredient parser with zero dependencies that does NOT rely on NLP models to do its thing. The package uses only Python's standard library and is pretty quick.

It's by no means perfect at extracting the food from an ingredient, but it does a really good job with unit and quantity, and with applying any extra information mentioned in parenthetical references (e.g. "2 salmon steaks (8 ounces each)" ends up with a unit of "ounces" and a quantity of "16", since 2 * 8 ounces each = 16 ounces).

An example to illustrate:

```shell
pip install ingredient-slicer
```

```python
import ingredient_slicer

slicer = ingredient_slicer.IngredientSlicer("2 (15-ounces) cans chickpeas, rinsed and drained")

slicer.to_json()
```

```python
{
    'ingredient': '2 (15-ounces) cans chickpeas, rinsed and drained',
    'standardized_ingredient': '2 cans chickpeas, rinsed and drained',
    'food': 'chickpeas',

    # primary quantity and units
    'quantity': '30',
    'unit': 'ounces',
    'standardized_unit': 'ounce',

    # any other secondary quantity and units found in the string
    'secondary_quantity': '2',
    'secondary_unit': 'cans',
    'standardized_secondary_unit': 'can',

    'gram_weight': '850.49',
    'prep': ['drained', 'rinsed'],
    'size_modifiers': [],
    'dimensions': [],
    'is_required': True,
    'parenthesis_content': ['15 ounce']
}
```
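The arithmetic behind the `quantity` and `gram_weight` values above can be reproduced with a short sketch. This is not ingredient-slicer's implementation: the regular expression and the `total_ounces` helper are hypothetical and only handle strings shaped like the example. The only fixed fact used is that one avoirdupois ounce is 28.349523125 grams.

```python
import re

GRAMS_PER_OUNCE = 28.349523125  # definition of the avoirdupois ounce

# Illustrative pattern for strings shaped like "2 (15-ounces) cans ...".
PATTERN = re.compile(r"^\s*(\d+)\s*\((\d+(?:\.\d+)?)[\s-]*(?:ounces?|oz)\)")


def total_ounces(ingredient: str):
    """Multiply the item count by the per-item ounces found in parentheses."""
    match = PATTERN.match(ingredient)
    if match is None:
        return None  # the string does not follow the assumed shape
    count = int(match.group(1))
    ounces_per_item = float(match.group(2))
    return count * ounces_per_item


ounces = total_ounces("2 (15-ounces) cans chickpeas, rinsed and drained")
grams = round(ounces * GRAMS_PER_OUNCE, 2)
print(ounces, grams)  # 30.0 850.49
```

So 2 cans of 15 ounces each give 30 ounces, and 30 ounces convert to roughly 850.49 grams, matching the output shown.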

It fixed a problem for me, so I thought it might be helpful for other people too!
And thank you to everyone who contributes to and maintains recipe-scrapers. It's a great tool you all have built, keep up the great work!

jp-berg · 2024-04-21T13:57:40Z

Hey, over the past year or so I wanted to dive deeper into Python development, so I used this project as a basis for my CLI app recipe2txt.

This was my motivation to examine various aspects of the language and of Python project management a little closer, so it may be unconventional in some parts, but as far as I know everything works.

Features include asynchronous fetching, Jinja templating and local caching of recipes. And (maybe the most interesting part for recipe-scrapers) it generates formatted GitHub issues if any scraping errors are encountered during the process, so that the user can easily report them here.

Thank you to all contributors here that made the hard part of recipe-scraping easy!

timsamart · 2024-07-12T22:21:18Z

Hey, has anyone scraped all of the available data, or a large amount of it, and could share it? I have a research project I want to launch and need as much data as possible.

jaspervzwi · 2024-11-14T17:20:21Z

Hey all! I'm working on a tool that maintains a database by scraping all recipe pages from a given website. It pulls the sitemap, selects all pages with recipes and then creates a dict or JSON file with all metadata scraped by recipe-scrapers.

Feel free to check it out at recipe-database-scraper

@mkayeterry I realise there's a bit of overlap with the repo you shared earlier this year. Hope you don't mind. One of my goals was to continue finding new recipe pages added to a website. I couldn't figure out a good way to reconcile that with your repo, so I went in a different direction.
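One simple way to keep finding newly added pages is to compare the URLs in the latest sitemap against those already stored. The sketch below only illustrates that idea and is not the approach used in recipe-database-scraper; the `split_new_urls` helper and the example URLs are hypothetical.

```python
import json


def split_new_urls(sitemap_urls, known_urls):
    """Return (new URLs in sitemap order, updated sorted list of all known URLs)."""
    known = set(known_urls)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    new = [url for url in dict.fromkeys(sitemap_urls) if url not in known]
    return new, sorted(known | set(new))


# URLs stored by a previous run, e.g. loaded from a JSON file.
known_json = '["https://example.com/recipes/chili"]'
latest = [
    "https://example.com/recipes/chili",
    "https://example.com/recipes/pancakes",
]

new, updated = split_new_urls(latest, json.loads(known_json))
print(new)
print(json.dumps(updated, indent=2))
```

Only the URLs in `new` would then need to be scraped, and `updated` would be written back for the next run.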

jaspervzwi · 2024-11-14T17:22:48Z

@timsamart I've started working on that now, but it'll take a while to go through all websites.
Have you already built that database by now?

hhursev mentioned this issue May 10, 2020

Exception with 6.0.7: "This should be implemented". #148

Closed

synergiator mentioned this issue May 14, 2020

Verify a set of wrongly parsed URLs with a version from 2017 #155

Closed

hhursev mentioned this issue Jun 28, 2020

How do I use this scrapper? #168

Closed

bfcarpio mentioned this issue Jan 24, 2021

Tweaks, Types, and Cleanup #302

Merged

bfcarpio mentioned this issue Apr 3, 2021

Scrape Recipe from File #368

Closed

bfcarpio mentioned this issue Jun 20, 2021

NIHHealthyEating issues with a page without content #393

Closed

5 tasks

founderio mentioned this issue Oct 9, 2021

Make use of recipe-scrapers for more generic web imports GourmandRecipeManager/gourmand#18

Closed

Repository owner deleted a comment from tobiaghiraldini Jan 22, 2023

Repository owner deleted a comment from smilerz Jan 22, 2023

hhursev changed the title ~~Sharing a project~~ Share a project Jan 22, 2023

hhursev pinned this issue Jul 5, 2023
