Share a project #9

Open
za3k opened this issue Jul 8, 2017 · 16 comments
@za3k
Contributor

za3k commented Jul 8, 2017

I thought I'd share what I made with this: https://archive.org/details/recipes-en-201706
A full version of allrecipes, epicurious, cookstr, and bbc.co.uk, parsed into nice JSON with photos.

Sorry to abuse 'issues'; as far as I know, there's no option to send a private message on GitHub.

@justinmklam

To piggyback off this sharing post, I made a web app to convert recipes from volumetric to metric units (mainly for the purpose of baking). See gif below for demo usage.

recipe-converter

Repo: https://github.com/justinmklam/recipe-converter
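As a rough illustration of the kind of conversion described above (not the code from the linked repo), here is a minimal sketch that turns US customary volumes into millilitres. The function names are hypothetical, and converting to grams needs a per-ingredient density that the caller has to supply; none is assumed here.

```python
# US customary volume units expressed in millilitres (standard definitions).
ML_PER_UNIT = {
    "tsp": 4.92892159375,
    "tbsp": 14.78676478125,
    "cup": 236.5882365,
}


def to_millilitres(quantity: float, unit: str) -> float:
    """Convert a volume in tsp, tbsp or cup to millilitres."""
    try:
        return quantity * ML_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None


def to_grams(quantity: float, unit: str, grams_per_ml: float) -> float:
    """Convert a volume to grams using a caller-supplied density (an assumption)."""
    return to_millilitres(quantity, unit) * grams_per_ml
```

For baking, the density is the part that matters most, since a cup of flour and a cup of sugar weigh very different amounts.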

Thanks again for creating this great library! It really opens up opportunities to create new projects with this as leverage.

@boonepeter
Contributor

Thanks for this package, @hhursev! I set it up as an API (source, live) and am using it in my simple recipe website here (source).

I'll add support for some websites when I come across them.

@bfcarpio
Collaborator

I suppose I should contribute my quick script too!

I'm more of a terminal guy, so I wrote a quick Python script to convert a recipe into Markdown that can be cat'd.

source
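For readers curious what such a conversion might look like, here is a minimal sketch (not the linked script). It assumes the title, ingredient list and instruction steps have already been extracted, for example by recipe-scrapers; the function name `recipe_to_markdown` is hypothetical.

```python
def recipe_to_markdown(title, ingredients, instructions):
    """Render already-scraped recipe fields as a small Markdown document."""
    lines = [f"# {title}", "", "## Ingredients", ""]
    # One bullet per ingredient line.
    lines += [f"- {item}" for item in ingredients]
    lines += ["", "## Instructions", ""]
    # Number the steps starting from 1.
    lines += [f"{n}. {step}" for n, step in enumerate(instructions, start=1)]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a `.md` file gives something that reads fine with `cat` in a terminal.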

@jayaddison
Collaborator

Re-importing and re-indexing recipe content into https://www.reciperadar.com was a breeze yesterday, largely thanks to recipe-scrapers, and the quality of the recipe content (although not yet quantified) feels and looks pretty good to me.

I'd like to add a big thanks to @hhursev and @bfcarpio in particular (although to everyone who has contributed to recipe-scrapers, really) for developing and maintaining the library. Glad to be a part of this community :)

@micahcochran
Contributor

I've created recipe-crawler, which is a configurable web crawler for recipes. It uses recipe-scrapers for a couple of websites that don't have data structured in the schema.org/Recipe format.

Please crawl responsibly.

@jayaddison
Collaborator

This seems as good a place as any to celebrate that recipe-scrapers has reached the 1000-stars milestone on GitHub 😄 🍾 🎉

Here's hoping for the continuation and development of many useful recipe projects (current and future) thanks to this library.

Repository owner deleted a comment from tobiaghiraldini Jan 22, 2023
Repository owner deleted a comment from smilerz Jan 22, 2023
@hhursev hhursev changed the title Sharing a project Share a project Jan 22, 2023
@jlucaspains
Contributor

I've worked on a recipe book app for the last 3 years. Until recently, I had built my own massively overcomplicated recipe scraper, so finding the recipe-scrapers project made for a great day.

Anyway, the installable web version of the app is nearly ready for 1.0 and folks can start using it at https://app.sharpcooking.net. The project is open source and available on GitHub as sharpcooking-web.

Thanks for this great project!

@hhursev hhursev pinned this issue Jul 5, 2023
@jayaddison
Collaborator

Since we don't have a mailing list for users of the library, I'm going to share this here, because hopefully people with related projects will find it useful:

We now have a developer documentation section that should help to make it easier to develop and maintain scrapers. Many thanks to @strangetom for writing this up!

@mkayeterry

First off, I love this repo so thanks to @hhursev and all the contributors!

That being said, the first question I had when I found it was "so, where do I get the recipes?". So I made a quick tool, recipe-urls, to compile recipe-specific URLs from any given base URL, to then be fed into recipe-scrapers.

Check it out if you'd like... or don't! Still requires some brute force url compiling, but increased my output considerably.

@jlucaspains
Contributor

Very interesting! I've had people ask similar things about my own recipe book app. Question for @mkayeterry: could you improve the URL listing by leveraging the site's sitemap.xml? Virtually every site has one for SEO, and it should list all of the site's URLs directly. Your current filtering would work well with that too.

In any case, this is a cool and useful project!
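To make the sitemap suggestion concrete, here is a minimal sketch using only Python's standard library. It assumes the sitemap follows the sitemaps.org 0.9 schema; the helper names and the `/recipe/` marker are hypothetical, and a real site may use a different URL pattern or a sitemap index that points to further sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def keep_recipe_urls(urls, marker="/recipe/"):
    """Keep URLs containing a marker; the marker is a per-site assumption."""
    return [url for url in urls if marker in url]


EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/recipe/pancakes</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

recipe_urls = keep_recipe_urls(urls_from_sitemap(EXAMPLE_SITEMAP))
```

Fetching the sitemap itself is left out; the text can come from any HTTP client, and for a sitemap index the same `<loc>` extraction returns the child sitemap URLs to fetch next.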

@mkayeterry

@jlucaspains Oh that's interesting! I'm pretty new to anything front end (over here frantically trying to figure out what a sitemap.xml is), so I'll definitely look into it more. Sounds promising and I'm very open to making the current setup a little more robust!

@anguswg-ucsb

I've put together an ingredient-parsing Python package, ingredient-slicer, which parses ingredient strings (e.g. "2 1/2 cups of tomato sauce") and does a best-effort extraction of the unit, quantity, food, gram_weight, and other details (prep, size_modifiers, etc.).

I made ingredient-slicer because I needed a lightweight ingredient parser with zero dependencies that does NOT rely on NLP models to do its thing. The package uses only Python's standard library and is pretty quick.

It's by no means perfect at extracting the food from an ingredient, but it does a really good job with unit and quantity, and with applying any extra information mentioned in parentheses (e.g. "2 salmon steaks (8 ounces each)" ends up with a unit of "ounces" and a quantity of "16", since 2 * 8 ounces each = 16 ounces).

An example to illustrate:

pip install ingredient-slicer

import ingredient_slicer

slicer = ingredient_slicer.IngredientSlicer("2 (15-ounces) cans chickpeas, rinsed and drained")

slicer.to_json()

{   
    'ingredient': '2 (15-ounces) cans chickpeas, rinsed and drained', 
    'standardized_ingredient': '2 cans chickpeas, rinsed and drained', 
    'food': 'chickpeas', 

    # primary quantity and units
    'quantity': '30', 
    'unit': 'ounces', 
    'standardized_unit': 'ounce', 

    # any other secondary quantity and units found in the string
    'secondary_quantity': '2', 
    'secondary_unit': 'cans', 
    'standardized_secondary_unit': 'can', 

    'gram_weight': '850.49', 
    'prep': ['drained', 'rinsed'], 
    'size_modifiers': [], 
    'dimensions': [], 
    'is_required': True, 
    'parenthesis_content': ['15 ounce']
}
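The parenthetical arithmetic described above can also be sketched on its own. This is not ingredient-slicer's implementation, just a minimal regex-based illustration with hypothetical names; it only handles a leading whole or decimal count followed by a single "(N unit)" or "(N unit each)" group, and ignores fractions such as "2 1/2".

```python
import re

GRAMS_PER_OUNCE = 28.349523125  # avoirdupois ounce, by definition

# Leading count, then the first parenthetical of the form "(8 ounces each)"
# or "(15-ounces)". Anything more complex is deliberately not handled.
_PATTERN = re.compile(
    r"^\s*(?P<count>\d+(?:\.\d+)?)\b.*?"
    r"\((?P<each>\d+(?:\.\d+)?)[\s-]*(?P<unit>[a-zA-Z]+)(?:\s+each)?\)"
)


def total_from_parenthetical(ingredient: str):
    """Return (count * per-item amount, unit), or None if nothing matches."""
    match = _PATTERN.search(ingredient)
    if match is None:
        return None
    return float(match["count"]) * float(match["each"]), match["unit"]


def ounces_to_grams(ounces: float) -> float:
    """Convert ounces to grams, rounded to two decimal places."""
    return round(ounces * GRAMS_PER_OUNCE, 2)
```

With the chickpea string from the example, this gives 30 ounces, and converting 30 ounces to grams gives 850.49, which agrees with the `quantity` and `gram_weight` fields in the output above.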

It fixed a problem for me, so I thought it might be helpful for other people too!
And thank you to everyone who contributes to and maintains recipe-scrapers. It's a great tool you have all built, keep up the great work!

@jp-berg

jp-berg commented Apr 21, 2024

Hey, over the past year or so I wanted to dive deeper into Python development, so I used this project as the basis for my CLI app recipe2txt.

It was my motivation to examine various aspects of the language and of Python project management a little more closely, so it may be unconventional in some parts, but as far as I know everything works.

Features include asynchronous fetching, Jinja templating and local caching of recipes. And (maybe the most interesting part for recipe-scrapers) it generates formatted GitHub issues if any scraping errors are encountered during the process, so that the user can easily report them here.

Thank you to all contributors here that made the hard part of recipe-scraping easy!

@timsamart

Hey, has anyone scraped all of the available data, or a large amount of it, and could share it? I have a research project I want to launch and need as much data as possible.

@jaspervzwi
Contributor

Hey all! I'm working on a tool that maintains a database by scraping all recipe pages from a given website. It pulls the sitemap, selects all pages with recipes and then creates a dict or JSON file with all the metadata scraped by recipe-scrapers.

Feel free to check it out at recipe-database-scraper

@mkayeterry I realise there's a bit of overlap with the repo you shared earlier this year. Hope you don't mind. One of my goals was to continue finding new recipe pages added to a website. I couldn't figure out a good way to reconcile that with your repo, so I went in a different direction.
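The goal of picking up newly added recipe pages comes down to comparing the latest sitemap against what is already stored. A minimal sketch of that comparison, with a hypothetical function name and not taken from recipe-database-scraper:

```python
def find_new_urls(sitemap_urls, known_urls):
    """Return sitemap URLs not yet in the database, in sitemap order, without repeats."""
    known = set(known_urls)
    seen = set()
    new = []
    for url in sitemap_urls:
        if url not in known and url not in seen:
            seen.add(url)
            new.append(url)
    return new
```

Only the returned URLs would then need to be scraped on each run, which keeps repeat visits to a site small.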

@jaspervzwi
Contributor

Hey, has anyone scraped all of the available data, or a large amount of it, and could share it? I have a research project I want to launch and need as much data as possible.

@timsamart I've started working on that now, but it'll take a while to go through all the websites.
Have you already built that database?
