New request: OEIS (The On-Line Encyclopedia of Integer Sequences) - Database #1107

Open
tdeitch opened this issue Jul 8, 2024 · 10 comments
Labels
Scraper Needed We need to build a dedicated scraper for this website

Comments

@tdeitch

tdeitch commented Jul 8, 2024

Edit by @benoit74: mentioned that the site is not a MediaWiki, and clarified the title and description to distinguish this request from the ZIM requested in #1189.

@RavanJAltaie
Contributor

RavanJAltaie commented Jul 9, 2024

Recipe created
https://farm.openzim.org/recipes/oeis.org_en_all
I'll update the library link once ready

@RavanJAltaie
Contributor

The recipe had taken 7 days to scrape only 11%, so I stopped it and re-ran it.

@benoit74
Contributor

There is a mediawiki at https://oeis.org/wiki/

We cannot scrape mediawiki websites with zimit unless very special configuration is put in place (and this is not even recommended).

There are 49 supported languages; I don't understand why we create only one ZIM.

The website seems to be centered around a search box for finding integer sequences. This search functionality is not going to work inside the ZIM, since it needs an online server. Are we sure the ZIM will still be usable without the search box? (I doubt it, at least not as-is with zimit, unless we find a proper home page.)

Some "sub-sites" like https://oeis.org/play and https://oeis.org/plot2.html (and maybe others; I did not investigate all of them) are not going to work either: they need a server to generate the audio or plot from user input.

To move forward, we need to more precisely define:

  • what do we want inside the ZIM?
  • do we create one ZIM per language as usual?
  • what is the strategy for the wiki? Do we create a separate ZIM, or do we try to tune the zimit config to scrape the MediaWiki, since it would be too cumbersome for users to have two ZIMs, one with the integer sequences and one with the wiki?

@benoit74
Contributor

Note: task e130bc44-0dfc-4901-ba92-1cf894731d05 is marked as succeeded, but the crawler actually crashed with the "Browser disconnected (crashed?), interrupting crawl" message, so the ZIM is not usable.

@RavanJAltaie
Copy link
Contributor

What about changing the website to https://oeis.org/wiki/Welcome?
@benoit74 I think this is the target of the request?

@benoit74
Contributor

I don't know

@RavanJAltaie added the Mediawiki label (for ZIM requests that are MediaWiki-related; scraper technology exists already) on Sep 19, 2024
@benoit74
Contributor

I've disabled the recipe which was still running but was wrong and not working.

@tdeitch can you please explain what on the website you are interested in having as a ZIM? Is it the database of integer sequences, the wiki explaining things, or anything else? Or both?

@tdeitch
Author

tdeitch commented Oct 14, 2024

Sorry, I should have been clearer and not said that it was a MediaWiki. The database of integer sequences is what's interesting to me, and I expect to most people. The wiki is useful for people looking to contribute, but not something I care about day to day.

Currently, I can look up terms in a local clone of https://github.com/oeis/oeisdata, I just thought it'd be cool to have as a ZIM file so I could search it alongside all of my other offline references.
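For readers who want to reproduce that kind of local lookup, here is a minimal TypeScript sketch. It assumes the oeisdata entries use the OEIS internal format, where each line starts with a tag (`%S`, `%T`, `%U` for terms, `%N` for the name) followed by the A-number. The helper names `parseEntry` and `containsRun` are hypothetical, and the sample is a trimmed, illustrative Fibonacci entry rather than a verbatim file. Terms are kept as strings because OEIS values often exceed `Number.MAX_SAFE_INTEGER`.

```typescript
// Sketch only: parses one entry written in the OEIS internal format
// ("%<tag> <A-number> <payload>" per line). That oeisdata stores entries this
// way is an assumption; parseEntry and containsRun are hypothetical helpers.

interface OeisEntry {
  id: string;      // A-number, e.g. "A000045"
  name: string;    // payload of the %N line
  terms: string[]; // %S, %T and %U payloads split on commas, kept as strings
}

function parseEntry(text: string): OeisEntry {
  let id = "";
  let name = "";
  const dataLines: string[] = [];
  for (const line of text.split("\n")) {
    // Tag letter, A-number, then the rest of the line.
    const m = /^%(\w) (A\d{6,})\s?(.*)$/.exec(line.trim());
    if (m === null) continue;
    const tag = m[1];
    const payload = m[3].trim();
    id = m[2];
    if (tag === "N") name = payload;
    // %S, %T and %U together hold the known terms.
    if (tag === "S" || tag === "T" || tag === "U") dataLines.push(payload);
  }
  // Lines may or may not end in a comma, so join with "," and drop empties.
  const terms = dataLines
    .join(",")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t !== "");
  return { id, name, terms };
}

// True if `query` occurs as a contiguous run inside `terms`.
function containsRun(terms: string[], query: string[]): boolean {
  if (query.length === 0) return true;
  for (let i = 0; i + query.length <= terms.length; i++) {
    let j = 0;
    while (j < query.length && terms[i + j] === query[j]) j++;
    if (j === query.length) return true;
  }
  return false;
}

// Trimmed, illustrative entry (first 21 Fibonacci numbers).
const sample = [
  "%S A000045 0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,",
  "%T A000045 1597,2584,4181,6765",
  "%N A000045 Fibonacci numbers.",
].join("\n");

const entry = parseEntry(sample);
console.log(entry.id, entry.terms.length); // A000045 21
console.log(containsRun(entry.terms, ["5", "8", "13"])); // true
```

Reading the actual files from a clone is left out on purpose; the parser only needs each entry's text.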

@benoit74
Contributor

@tdeitch no worries, and thanks a lot for the clarification. I expected your answer but didn't want to bias the request with my own assumptions ^^

Due to the very dynamic nature of the OEIS website, we cannot scrape it with our general-purpose zimit scraper.

This means we have to create a custom scraper to put the database in a ZIM, and develop a web UI capable of interacting with the in-ZIM database (querying it with JS). This is not infeasible and probably not extremely hard, but it is definitely going to take some time, so without funding or a volunteer contributor committing to it, we might never see this happen, at least not very soon.
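To make the "query it with JS" part concrete, here is a minimal TypeScript sketch of a server-free search running entirely in the browser. The `IndexRecord` shape, the idea of shipping it as a JSON file inside the ZIM, and the function names are hypothetical, not an existing openZIM format or API; the three records are trimmed examples.

```typescript
// Sketch only: a server-free search over a hypothetical index that a custom
// scraper could ship inside the ZIM as JSON. The record shape and function
// names are assumptions, not an existing openZIM format or API.

interface IndexRecord {
  id: string;    // A-number, e.g. "A000040"
  name: string;  // sequence name (%N line)
  terms: string; // known terms joined by commas, e.g. "2,3,5,7"
}

// Turn input such as "3, 5" or "3 5" into ",3,5," so a plain substring test
// only matches whole terms in a contiguous run. Returns null for non-numeric input.
function toNeedle(input: string): string | null {
  const parts = input.split(/[\s,]+/).filter((p) => p !== "");
  if (parts.length === 0 || !parts.every((p) => /^-?\d+$/.test(p))) return null;
  return "," + parts.join(",") + ",";
}

// Numeric input searches terms; anything else searches names (case-insensitive).
function search(index: IndexRecord[], input: string, limit = 10): IndexRecord[] {
  const needle = toNeedle(input);
  const text = input.trim().toLowerCase();
  const hits: IndexRecord[] = [];
  for (const rec of index) {
    const matched =
      needle !== null
        ? ("," + rec.terms + ",").includes(needle)
        : text !== "" && rec.name.toLowerCase().includes(text);
    if (matched) {
      hits.push(rec);
      if (hits.length >= limit) break;
    }
  }
  return hits;
}

// Tiny illustrative index (trimmed term lists).
const sampleIndex: IndexRecord[] = [
  { id: "A000045", name: "Fibonacci numbers", terms: "0,1,1,2,3,5,8,13,21,34" },
  { id: "A000040", name: "The prime numbers", terms: "2,3,5,7,11,13,17,19,23,29" },
  { id: "A000217", name: "Triangular numbers", terms: "0,1,3,6,10,15,21,28,36,45" },
];

console.log(search(sampleIndex, "3 5").map((r) => r.id).join(" ")); // A000045 A000040
```

Wrapping both the stored terms and the query in commas is what keeps a query like `3 5` from matching inside `13,5`. A linear scan like this is only a starting point; with several hundred thousand sequences, a real scraper might prefer a prebuilt index split across several files.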

Thanks a lot for the link to https://github.com/oeis/oeisdata. It is very important to know that this database exists standalone; it is a great enabler for this custom scraper.

In any case, I'm very supportive of creating this scraper and I think it would be cool as well. I just lack time / have more important or funded topics to handle ^^

So thanks again for proposing the idea!

Since the wiki might be interesting as well, I will create a secondary issue focusing only on the wiki and modify your first comment here to make it clear that what we want is the database.

@benoit74 added the Scraper Needed label (we need to build a dedicated scraper for this website) and removed the Mediawiki label on Oct 15, 2024
@benoit74
Contributor

Note: I've deleted https://farm.openzim.org/recipes/oeis.org_en_all since it made no sense.

@benoit74 changed the title from "New request: OEIS (The On-Line Encyclopedia of Integer Sequences)" to "New request: OEIS (The On-Line Encyclopedia of Integer Sequences) - Database" on Oct 15, 2024