Simplify warehouse list #958

Open
Popolechien opened this issue May 3, 2024 · 4 comments

@Popolechien
Contributor

The current list of warehouse paths sort of follows the various scrapers we are using, but not really, and this gets confusing, particularly as it seems to give prominence to some content for no obvious reason (e.g. vikidia, psiram). I also understand that, besides providing a modicum of ordering for the available files, this system may be used by various mirrors to pick and choose which content they want to mirror (in practice, however, only the Wikimedia Foundation restricts its mirroring to Wikimedia-related content).

I suggest simplifying the list of warehouses to be more congruent with our scrapers, i.e.:

freecodecamp
gutenberg
ifixit
nautilus
openedx
phet
stackexchange
youtube (incl. ted)
wikihow
wikimedia
other wikis

(not discussing the /.hidden folders that have their own, clearly-defined purpose)

The naming is not 100% ideal, as we need to force a distinction between WMF and non-WMF wikis, but other than that it seems like a move in the right direction.

@rgaudin
Member

rgaudin commented May 3, 2024

OK, just so we're clear: it's going to be a difficult task because mirrors use rsync and there is no such thing as renaming there. So if it's not properly coordinated (and we're talking about 12 different people), it could result in enormous transfers: deleting everything and re-downloading everything, for instance.
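
To illustrate the concern, here is a minimal sketch of a path-based sync with deletion enabled. It is a simplified model, not rsync's actual algorithm, and the folder names, file names and sizes are hypothetical. It only shows that a file which moves to a new folder looks like one deletion plus one fresh download to a mirror that compares paths.

```python
def plan_sync(source, mirror):
    """Simplified model of a path-based sync with deletion enabled.

    Both arguments map a relative path to a size in bytes.
    Returns the files to download and the paths to delete on the mirror.
    """
    # A file is fetched when its path is missing on the mirror or its size differs.
    to_download = {p: s for p, s in source.items() if mirror.get(p) != s}
    # A file is deleted when its path no longer exists upstream.
    to_delete = sorted(p for p in mirror if p not in source)
    return to_download, to_delete


# Hypothetical file names and sizes, for illustration only.
mirror = {
    "vikidia/vikidia_fr_all.zim": 900,
    "wikipedia/wikipedia_en_all.zim": 100_000,
}
# Same content upstream, but one file now lives in a (hypothetical) new folder.
source = {
    "other_wikis/vikidia_fr_all.zim": 900,
    "wikipedia/wikipedia_en_all.zim": 100_000,
}

to_download, to_delete = plan_sync(source, mirror)
print(to_delete)                   # ['vikidia/vikidia_fr_all.zim']
print(sum(to_download.values()))   # 900
```

In this model the unchanged file costs nothing, while the moved file is deleted and transferred again in full, which is why a rename across the whole warehouse would need to be coordinated with each mirror.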

@benoit74
Collaborator

benoit74 commented May 6, 2024

Could someone share documentation or an explanation of the intent of these warehouse paths, so that we are all on the same page before making any decision?

@rgaudin
Member

rgaudin commented May 6, 2024

A number of users including ourselves have always been using it to find and download ZIM files.

It used to be this or the wiki. Now all readers (except kiwix-serve) have a built-in downloader, and library.kiwix.org offers downloads as well.

I personally use it exclusively but have never been attached to the folders.

@kelson42
Contributor

kelson42 commented May 6, 2024

What the warehouse folders are is arbitrary and, to a large extent, should not be that important (for end users). The problem here is that it is "confusing" for Zimfarm editors, and this is IMHO primarily a UI problem.

We could choose almost automatically where to store the ZIM files, based on the scraper and on the chosen "collection".

The "collection" basically means: in which library the produced ZIM should appear. For now we formally have only one collection, but once this is properly handled in the CMS we will have many of them.

I am still a bit unsure about what exactly the separation of duties between the Zimfarm and the CMS should look like.
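
As a rough sketch of what choosing the folder from the scraper and the collection could look like, assuming the folder names proposed at the top of this issue: the scraper identifiers, the `default` collection name, the `is_wmf` flag and the function itself are all hypothetical and do not reflect the actual Zimfarm or CMS code.

```python
# Hypothetical mapping: collection name -> {scraper name -> warehouse folder}.
WAREHOUSE_FOLDERS = {
    "default": {
        "freecodecamp": "freecodecamp",
        "gutenberg": "gutenberg",
        "ifixit": "ifixit",
        "nautilus": "nautilus",
        "openedx": "openedx",
        "phet": "phet",
        "stackexchange": "stackexchange",
        "youtube": "youtube",
        "ted": "youtube",  # the proposal groups TED content with youtube
        "wikihow": "wikihow",
    },
}


def warehouse_folder(scraper, collection="default", is_wmf=False):
    """Return the warehouse folder for a recipe, without manual input."""
    # "wiki" is a hypothetical name for the scraper handling MediaWiki sites;
    # the WMF / non-WMF distinction has to be supplied separately.
    if scraper == "wiki":
        return "wikimedia" if is_wmf else "other_wikis"
    try:
        return WAREHOUSE_FOLDERS[collection][scraper]
    except KeyError:
        raise ValueError(
            f"no folder for scraper {scraper!r} in collection {collection!r}"
        ) from None


print(warehouse_folder("ted"))                # youtube
print(warehouse_folder("wiki", is_wmf=True))  # wikimedia
```

With a lookup like this, the editor would only pick a scraper and a collection, and the folder would no longer be a free choice in the UI.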
