This scraper creates offline versions of PhET science and math simulations in ZIM format.
It requires Node.js version 16 or higher.
npm i && npm start
The above will eventually output a ZIM file to dist/
--withoutLanguageVariants
Use this to exclude languages with a country variant. For example, en_CA
will not be present in the ZIM file when this argument is passed.
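The effect of this flag can be pictured with a minimal sketch. It assumes that a country variant is recognizable by an underscore in the language code, as in en_CA; the helper name dropLanguageVariants is hypothetical and is not taken from the scraper's source.

```typescript
// Hypothetical helper, for illustration only.
// Assumption: a country variant is marked by an underscore, as in en_CA.
function dropLanguageVariants(locales: string[]): string[] {
  return locales.filter((locale) => !locale.includes("_"));
}

// en_CA and pt_BR are dropped; en and fr remain.
const keptLocales = dropLanguageVariants(["en", "en_CA", "fr", "pt_BR"]);
console.log(keptLocales);
```

The scraper itself may detect variants differently, so treat this only as a model of the described behaviour.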
Available only on GET step:
--withoutLanguageVariants ...
Available on GET and EXPORT steps only:
--includeLanguages lang_1 [lang_2] [lang_3] ...
--excludeLanguages lang_1 [lang_2] [lang_3] ...
Available on EXPORT step only:
# Skip ZIM files for individual languages
--mulOnly
Example:
npm run get -- --includeLanguages en ru fr
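As a rough model of how the two language options might combine, the sketch below assumes that --includeLanguages keeps only the listed languages and --excludeLanguages removes the listed ones. The LanguageFilter type and selectLanguages function are hypothetical names for illustration; the scraper's real selection logic may differ.

```typescript
// Hypothetical model of the two flags; not the scraper's actual code.
interface LanguageFilter {
  include?: string[]; // values given to --includeLanguages
  exclude?: string[]; // values given to --excludeLanguages
}

function selectLanguages(available: string[], filter: LanguageFilter): string[] {
  // An empty or missing include list is treated as "keep everything" (assumption).
  const include =
    filter.include && filter.include.length > 0 ? new Set(filter.include) : null;
  const exclude = new Set(filter.exclude ?? []);
  return available.filter(
    (lang) => (include === null || include.has(lang)) && !exclude.has(lang),
  );
}

// Mirrors the example above: only en, ru and fr are kept.
console.log(selectLanguages(["en", "ru", "fr", "de"], { include: ["en", "ru", "fr"] }));
```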
Another way to configure behaviour is through environment variables. Sample .env
file (with default values):
# requests per second, affects GET step only
PHET_RPS=8
# async workers on TRANSFORM step (keep it equal to the number of CPU cores)
PHET_WORKERS=10
# number of retries on GET step (delay grows with exponential backoff)
PHET_RETRIES=5
# display verbose errors
PHET_VERBOSE_ERRORS=false
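To make the "exponential backoff" note concrete, here is a minimal sketch of reading an integer setting with a default and computing the delay before each retry. The helper names, the 500 ms base delay and the doubling factor are all assumptions chosen for illustration; only the PHET_* variable names and the default of 5 retries come from the sample above.

```typescript
// Illustrative only: these helpers are hypothetical, not the scraper's code.
type Env = Record<string, string | undefined>;

// Read an integer setting, falling back to the given default when unset or invalid.
function readIntSetting(env: Env, name: string, fallback: number): number {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Delay before each retry. The 500 ms base and the doubling are assumptions.
function backoffDelaysMs(retries: number, baseMs = 500): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseMs * 2 ** attempt);
}

// PHET_RETRIES is not set here, so the default of 5 applies.
const sampleEnv: Env = { PHET_RPS: "8" };
const retryCount = readIntSetting(sampleEnv, "PHET_RETRIES", 5);
console.log(backoffDelaysMs(retryCount)); // [ 500, 1000, 2000, 4000, 8000 ]
```

In a Node.js process the same function could be called with process.env in place of sampleEnv.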
This project achieves multiple things:
- Download PhET content
- Generate an Index for said content
- Generate ZIM file(s) containing content and index
Things this project does not yet do, but should:
- Generate Android APK
The functionality is split into 5 npm scripts:
- npm run setup: deletes state from previous runs
- npm run get: downloads PhET simulations in specified languages
- npm run transform: prepares the content and media files
- npm run export: generates ZIM file(s)
- npm start: runs all of the above in sequence
The steps get, transform and export have their own output directories:
- get outputs HTML and PNG files to state/get
- transform outputs intermediate files to state/transform
- export outputs HTML and PNG files to state/export and ZIM file(s) to dist/