Frequently Asked Questions

benoit74 edited this page Apr 12, 2024 · 13 revisions

Command line options

Can all MediaWiki instances be scraped?

MWoffliner cannot scrape every online MediaWiki instance.

Here are the prerequisites:

  • MediaWiki version must be 1.17 or higher
  • MediaWiki API must be activated and one of the following end-points must be activated
  • MediaWiki instance must be stable and able to provide proper responses for all articles requested
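The version prerequisite can be checked before starting a scrape. The sketch below is only an illustration, not MWoffliner's own check: it assumes you have already obtained the wiki's version string, for example from the `generator` field returned by the Action API query `action=query&meta=siteinfo`, and the function names are hypothetical.

```python
import re

MIN_VERSION = (1, 17)  # minimum MediaWiki version stated in the prerequisites


def parse_generator(generator: str) -> tuple[int, int]:
    """Extract (major, minor) from a string such as 'MediaWiki 1.39.5'."""
    match = re.search(r"(\d+)\.(\d+)", generator)
    if match is None:
        raise ValueError(f"no version number found in {generator!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(generator: str) -> bool:
    """Return True when the reported version meets the 1.17 minimum."""
    # Compare as integer tuples so that 1.9 sorts below 1.17.
    return parse_generator(generator) >= MIN_VERSION


print(is_supported("MediaWiki 1.42.0-wmf.26"))  # True
print(is_supported("MediaWiki 1.16.5"))         # False
```

Comparing integer tuples rather than strings matters here: as text, "1.9" would sort after "1.17".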

Which value for --mwUrl?

The --mwUrl value is the MediaWiki base URL. Think of it as a URL prefix to which the URL paths (for example the --mwWikiPath value) are appended. Usually the --mwUrl URL consists only of the protocol scheme and the domain name (for example https://en.wikipedia.org), but if the MediaWiki instance is not served from the root of the host, you might have to add a path. You can observe the MediaWiki base URL just by loading the main page of the remote MediaWiki instance, but it is also given on the Special:Version page (for example, the one on Wikipedia in English).
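To make the "prefix" idea concrete, here is a minimal sketch of appending a path to a base URL. It is an illustration of the behaviour described above, not MWoffliner's actual code; `append_path` is a hypothetical helper and `https://example.org/mediawiki/` is a made-up instance that is not served from the host root.

```python
from urllib.parse import urljoin


def append_path(mw_url: str, path: str) -> str:
    """Append a URL path to the base URL, treating the base as a plain prefix."""
    # Normalise the slash at the junction so exactly one separates the parts.
    return mw_url.rstrip("/") + "/" + path.lstrip("/")


# Instance served from the root of the host.
print(append_path("https://en.wikipedia.org", "/w/api.php"))
# https://en.wikipedia.org/w/api.php

# Hypothetical instance served from a sub-path: the sub-path is kept.
print(append_path("https://example.org/mediawiki/", "/wiki/"))
# https://example.org/mediawiki/wiki/

# For contrast, standard URL resolution discards the sub-path.
print(urljoin("https://example.org/mediawiki/", "/wiki/"))
# https://example.org/wiki/
```

The contrast with `urljoin` shows why the base URL has to carry the extra path when the wiki does not live at the root of the host.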

Which value for --mwWikiPath?

The --mwWikiPath value is the MediaWiki wiki base URL path. This is the path, visible in the Web browser, that is configured to access any article; the article ID is appended directly after it. Usually this is just /wiki/. You can also put the index.php end-point path there. For example, for Wikipedia in English, you can configure either /wiki/ or /w/index.php. You can observe this path just by loading the main page of the remote MediaWiki instance, but it is also given on the Special:Version page (for example, the one on Wikipedia in English).
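As an illustration of "the article ID being appended directly after", the following sketch builds a browser-visible article URL from the base URL, the wiki path and a title. It assumes the article ID is the title with spaces replaced by underscores and then percent-encoded; `article_url` is a hypothetical helper, not part of MWoffliner.

```python
from urllib.parse import quote


def article_url(mw_url: str, wiki_path: str, title: str) -> str:
    """Build an article URL as base URL + wiki path + article ID."""
    # Assumption: the article ID is the title with underscores for spaces.
    article_id = quote(title.replace(" ", "_"))
    prefix = mw_url.rstrip("/") + "/" + wiki_path.lstrip("/")
    # The ID is appended directly, so the wiki path keeps its trailing slash.
    return prefix + article_id


print(article_url("https://en.wikipedia.org", "/wiki/", "Main Page"))
# https://en.wikipedia.org/wiki/Main_Page
```

Because nothing is inserted between the path and the article ID, a wiki path of `/wiki` without the trailing slash would produce a different, wrong URL in this sketch.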

Which value for --mwActionApiPath?

The --mwActionApiPath value is the MediaWiki "traditional" API path. Usually the path value here is very similar to the one of --mwModulePath, as api.php sits right beside load.php. You can find it by loading the Special:Version page. For example, for Wikipedia in English, this is /w/api.php, as listed on its Special:Version page.

Which value for --mwModulePath?

The --mwModulePath value is the MediaWiki module load path. Usually the path value here is very similar to the one of --mwActionApiPath, as load.php sits right beside api.php. You can find it by loading the Special:Version page. For example, for Wikipedia in English, this is /w/load.php, as listed on its Special:Version page.

Which value for --mwRestApiPath?

The --mwRestApiPath value is the MediaWiki REST API URL path used by the RestApi (desktop) HTML renderer. You can find it on the Special:Version page, in the rest.php entry. For example, for Wikipedia in English, this is /w/rest.php, as listed on its Special:Version page.
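The path options can often be derived together. The sketch below assumes the default MediaWiki layout, in which api.php, load.php and rest.php all live under the script path, and assumes you have the `scriptpath` and `articlepath` values reported by the Action API (`action=query&meta=siteinfo`). The helper names are hypothetical; a wiki with a customised layout needs its values read from Special:Version instead.

```python
import shlex


def options_from_siteinfo(mw_url: str, general: dict) -> dict:
    """Derive the URL-related options from siteinfo 'general' values."""
    script_path = general["scriptpath"]    # e.g. "/w"
    article_path = general["articlepath"]  # e.g. "/wiki/$1"
    return {
        "mwUrl": mw_url,
        # "$1" is MediaWiki's placeholder for the article title.
        "mwWikiPath": article_path.removesuffix("$1"),
        # Assumption: default layout, entry points sit under the script path.
        "mwActionApiPath": f"{script_path}/api.php",
        "mwModulePath": f"{script_path}/load.php",
        "mwRestApiPath": f"{script_path}/rest.php",
    }


def to_argv(options: dict) -> list:
    """Turn the options into a flat list of command-line arguments."""
    argv = []
    for name, value in options.items():
        argv += [f"--{name}", value]
    return argv


# Values as reported for Wikipedia in English.
general = {"scriptpath": "/w", "articlepath": "/wiki/$1"}
options = options_from_siteinfo("https://en.wikipedia.org", general)
print(shlex.join(to_argv(options)))
```

The printed line contains only the URL-related options discussed in this section; any other options a scrape needs are not covered by this sketch.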

What is the option --forceRender?

To retrieve HTML pages from a remote MediaWiki instance, MWoffliner relies on MediaWiki APIs. MediaWiki provides multiple ways to retrieve HTML pages, but depending on the version of MediaWiki and the way it is set up, many of them might be unavailable. By default, MWoffliner will do its best to pick the right API: priority is given to modern and mobile-friendly API end-points (see https://github.com/openzim/mwoffliner/wiki/API-end%E2%80%90points). If you want to force the usage of a specific one, then use the option --forceRender.
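The selection logic described above can be pictured as a priority list with an override. This is only a conceptual sketch, not MWoffliner's implementation: the renderer names are hypothetical placeholders and are not valid --forceRender values, and the sketch does not model how availability is detected.

```python
from typing import Optional, Sequence


def pick_renderer(
    priority: Sequence[str],
    available: set,
    forced: Optional[str] = None,
) -> str:
    """Return the forced renderer, or the first available one by priority."""
    if forced is not None:
        # A forced choice bypasses automatic selection entirely.
        return forced
    for name in priority:
        if name in available:
            return name
    raise RuntimeError("no usable API end-point found")


# Hypothetical names, ordered from most to least preferred.
PRIORITY = ["modern-mobile", "modern-desktop", "legacy"]

print(pick_renderer(PRIORITY, {"modern-desktop", "legacy"}))
# modern-desktop
print(pick_renderer(PRIORITY, {"modern-desktop", "legacy"}, forced="legacy"))
# legacy
```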