The Issuu platform shares a lot of documents, but some of them are not downloadable. The problem for me is that I like to keep track of my readings and attach notes to them. This package has only one use: download some or all pages of a document from the site and merge them into a nice PDF.
- Run: python3 ./main.py;
- Input the URL;
- Input the page count you want;
- Pages are loaded as jpg files in the ./temp folder;
- Metadata is saved in json format in the ./out folder;
- The parsed URLs are saved in txt format in the ./out folder;
- Some logging for debugging;
- Progress bar.
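To make the file layout above concrete, here is a minimal sketch of how the page list and the ./out files could be produced. It is not the code from main.py: the page_N.jpg naming, the urls.txt and metadata.json file names, and both function names are assumptions made for illustration only.

```python
import json
from pathlib import Path


def plan_pages(first_page_url, page_count, temp_dir="./temp"):
    """Return (url, local_path) pairs for pages 1..page_count.

    Assumption: the first page URL ends in 'page_1.jpg' and the other
    pages differ only by that number (hypothetical naming pattern).
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    suffix = "page_1.jpg"
    if not first_page_url.endswith(suffix):
        raise ValueError("expected a URL ending in " + suffix)
    base = first_page_url[: -len(suffix)]
    return [
        (f"{base}page_{n}.jpg", Path(temp_dir) / f"page_{n}.jpg")
        for n in range(1, page_count + 1)
    ]


def save_outputs(pages, metadata, out_dir="./out"):
    """Write the URL list (txt) and the metadata (json) to out_dir.

    The file names urls.txt and metadata.json are hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    urls = "\n".join(url for url, _ in pages) + "\n"
    (out / "urls.txt").write_text(urls, encoding="utf-8")
    (out / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
```

Keeping the planning step free of network calls makes it easy to check the URL list before any page is actually downloaded.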
This script requires Python 3 and BeautifulSoup. To install the required packages:
conda env create -f ENV.yml
conda activate scrape_issuu
pip3 install bs4
This package also requires the convert command from ImageMagick.
- For memory issues, see this GitHub thread;
- For authorization issues, see this Ask Ubuntu thread.
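The merge step that relies on convert can be sketched as follows. This is an illustration under assumptions, not the package's actual code: the function names, the default output path ./out/document.pdf, and the page_N.jpg naming are hypothetical. The only ImageMagick behaviour relied on is that convert accepts a list of input images followed by an output file.

```python
import re
import shutil
import subprocess
from pathlib import Path


def page_number(path):
    """Extract the first number in the file name, e.g. page_10.jpg -> 10."""
    match = re.search(r"(\d+)", path.stem)
    return int(match.group(1)) if match else 0


def build_convert_command(temp_dir, output_pdf):
    """Build the convert command with pages in numeric order."""
    # A plain alphabetical sort would put page_10 before page_2.
    pages = sorted(Path(temp_dir).glob("*.jpg"), key=page_number)
    if not pages:
        raise FileNotFoundError(f"no jpg files found in {temp_dir}")
    return ["convert", *map(str, pages), output_pdf]


def merge_to_pdf(temp_dir="./temp", output_pdf="./out/document.pdf"):
    """Merge the downloaded pages into one PDF using ImageMagick."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick 'convert' was not found on PATH")
    Path(output_pdf).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(temp_dir, output_pdf), check=True)
```

Sorting by the extracted page number rather than by file name keeps the pages in reading order once a document has ten pages or more.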
This package is mainly a refactoring of https://github.com/dkl3/py-issuu-scrape. Thanks dude :).
dkl3 was inspired by the Ruby script from pietrop: https://github.com/pietrop/issuu.com-downloader as well as dkl3's original Python script: https://github.com/dkl3/py-issuu-scrape