LucasWz/py-issuu-scrape

 
 

Issuu: from reader iframe to PDF

Objective

The Issuu platform hosts many documents, but some of them are not downloadable. I like to keep track of my readings and attach notes to them, so this package has a single purpose: download some or all pages of a document from the site and merge them into a single PDF.

Basic usage

  1. Run: python3 ./main.py
  2. Enter the URL when prompted.
  3. Enter the number of pages you want.

Features

  • Pages are downloaded as JPG files into the ./temp folder;
  • Metadata is saved in JSON format in the ./out folder;
  • Parsed URLs are saved in TXT format in the ./out folder;
  • Some logging for debugging;
  • Progress bar.
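The JSON and TXT outputs listed above could be produced roughly as follows. This is a minimal sketch, not the code in main.py: the function names, the file names (metadata.json, urls.txt), the one-URL-per-line layout, and the page URL template are all assumptions made for illustration.

```python
from __future__ import annotations

import json
from pathlib import Path


def build_page_urls(template: str, page_count: int) -> list[str]:
    """Return one image URL per page, numbering pages from 1.

    `template` is a hypothetical format string with a `{page}` placeholder;
    the real URL layout used by Issuu is not documented here.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return [template.format(page=n) for n in range(1, page_count + 1)]


def save_outputs(metadata: dict, urls: list[str], out_dir: str = "./out") -> tuple[Path, Path]:
    """Write metadata as JSON and the parsed URLs as text, one URL per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    meta_path = out / "metadata.json"  # assumed file name
    urls_path = out / "urls.txt"       # assumed file name
    meta_path.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    urls_path.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return meta_path, urls_path
```

Keeping the URL list in a separate text file makes it easy to retry a failed download without parsing the page again.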

Requirements

Python

This script requires Python 3 and BeautifulSoup. To install the required packages:

Conda users (for the exact configuration I use)

conda env create -f ENV.yml
conda activate scrape_issuu

Pip users

pip3 install bs4

ImageMagick

This package also requires the convert command from ImageMagick.
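To illustrate the merge step, here is a hedged sketch of how downloaded pages might be passed to convert. The page_<n>.jpg naming, the ./temp and ./out paths, and the function names are assumptions, not taken from main.py. Pages are sorted numerically so that page_10.jpg follows page_9.jpg rather than page_1.jpg.

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path


def page_number(path: Path) -> int:
    """Extract the last run of digits in the file name (assumed naming: page_<n>.jpg)."""
    digits = re.findall(r"\d+", path.stem)
    if not digits:
        raise ValueError(f"no page number in {path.name}")
    return int(digits[-1])


def build_convert_command(image_dir: str, output_pdf: str) -> list[str]:
    """Build the ImageMagick command that merges the JPG pages into one PDF."""
    pages = sorted(Path(image_dir).glob("*.jpg"), key=page_number)
    if not pages:
        raise FileNotFoundError(f"no .jpg pages found in {image_dir}")
    # convert takes the input images in order, followed by the output file.
    return ["convert", *(str(p) for p in pages), output_pdf]


def merge_to_pdf(image_dir: str, output_pdf: str) -> None:
    """Run convert; requires ImageMagick on the PATH. Raises if the command fails."""
    subprocess.run(build_convert_command(image_dir, output_pdf), check=True)
```

A call such as merge_to_pdf("./temp", "./out/document.pdf") would then produce the final PDF. Passing the arguments as a list, rather than a shell string, avoids quoting problems with unusual file names.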

Known issues

Credits

This package is mainly a refactoring of https://github.com/dkl3/py-issuu-scrape . Thanks dude :).

dkl3 was inspired by the Ruby script from pietrop: https://github.com/pietrop/issuu.com-downloader as well as dkl3's original Python script: https://github.com/dkl3/py-issuu-scrape

About

Issuu scraper written in Python.
