PageScraper

An Elixir page scraper app built on Phoenix. It detects changes on a given web page and logs them to a database. It uses Selenium and the Hound package to load pages in a Chrome session. Currently, the app uses a single Chrome session, which slows down polling when more than one page is polled at the same time.

Installation

Please use Docker to run the app.

Run the setup command below to build the containers, create a new database, and run the migrations. Note that the command drops any existing database.

$ ./setup.sh

Start the app in development mode:

$ ./start.sh

Finally, load http://localhost in your browser.

Running the tests

$ ./test.sh

Details

Create a .env file in the app's root directory to set the options below.

To specify a time zone, add the following environment variable to the file:

TZ=your_time_zone         Default: Europe/London

To specify the limit of logged changes per page, add the following environment variable to the file:

PAGE_CHANGES_LIMIT=100    Default: 100
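
As a minimal sketch, the two options above can be written to a .env file in one step. The values shown are simply the documented defaults, so replace them as needed. The command is assumed to be run from the app's root directory, and it overwrites any existing .env file.

```shell
# Write a sample .env file using the documented default values.
# Warning: this replaces an existing .env file in the current directory.
cat > .env <<'EOF'
TZ=Europe/London
PAGE_CHANGES_LIMIT=100
EOF
```

How the containers pick up a changed value is not described here. Restarting the app with ./start.sh after editing the file is a reasonable assumption.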

To-do list

Add a health check to the page_scraper_selenium_chrome Docker container
Display live workers' status using channels/WebSockets
Improve and finish off tests
Improve frontend/design
Implement a way to get page status before pulling page source
Implement pagination
Implement multiple Chrome sessions

