Python Web Crawler

A multithreaded web crawler, written in Python, for crawling a given website.

Features!

  • Fast, multithreaded crawling
  • Ability to specify the number of threads used to crawl the given website (a minimal sketch of how these pieces fit together follows this list)
  • Ability to use proxies to bypass IP restrictions
  • Clear summary of all the URLs that were crawled; the complete list of crawled links is written to the crawled.txt file
  • Ability to specify a delay between each HTTP request
  • Stop and resume the crawler whenever you need
  • Gather all the URLs with their titles into a CSV file, in case you are planning to build a search engine
  • Search for specific text throughout the website
  • Clear statistics about how many links ended up as files, timeout errors, or connection errors
  • Crawl only as deep as you need: you can specify up to what level the crawler should crawl
  • Random browser user agents are used while crawling
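The repository does not reproduce its internals here, so the following is only a minimal sketch of how threads, a per-request delay, a depth limit, and random user agents can fit together. The constants (NUM_THREADS, DELAY, MAX_DEPTH) and the worker structure are illustrative assumptions, not PyWebCrawler's actual code:

```python
import queue
import random
import threading
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

NUM_THREADS = 4   # number of crawler threads (illustrative)
DELAY = 1.0       # seconds to wait before each HTTP request
MAX_DEPTH = 3     # how many levels deep to crawl
USER_AGENTS = [   # pool of browser user agents to pick from at random
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

frontier = queue.Queue()   # (url, depth) pairs waiting to be crawled
seen = set()               # URLs already queued, guarded by a lock
seen_lock = threading.Lock()

def worker():
    while True:
        try:
            url, depth = frontier.get(timeout=5)
        except queue.Empty:
            return  # no work left; let the thread exit
        time.sleep(DELAY)  # honour the configured delay between requests
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            response = requests.get(url, headers=headers, timeout=10)
        except requests.RequestException:
            frontier.task_done()
            continue
        if depth < MAX_DEPTH:  # only go deeper while under the limit
            soup = BeautifulSoup(response.text, "html.parser")
            for anchor in soup.find_all("a", href=True):
                link = urljoin(url, anchor["href"])
                with seen_lock:
                    if link not in seen:
                        seen.add(link)
                        frontier.put((link, depth + 1))
        frontier.task_done()

frontier.put(("https://example.com", 0))
threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```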

Upcoming Features!

  • Gather AWS buckets, emails, phone numbers, etc.
  • Download all images

Dependencies

This tool uses a number of open source projects to work properly:

  • BeautifulSoup - parses the HTML response of each request made
  • Requests - makes the GET requests to the URLs
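As a small illustration of how the two dependencies cooperate (the target URL is a placeholder), Requests fetches a page and BeautifulSoup extracts its title and links:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://example.com"
response = requests.get(url, timeout=10)            # Requests: GET the page
soup = BeautifulSoup(response.text, "html.parser")  # BeautifulSoup: parse it

title = soup.title.get_text() if soup.title else ""  # page title, if any
links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
print(title, len(links))
```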

Usage

If you would like to see the full list of supported features, see the usage demo:

(screenshot: Usage Demo)

Crawling only to a depth of 3 levels

(screenshot: Depth Crawl)
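The exact command is shown in the screenshot; as a language-level illustration only (not the project's CLI), a depth-limited breadth-first crawl might look like this:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

MAX_DEPTH = 3
start = "https://example.com"
frontier = [(start, 0)]   # (url, depth) pairs; the start page is depth 0
seen = {start}

while frontier:
    url, depth = frontier.pop(0)
    try:
        page = requests.get(url, timeout=10)
    except requests.RequestException:
        continue
    if depth >= MAX_DEPTH:
        continue          # reached the requested level; do not go deeper
    soup = BeautifulSoup(page.text, "html.parser")
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"])
        if link not in seen:
            seen.add(link)
            frontier.append((link, depth + 1))
```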

Search for specific text throughout the website

(screenshot: Text Search)
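As an illustrative sketch (the function name and URL are assumptions, not the project's API), searching a page's visible text can be done with BeautifulSoup's get_text():

```python
import requests
from bs4 import BeautifulSoup

def page_contains(url, term):
    """Return True if the page's visible text contains the search term."""
    response = requests.get(url, timeout=10)
    text = BeautifulSoup(response.text, "html.parser").get_text()
    return term.lower() in text.lower()

print(page_contains("https://example.com", "domain"))
```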

Gather all the links, along with their titles, into a CSV file; the file is created after the crawl completes

(screenshot: Gather Titles)
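A sketch of the CSV output step using only Python's standard csv module; the results list and the output filename are stand-ins for whatever the crawler collected:

```python
import csv

results = [("https://example.com", "Example Domain")]  # (url, title) pairs

with open("crawled.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["URL", "Title"])  # header row
    writer.writerows(results)          # one row per crawled page
```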

Use proxies to crawl the site.

(screenshot: Use Proxies)
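Requests supports per-scheme proxies natively; as a sketch (the proxy address is a placeholder, not one shipped with the project), passing a proxies mapping routes every request through the proxy:

```python
import requests

proxies = {
    "http": "http://10.10.1.10:3128",   # placeholder proxy address
    "https": "http://10.10.1.10:3128",
}
response = requests.get("https://example.com", proxies=proxies, timeout=10)
print(response.status_code)
```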
