This Website Mapping tool is a Python-based web scraping application that uses Scrapy to obtain source and target URLs for a given website.
Please ensure you have Python installed on your system. The version used for this application is 3.11.7. If you are using a Mac with Homebrew, use the command: brew install python
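If you are unsure which interpreter your terminal picks up, a quick check along these lines may help. This is only a sketch: treating 3.11 as a minimum is an assumption based on the version named above, and the tool may well run on other versions. The function name `python_is_recent_enough` is made up for illustration.

```python
import sys

# Assumption: 3.11 is used as the threshold only because this README names 3.11.7.
ASSUMED_MINIMUM = (3, 11)


def python_is_recent_enough(version_info=sys.version_info, minimum=ASSUMED_MINIMUM):
    """Return True if the major.minor version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    if not python_is_recent_enough():
        print("This is older than the version the tool was written with (3.11.7).")
```

Save it under any file name and run it with the same python command you intend to use for the tool.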
-
Check pip is installed: To check whether pip is installed, open your terminal and type:
pip --version
-
Install Scrapy: Install it using pip by running the following command in your terminal:
pip install scrapy
-
Clone the Repository: Clone this repository to your local machine using the command:
git clone https://github.com/SamB-CCS/website_mapping_tool
-
Navigate to the Project Directory: Change your current directory to the cloned repository and navigate to the spiders directory using the command:
cd web_crawler/web_crawler/spiders
-
Change the default source URL: To use the mapping tool, amend start_urls in the crawler.py file to your source URL, and amend allowed_domains to your target domain. To use the sitemap tool, amend start_urls in the sitemap.py file.
-
Use the mapping tool: To use this tool, run the command:
scrapy crawl mycrawler -o output.json
-
Use the sitemap generator tool: To use this tool, run the command:
scrapy crawl sitemap_spider -o sitemap.xml
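
A note on the "Change the default source URL" step: Scrapy's allowed_domains is meant to hold bare domain names rather than full URLs, while start_urls holds full URLs. The sketch below shows one way to turn a URL into a value suitable for allowed_domains. The helper name `to_domain` and the example.com / example.org addresses are placeholders for illustration, not values from this repository.

```python
from urllib.parse import urlparse


def to_domain(url_or_domain):
    """Return the bare host name from a URL or a plain domain string."""
    # Prefix "//" when no scheme is present so urlparse can locate the host.
    candidate = url_or_domain if "://" in url_or_domain else "//" + url_or_domain
    host = urlparse(candidate).hostname
    if host is None:
        raise ValueError(f"no host name found in {url_or_domain!r}")
    return host


# Hypothetical values: replace these with your own source URL and target site.
start_urls = ["https://www.example.com/"]
allowed_domains = [to_domain("https://www.example.org/products")]

print(allowed_domains)  # ['www.example.org']
```

The resulting values can then be copied into start_urls and allowed_domains in crawler.py by hand.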