
Uscrapper Vanta




Introducing Uscrapper Vanta: unleash the power of open-source intelligence and dive deeper into the vast web. Vanta unlocks a new level of data-extraction capability, empowering exploration of the uncharted territories of the dark web and uncovering hidden gems with pinpoint accuracy through its keyword-extraction model. Uscrapper Vanta retains the core strengths of its predecessor:

  • Harvests a wealth of personal information, including email addresses, social media links, author names, geolocations, phone numbers, and usernames, from both hyperlinked and non-hyperlinked sources.
  • Leverages multithreading and advanced modules to counter sophisticated anti-web-scraping defenses, ensuring you can access the data you require.
  • Supports crawl-and-scrape within the same domain, gathering information from every relevant corner of a website.
  • Generates comprehensive reports that organize and analyze the extracted data, turning raw information into actionable insights.





🤩 What's New?:

Uscrapper Vanta:

  • Dark Web Support: Uscrapper Vanta can now handle .onion (dark web) links. This expanded functionality enables users to extract crucial information from previously inaccessible sources, providing a more comprehensive view of the digital landscape.

  • Keyword-Based Scraping: With the introduction of a new model, Uscrapper Vanta now allows users to scrape web pages for a specific keyword or a list of keywords. This tailored approach enhances the tool's versatility, enabling users to focus on extracting only the information relevant to their needs.
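At its core, keyword-based scraping comes down to counting where user-supplied terms occur in a page's text. A minimal sketch of that idea (the `keyword_hits` helper and the sample text are illustrative, not Uscrapper's actual code):

```python
import re

def keyword_hits(page_text: str, keywords: list[str]) -> dict[str, int]:
    """Count case-insensitive, whole-word occurrences of each keyword."""
    hits = {}
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        hits[kw] = len(pattern.findall(page_text))
    return hits

page = "Bitcoin wallets and bitcoin mixers are discussed here."
print(keyword_hits(page, ["bitcoin", "wallet"]))  # {'bitcoin': 2, 'wallet': 0}
```

Matching on word boundaries (as above) avoids false positives inside longer words; a real crawler would apply this to the visible text of every fetched page.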

💡 Extracted Details:


Uscrapper extracts the following details from the provided website:

  • Email Addresses: Displays email addresses found on the website.
  • Social Media Links: Displays links to various social media platforms found on the website.
  • Author Names: Displays the names of authors associated with the website.
  • Geolocations: Displays geolocation information associated with the website.
  • Non-Hyperlinked Details: Displays non-hyperlinked details found on the website, including email addresses, phone numbers, and usernames.
  • Keyword-Based Extraction: Displays data relevant to user-specified terms or curated keyword lists.
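Non-hyperlinked details such as email addresses and phone numbers are typically pulled out of raw page text with regular expressions. A simplified sketch of that approach (the patterns and the `extract_details` helper are illustrative, not the exact expressions Uscrapper uses):

```python
import re

# Deliberately simple patterns; production patterns handle more edge cases.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_details(text: str) -> dict:
    """Return deduplicated emails and phone-number candidates found in text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": [p.strip() for p in PHONE_RE.findall(text)],
    }

sample = "Contact admin@example.com or call +1 555-123-4567."
print(extract_details(sample))
```

Because phone-number formats vary widely, loose patterns like the one above trade precision for recall, which is one reason scraped results always need human review.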


📽 Preview:


(screenshots: project-ss, project-ss2)

🛠️ Installation Steps:


```shell
git clone https://github.com/z0m31en7/Uscrapper.git
cd Uscrapper/install/
chmod +x ./install.sh && ./install.sh   # for Unix/Linux systems
```


🔮 Usage:

To run Uscrapper-vanta, use the following command-line syntax:

```shell
python Uscrapper-vanta.py [-h] [-u URL] [-O] [-ns] [-c CRAWL] [-t THREADS] [-k KEYWORDS [KEYWORDS ...]] [-f FILE]
```


Arguments:

  • -u URL, --url URL (URL of the website to scrape)
  • -O, --generate-report (generate a report of the extracted data)
  • -ns, --nonstrict (display non-strict usernames; may show inaccurate results)
  • -c CRAWL, --crawl CRAWL (maximum number of links to crawl and scrape within the same scope)
  • -t THREADS, --threads THREADS (number of threads to use while crawling; default: 4)
  • -k KEYWORDS [KEYWORDS ...], --keywords KEYWORDS [KEYWORDS ...] (keywords to search for, as space-separated arguments)
  • -f FILE, --file FILE (path to a text file containing keywords)
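The -c and -t options together control how many same-domain links get scraped and how many worker threads do the fetching. A minimal sketch of that pattern using Python's standard thread pool (the `crawl` and `fetch` helpers and the example URLs are hypothetical, not Uscrapper's internals):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

def crawl(start_url, links, max_links=10, threads=4):
    """Scrape up to max_links same-domain links concurrently.

    `links` stands in for the URLs discovered on the start page.
    """
    domain = urlparse(start_url).netloc
    # Keep only links in the same scope, capped at max_links (the -c option).
    in_scope = [u for u in links if urlparse(u).netloc == domain][:max_links]

    def fetch(url):
        # Placeholder for the real page download + extraction step.
        return f"scraped {url}"

    # The -t option maps to the worker-thread count here.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, in_scope))

results = crawl("https://example.com",
                ["https://example.com/a", "https://other.net/b"])
print(results)  # ['scraped https://example.com/a']
```

Filtering by `netloc` before fetching keeps the crawl inside the target domain, and the bounded pool keeps resource usage predictable on large sites.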


📜 Note:

  • Uscrapper relies on web scraping techniques to extract information from websites. Make sure to use it responsibly and in compliance with the website's terms of service and applicable laws.

  • The accuracy and completeness of the extracted details depend on the structure and content of the website being analyzed.

  • To bypass some anti-web-scraping measures, Uscrapper uses Selenium, which can make the overall process slower.


💌 Contribution:


Want a new feature to be added?

  • Make a pull request with all the necessary details, and it will be merged after review.
  • You can contribute by making the regular expressions more efficient and accurate, or by suggesting some more features that can be added.

🛡️ License:


This project is licensed under the MIT License.