UpSchool-CapstoneProject

Description

Crawler Bot is a C# .NET project that scrapes product information from an e-commerce website and stores it in a MySQL database. A C# background worker navigates the target website and extracts product names, prices, discount availability, and image URLs. The project follows the Clean Architecture and CQRS patterns to keep the codebase well structured and maintainable.

Features

Web scraping of product details: The Crawler Bot navigates to the e-commerce website and gathers product information, including regular and discounted prices, image URLs, and product names.

Clean Architecture and CQRS: The project is structured according to the Clean Architecture principles, ensuring separation of concerns and maintainability. The CQRS pattern is implemented to segregate read and write operations effectively.
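The back end itself is written in C#, and its handlers are not shown in this README. Purely as an illustration of the read/write split that CQRS describes, here is a minimal TypeScript sketch. Every name in it (OrderStore, CreateOrderHandler, GetOrderByIdHandler) is hypothetical and not taken from the project.

```typescript
// Illustrative CQRS sketch; all names are hypothetical, not the project's own.

interface Order {
  id: number;
  requestedAmount: number;
}

// Write side: a command describes a state change.
interface CreateOrderCommand {
  requestedAmount: number;
}

// Read side: a query describes what data is wanted and changes nothing.
interface GetOrderByIdQuery {
  id: number;
}

// In-memory stand-in for the database.
class OrderStore {
  private orders = new Map<number, Order>();
  private nextId = 1;

  insert(requestedAmount: number): Order {
    const order: Order = { id: this.nextId++, requestedAmount };
    this.orders.set(order.id, order);
    return order;
  }

  find(id: number): Order | undefined {
    return this.orders.get(id);
  }
}

// Command handler: mutates state and returns only the new identifier.
class CreateOrderHandler {
  private readonly store: OrderStore;
  constructor(store: OrderStore) {
    this.store = store;
  }
  handle(command: CreateOrderCommand): number {
    return this.store.insert(command.requestedAmount).id;
  }
}

// Query handler: returns a copy of the data and never mutates the store.
class GetOrderByIdHandler {
  private readonly store: OrderStore;
  constructor(store: OrderStore) {
    this.store = store;
  }
  handle(query: GetOrderByIdQuery): Order | undefined {
    const order = this.store.find(query.id);
    return order ? { ...order } : undefined;
  }
}
```

Keeping the two handlers separate means the read path can later be optimized or cached without touching the code that changes state.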

User Management: Microsoft Identity is employed for user authentication, allowing users to register and log in through traditional methods or using Google login. JSON Web Tokens (JWTs) secure the login and logout flow.

Email Notifications: The application sends email notifications to users upon registration and when specific product details are scraped.

Global Exception Handling: A GlobalException filter is implemented to handle and manage exceptions gracefully throughout the application.

Front-end with React and TypeScript: The front-end of the application is built using React with TypeScript, providing a responsive and user-friendly interface.

Tailwind CSS for Styling: Tailwind CSS is used for designing the UI, ensuring a clean and modern appearance.

Back-end Technologies

C# .NET

Clean Architecture

CQRS (Command Query Responsibility Segregation)

Microsoft Identity for User Management

JWT (JSON Web Tokens) for Authentication

MySQL for Database Management

Background Worker for Web Scraping

SignalR for Real-Time Communication

Selenium Framework for bots

Front-end Technologies

React with TypeScript

Tailwind CSS for Styling

How the Crawler Bot Works

User Authentication: Users can register and log in using traditional methods or through Google login. JWTs are issued for secure authentication.

Homepage: Upon successful login, users are directed to the home page, from which they can go straight to creating an order.

Creating Orders: Users can create new orders, specifying details such as the number of products to scrape and the type of products (all, discounted, non-discounted).
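The order options above (a product count and a product type) can be sketched as a small filter. This is only an illustration under assumptions: the field names (requestedAmount, kind, salePrice) and the rule that a product counts as discounted when its sale price is lower than its regular price are hypothetical and not taken from the project.

```typescript
// Hypothetical shapes; the project's real Order and Product models are not shown here.
type ProductKind = "all" | "discounted" | "nonDiscounted";

interface Product {
  name: string;
  price: number;
  salePrice?: number; // present only when the shop lists a sale price
  imageUrl: string;
}

interface OrderRequest {
  requestedAmount: number; // how many products the user asked for
  kind: ProductKind;       // which products qualify
}

// Assumption: a product is "discounted" when its sale price is below its regular price.
function isDiscounted(product: Product): boolean {
  return product.salePrice !== undefined && product.salePrice < product.price;
}

function matchesOrder(product: Product, kind: ProductKind): boolean {
  if (kind === "all") return true;
  return kind === "discounted" ? isDiscounted(product) : !isDiscounted(product);
}

// Keep only qualifying products, capped at the requested amount.
function selectProducts(scraped: Product[], order: OrderRequest): Product[] {
  const limit = Math.max(0, order.requestedAmount);
  return scraped.filter((p) => matchesOrder(p, order.kind)).slice(0, limit);
}
```

For example, calling selectProducts with { requestedAmount: 10, kind: "discounted" } returns at most ten products whose sale price is below the regular price.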

SignalR Communication: When an order is created, its details are sent through a SignalR hub to the back-end worker, which then starts scraping.

Web Scraping: The background worker, with the help of a web driver, navigates to the e-commerce website and performs web scraping based on the user's order details.

Data Storage: The scraped product information is stored in the "Product" table, and order-related information is stored in the "Order" table. Bot status and order completion details are saved in the "Order Event" table.

Live Tracking: Users can track the bot's progress and the scraped products in real-time using the "Live Track" page. SignalR facilitates the transfer of logs from the back-end to the front-end for live updates.

Protected Routes: The application uses protected routes to keep user sessions secure. If a token expires, the user is automatically logged out.
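The expiry check behind a protected route might look like the sketch below. It assumes the token carries the standard JWT "exp" claim (seconds since the Unix epoch) and that the runtime provides atob, as browsers do. The function names are hypothetical, and the check only reads the payload; it does not verify the signature, which remains the server's job.

```typescript
// Hypothetical helpers; the project's actual route guard is not shown in this README.

interface JwtPayload {
  exp?: number; // standard JWT expiry claim, in seconds since the Unix epoch
}

// Decode the payload (middle) segment of a JWT without verifying its signature.
function decodeJwtPayload(token: string): JwtPayload | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  try {
    // JWTs use base64url; convert to plain base64 and restore padding for atob.
    const base64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
    const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
    return JSON.parse(atob(padded)) as JwtPayload;
  } catch {
    return null; // malformed base64 or JSON
  }
}

// A token with no readable "exp" claim is treated as expired.
function isTokenExpired(token: string, nowMs: number = Date.now()): boolean {
  const payload = decodeJwtPayload(token);
  if (payload === null || typeof payload.exp !== "number") return true;
  return payload.exp * 1000 <= nowMs;
}

// Decide what a protected route should do with the stored token.
function guardRoute(token: string | null, nowMs?: number): "allow" | "logout" {
  return token !== null && !isTokenExpired(token, nowMs) ? "allow" : "logout";
}
```

In a React app, a wrapper component could call guardRoute with the stored token on each navigation and redirect to the login page whenever it returns "logout".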

Email Notification and Export to Excel

Users can export the products scraped for an order to an Excel sheet directly from the product page, and can also send those products by email. This makes it easy to analyze, store, and share the scraped data.
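How the project builds its Excel file is not described here. As a rough sketch of the export step, the code below produces CSV text instead, which Excel can open. CSV is a stand-in chosen for this example, not the project's actual format, and the ProductRow shape and column names are assumptions.

```typescript
// Hypothetical row shape; CSV is used here as a simple stand-in for the Excel export.
interface ProductRow {
  name: string;
  price: number;
  salePrice: number | null; // null when the product has no sale price
  imageUrl: string;
}

// Quote a field when it contains a comma, quote, or line break; double any inner quotes.
function escapeCsv(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build CSV text with a header row, one line per product, CRLF line endings.
function toCsv(rows: ProductRow[]): string {
  const header = ["Name", "Price", "Sale Price", "Image URL"];
  const lines = rows.map((row) =>
    [
      row.name,
      row.price.toString(),
      row.salePrice === null ? "" : row.salePrice.toString(),
      row.imageUrl,
    ]
      .map(escapeCsv)
      .join(",")
  );
  return [header.join(","), ...lines].join("\r\n");
}
```

The resulting string can be offered as a file download in the browser or attached to the notification email.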

Installation and Setup

Clone the repository from GitHub.

Set up the necessary environment for C# .NET and React with TypeScript.

Install the required C# and JavaScript dependencies.

Configure the MySQL database connection settings.

Build and run the C# .NET back-end.

Start the React front-end to access the Crawler Bot application.