Added pipeline for phishing website detection #669

Merged: 2 commits, merged on Oct 30, 2024.
2,101 changes: 2,101 additions & 0 deletions Prediction Models/phishing_detection/Phising_Testing_Dataset.csv


8,956 changes: 8,956 additions & 0 deletions Prediction Models/phishing_detection/Phising_Training_Dataset.csv


108 changes: 108 additions & 0 deletions Prediction Models/phishing_detection/README.md
@@ -0,0 +1,108 @@
# Phishing URL Detector

A machine learning-based system that detects potential phishing URLs by analyzing various URL features. The system uses a Voting Classifier model trained on multiple URL characteristics to determine if a URL is legitimate or potentially malicious.

## Features

- Real-time URL analysis
- Extraction of 30 features, including:
  - Domain-based features
  - URL-based features
  - HTML and JavaScript features
  - Domain age and registration features
- User-friendly command-line interface
- Clear warning system for potentially malicious URLs
- Checking of multiple URLs in a single session

## Prerequisites

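```bash
pip install joblib numpy scikit-learn requests beautifulsoup4 python-whois dnspython tldextract
```

Loading the pickled model requires `scikit-learn` (assuming it is a scikit-learn `VotingClassifier`). Install only one of `python-whois` and `whois`: both packages provide a module named `whois` and conflict with each other.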

## Installation

1. Clone the repository:

```bash
git clone https://github.com/yourusername/phishing-detector.git
cd phishing-detector
```

2. Install required packages:

```bash
pip install -r requirements.txt
```

3. Ensure the following files are in the same directory:
   - `voting_classifier_model.pkl` (trained model)
   - `features_extraction.py` (feature extraction code)
   - `predict.py` (prediction script)

## Usage

1. Run the prediction script:

```bash
python predict.py
```

2. Enter the URL you want to check when prompted:

```text
Enter URL to check (or 'quit' to exit): https://example.com
```

3. The system analyzes the URL and prints one of two responses:

   - ✅ This URL appears to be LEGITIMATE
   - ⚠️ Warning: This URL is potentially PHISHING

4. You can choose to check another URL or exit the program.
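The interactive flow above could look roughly like the following. This is a hypothetical sketch, not the contents of `predict.py`: it assumes `features_extraction.py` exposes a function named `extract_features(url)` that returns one feature vector, and that the model predicts `-1` for phishing and `1` for legitimate. The extractor and model are passed in as arguments so the loop can be exercised without the pickled model.

```python
"""Hypothetical sketch of the prompt loop in predict.py (names are assumptions)."""
from typing import Callable, Sequence

PROMPT = "Enter URL to check (or 'quit' to exit): "


def format_verdict(label: int) -> str:
    # Assumption: the model predicts -1 for phishing and 1 for legitimate.
    if label == -1:
        return "⚠️ Warning: This URL is potentially PHISHING"
    return "✅ This URL appears to be LEGITIMATE"


def check_url(url: str, model, extract_features: Callable[[str], Sequence[int]]) -> str:
    """Extract one feature vector and classify it with any object that has .predict()."""
    features = extract_features(url)
    # predict() expects a 2-D input (one row per sample), so wrap the single vector.
    label = int(model.predict([features])[0])
    return format_verdict(label)


def run(model, extract_features, read=input, write=print) -> None:
    """Prompt for URLs until the user types 'quit'; blank input is ignored."""
    while True:
        url = read(PROMPT).strip()
        if url.lower() == "quit":
            break
        if not url:
            continue
        try:
            write(check_url(url, model, extract_features))
        except Exception as exc:  # e.g. network or parsing failures during extraction
            write(f"Could not analyze {url}: {exc}")


def main() -> None:
    import joblib  # only needed when running the real CLI
    from features_extraction import extract_features  # hypothetical function name

    model = joblib.load("voting_classifier_model.pkl")
    run(model, extract_features)
```

As a script, `main()` would be called under an `if __name__ == "__main__":` guard. Injecting `read` and `write` keeps the loop testable with stub inputs instead of a real terminal.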

## How It Works

1. **Feature Extraction**: The system extracts 30 features from the provided URL using the `features_extraction.py` module, including:

   - IP address presence
   - URL length
   - URL-shortening service usage
   - '@' symbol presence
   - SSL certificate status
   - Domain age
   - HTML/JavaScript features
   - Many others

2. **Prediction**: The extracted features are processed by a trained Voting Classifier model that combines multiple machine learning algorithms to make a final prediction.
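The two steps above can be sketched as follows. This is a simplified, hypothetical illustration, not the contents of `features_extraction.py`: it implements only four of the URL-based checks listed (IP address, URL length, shortening service, '@' symbol), assumes a `-1` (phishing-like) / `0` (suspicious) / `1` (legitimate-like) encoding, uses assumed length thresholds of 54 and 75 characters, and names every function itself. `majority_vote` shows the hard-voting rule that an ensemble such as scikit-learn's `VotingClassifier(voting="hard")` applies to its base models' predictions; the trained model may instead use soft voting (averaging predicted probabilities).

```python
import ipaddress
from collections import Counter
from urllib.parse import urlparse

# Hypothetical shortener list; the real module may use a longer or different one.
SHORTENERS = {"bit.ly", "goo.gl", "tinyurl.com", "t.co", "ow.ly", "is.gd"}


def _host(url: str) -> str:
    """Return the lower-cased host name, or '' if the URL has none."""
    return (urlparse(url).hostname or "").lower()


def having_ip_address(url: str) -> int:
    """-1 if the host is a literal IPv4/IPv6 address, else 1."""
    try:
        ipaddress.ip_address(_host(url))
        return -1
    except ValueError:
        return 1


def url_length(url: str) -> int:
    """1 for short, 0 for medium, -1 for long URLs (assumed 54/75 thresholds)."""
    if len(url) < 54:
        return 1
    return 0 if len(url) <= 75 else -1


def shortening_service(url: str) -> int:
    """-1 if the host is a known URL shortener, else 1."""
    return -1 if _host(url) in SHORTENERS else 1


def having_at_symbol(url: str) -> int:
    """-1 if the URL contains '@', else 1."""
    return -1 if "@" in url else 1


def extract_url_features(url: str) -> list:
    """Feature vector for the four checks above (the real module computes 30)."""
    return [having_ip_address(url), url_length(url), shortening_service(url), having_at_symbol(url)]


def majority_vote(labels: list) -> int:
    """Hard voting: return the label predicted by the most base models.

    Ties go to the label seen first, which may differ from scikit-learn's tie-breaking.
    """
    return Counter(labels).most_common(1)[0][0]
```

Each check maps a raw URL property to a small integer, so every URL becomes a fixed-length numeric vector that any classifier in the ensemble can consume.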

## Error Handling

The system handles errors caused by:

- Invalid URLs
- Network connection issues
- Feature extraction failures
- Model loading problems
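The URL and network cases above might be handled along these lines. This is a minimal sketch under stated assumptions: `parse_url` and `fetch_html` are hypothetical names, a missing scheme is assumed to mean `http://`, and the standard library's `urllib.request` stands in for `requests` so the example has no third-party dependencies.

```python
import http.client
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def parse_url(raw: str) -> Optional[str]:
    """Return a normalized http(s) URL, or None if the input cannot be one."""
    candidate = raw.strip()
    if not candidate:
        return None
    if "://" not in candidate:
        candidate = "http://" + candidate  # assumption: default to http when no scheme is given
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return None
    return candidate


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download a page for HTML/JavaScript features; return None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError/HTTPError, DNS failures, timeouts, and resets;
        # HTTPException covers protocol errors such as truncated responses;
        # ValueError covers strings that urlopen cannot parse as a URL.
        return None
```

Returning `None` instead of raising lets the caller print a warning and keep the prompt loop running; page-based features could then fall back to a neutral value when no HTML is available.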

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Disclaimer

This tool is for educational and research purposes only. While it can help identify potential phishing URLs, it should not be the sole factor in determining a URL's legitimacy. Always exercise caution when visiting unknown websites.

## Acknowledgments

- Feature extraction methods based on common phishing detection techniques
- Model trained on [dataset reference]
- Thanks to all contributors who participated in this project
Binary file not shown.