Welcome to the Pitch.AI AI Challenge! This project is designed to evaluate your ability to work with real-world datasets, focusing on text preprocessing, feature extraction, and classification algorithms.
The dataset you'll be working with is the UCI Spambase dataset, a labeled collection of features extracted from email messages, used to train algorithms to distinguish legitimate messages from spam.
- Fork this repository to your GitHub account.
- Complete the missing parts of `main_code.py` by implementing the functions for data preprocessing, feature extraction, model training, and evaluation.
- Ensure your code passes the tests in `test_code.py` by running the test script.
- Create a pull request (PR) back to this repository once you're done.
- Your GitHub email must match the email you submitted your application with (if your GitHub email is different, we recommend creating a new GitHub profile with the email you used in your application, preferably your UWaterloo email).
- Set your email to public on your GitHub profile.
- Do not apply any labels to your PR. We will add a label to mark your PR as reviewed once we have reviewed it. If you apply a label yourself, your PR will be skipped.
You are provided with a partially implemented Python script, `main_code.py`. Your task is to complete the functions that handle data preprocessing, feature extraction, model training, and evaluation. The goal is to develop a spam detection model using classification algorithms.
- Implement the missing functions in `main_code.py`.
- Use scikit-learn or similar libraries to build your models.
- Run `test_code.py` to ensure your code works as expected.
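As a rough illustration of the train-and-evaluate steps listed above, here is a minimal sketch using scikit-learn. The function name `train_and_evaluate`, the choice of logistic regression, and the 80/20 split are assumptions made for this example; they are not the signatures or settings that `main_code.py` or `test_code.py` expect. The demo data is synthetic, generated with 57 numeric columns to mirror the attribute count of the UCI Spambase dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X, y, seed=0):
    """Hypothetical helper: split, scale, fit, and score a binary classifier."""
    # Hold out 20% of the rows, keeping the class ratio the same in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # The scaler sits inside the pipeline, so it is fit on the training split only.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    return model, metrics


# Synthetic stand-in data (not the real dataset): 500 rows, 57 numeric features.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=57, n_informative=10, random_state=0
)
model, metrics = train_and_evaluate(X_demo, y_demo)
print(metrics)
```

Any other scikit-learn classifier can take the place of `LogisticRegression` in the pipeline without changing the rest of the sketch.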
The dataset is provided in the `data/spambase.csv` file. It contains features extracted from email messages, labeled as spam (1) or not spam (0).
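To make the feature/label layout concrete, the sketch below separates numeric features from the 0/1 label using only the standard library. It assumes the label is the last column and that any non-numeric row is a header; check both against the actual `data/spambase.csv` before relying on them. The helper name `read_features_and_labels`, the column names, and the sample values are invented for illustration.

```python
import csv
import io


def read_features_and_labels(csv_file):
    """Hypothetical helper: parse rows into (features, labels).

    Assumes the spam label (1 or 0) is stored in the last column.
    """
    features, labels = [], []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        try:
            values = [float(cell) for cell in row]
        except ValueError:
            continue  # non-numeric row, assumed here to be a header
        features.append(values[:-1])
        labels.append(int(values[-1]))
    return features, labels


# Tiny in-memory stand-in for the real file: made-up numbers, 3 features + label.
sample = io.StringIO(
    "word_freq_a,word_freq_b,capital_run_length,label\n"
    "0.0,0.64,61,1\n"
    "0.21,0.0,3,0\n"
)
X, y = read_features_and_labels(sample)
print(X, y)
```

With the real file, the same helper could be called as `read_features_and_labels(f)` inside `with open("data/spambase.csv", newline="") as f:`.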
Dataset Link
- Fork this repository to your own GitHub account.
- Complete `main_code.py` with your implementation.
- Run the `test_code.py` script to ensure that your implementation works correctly.
- Push your changes to your forked repository.
- Create a pull request back to this repository.
Make sure you follow the guidelines to avoid any issues with your submission. We look forward to reviewing your work!
Good luck!