- Download the MovieLens 20M dataset:
wget --output-document=./ml-20m.zip https://files.grouplens.org/datasets/movielens/ml-20m.zip
- Once the download is complete, extract the dataset:
unzip ml-20m.zip
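On systems without wget or unzip (for example a default Windows shell), the same two steps can be done with the Python standard library. This is only a sketch: the function name `download_and_extract` is made up for illustration and is not part of this repository.

```python
import urllib.request
import zipfile
from pathlib import Path

ML_20M_URL = "https://files.grouplens.org/datasets/movielens/ml-20m.zip"


def download_and_extract(url: str, zip_path: str, extract_to: str = ".") -> list:
    """Download a zip archive from `url` and extract it into `extract_to`.

    Returns the names of the archive members.
    (Hypothetical helper, not part of this repository.)
    """
    zip_file = Path(zip_path)
    if not zip_file.exists():  # skip the download if the archive is already there
        urllib.request.urlretrieve(url, str(zip_file))
    with zipfile.ZipFile(zip_file) as archive:
        archive.extractall(extract_to)
        return archive.namelist()


# Usage (downloads the full archive, so it is left commented out):
# members = download_and_extract(ML_20M_URL, "ml-20m.zip")
# print(f"Extracted {len(members)} files")
```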
- Install Python: make sure Python is installed on your system. If it is not, download and install it from the official Python website: https://www.python.org/downloads/
- Create a virtual environment:
python -m venv myenv
- Activate the virtual environment:
For Windows CMD Users
.\myenv\Scripts\Activate.bat
For Windows PowerShell Users
.\myenv\Scripts\Activate.ps1
For macOS/Linux Users
source myenv/bin/activate
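To confirm that the activation worked, one quick check is to ask the interpreter itself. The helper name `in_virtualenv` below is illustrative only; the comparison of `sys.prefix` with `sys.base_prefix` is the standard way to detect an environment created by `venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the first line prints `False`, the shell is most likely still using the system-wide Python.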
- Install the dependencies:
pip install -r requirements.txt
- Add your wandb API key:
Sign in to https://wandb.ai and get your API key.
Create a file named secrets.json in the root directory and put your wandb API key in it:
{ "WANDB_API_KEY": "YOUR_API_KEY" }
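For reference, a file in this format could be read roughly as follows. This is a sketch, not the code used by train.py: the function name `load_wandb_key` is hypothetical, and exporting the key through the `WANDB_API_KEY` environment variable (which the wandb client reads) is one possible approach among several.

```python
import json
import os
from pathlib import Path


def load_wandb_key(secrets_path: str = "secrets.json") -> str:
    """Read WANDB_API_KEY from the secrets file and export it for wandb.

    (Hypothetical helper; the repository may load the key differently.)
    """
    secrets = json.loads(Path(secrets_path).read_text(encoding="utf-8"))
    api_key = secrets["WANDB_API_KEY"]  # key name as used in secrets.json
    if not api_key or api_key == "YOUR_API_KEY":
        raise ValueError(f"Replace the placeholder in {secrets_path} with a real key")
    # The wandb client picks the key up from this environment variable.
    os.environ["WANDB_API_KEY"] = api_key
    return api_key


# Usage, once secrets.json exists:
# load_wandb_key()
```

Since secrets.json holds a credential, it is worth keeping it out of version control.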
Train the model:
python train.py --data_path DATA_PATH [--emb_size EMB_SIZE] [--random_seed RANDOM_SEED]
                [--batch_size BATCH_SIZE] [--epochs EPOCHS] [--learning_rate LEARNING_RATE]
                [--weight_decay WEIGHT_DECAY] [--step_size STEP_SIZE] [--gamma GAMMA]
                [--patience PATIENCE] [--model_name MODEL_NAME] [--metrics_csv_name METRICS_CSV_NAME]
                [--silent] [--log_wandb]
- --data_path: Path to the CSV file containing the ratings data.
- --emb_size: Size of the embedding for users and items. Default is 100.
- --random_seed: Random seed for reproducibility. Default is 42.
- --batch_size: Batch size for training. Default is 64000.
- --epochs: Number of epochs for training. Default is 100.
- --learning_rate: Learning rate for optimizer. Default is 0.001.
- --weight_decay: Weight decay for optimizer. Default is 1e-5.
- --step_size: Step size for learning rate scheduler. Default is 10.
- --gamma: Gamma value for learning rate scheduler. Default is 0.1.
- --patience: Patience for early stopping based on validation loss. Default is 3.
- --model_name: Name of the trained model file to be saved. Default is 'mf_model.pth'.
- --metrics_csv_name: Name of the CSV file to save the training metrics. Default is 'metrics.csv'.
- --silent: Whether to hide verbose output during training.
- --log_wandb: Whether to log metrics to Weights & Biases (wandb.ai).
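To see how --learning_rate, --step_size and --gamma interact, the sketch below computes the learning rate per epoch under a plain step-decay rule, the same rule PyTorch's `torch.optim.lr_scheduler.StepLR` applies. That train.py uses exactly this scheduler is an assumption, and the function name `step_decay_lr` is illustrative.

```python
def step_decay_lr(epoch: int, base_lr: float = 0.001,
                  step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate for a 0-based epoch under step decay.

    Defaults mirror the documented CLI defaults. Assumes the scheduler
    multiplies the rate by `gamma` once every `step_size` epochs.
    """
    return base_lr * gamma ** (epoch // step_size)


# The rate stays constant within each block of `step_size` epochs
# and shrinks by a factor of `gamma` at the start of the next block.
for epoch in (0, 9, 10, 19, 20):
    print(f"epoch {epoch:>2}: lr = {step_decay_lr(epoch):.2e}")
```

With the defaults, the rate drops by a factor of ten every 10 epochs, so late epochs take very small steps unless early stopping ends training first.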
Test the model:
python test.py --data_path DATA_PATH --model_path MODEL_PATH [--batch_size BATCH_SIZE] [--random_seed RANDOM_SEED]
- --data_path: Path to the CSV file containing the ratings data.
- --model_path: Path to the trained model file to be loaded for testing.
- --batch_size: Batch size for testing. Default is 64000.
- --random_seed: Random seed for reproducibility. Default is 42.
Recommend items for a user:
python inference.py --data_path DATA_PATH --model_path MODEL_PATH --user_id USER_ID [--n_items N_ITEMS]
- --data_path: Path to the CSV file containing the ratings data.
- --model_path: Path to the trained model file to be loaded for inference.
- --user_id: The ID of the user for whom items are to be recommended.
- --n_items: Number of top-ranked items to recommend to the user. Default is 10.
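Conceptually, top-n recommendation with a matrix factorization model scores every candidate item by the dot product of the user and item embeddings and keeps the highest scores. The sketch below illustrates this with plain Python lists. The function name, the toy embeddings, the exclusion of already-rated items, and the absence of bias terms are all assumptions for illustration; inference.py may differ on each point.

```python
import heapq


def recommend_top_n(user_vec, item_vecs, seen_items, n_items=10):
    """Return the ids of the `n_items` highest-scoring unseen items.

    user_vec   : list of floats, the user's embedding
    item_vecs  : dict mapping item id -> embedding (list of floats)
    seen_items : set of item ids the user has already rated (skipped here)
    """
    scores = (
        (sum(u * v for u, v in zip(user_vec, vec)), item_id)
        for item_id, vec in item_vecs.items()
        if item_id not in seen_items
    )
    # nlargest avoids sorting the full catalogue when n_items is small.
    return [item_id for _, item_id in heapq.nlargest(n_items, scores)]


# Toy example with 2-dimensional embeddings (made-up numbers).
user = [1.0, 0.0]
items = {10: [0.9, 0.1], 20: [0.2, 0.8], 30: [0.5, 0.5], 40: [1.0, 0.0]}
print(recommend_top_n(user, items, seen_items={40}, n_items=2))  # prints [10, 30]
```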
Plot the loss curves:
python plot.py --metrics_csv_path METRICS_CSV_PATH [--patience PATIENCE] [--file_name FILE_NAME]
- --metrics_csv_path: Path to the CSV file containing the metrics data. The file is expected to have the columns 'Epoch', 'Train Loss', and 'Val Loss'.
- --patience: Patience for early stopping. Default is None.
- --file_name: The name for saving the plot. Default is loss_curve.png.
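As an illustration of how a patience value relates to the metrics file, the sketch below reads the 'Val Loss' column and finds the epoch at which early stopping would trigger. It assumes a common criterion (stop after `patience` consecutive epochs without a new best validation loss); the exact rule in train.py and the way plot.py uses --patience are not specified here, and both function names are hypothetical.

```python
import csv


def read_val_losses(metrics_csv_path):
    """Read the 'Val Loss' column of the metrics CSV as a list of floats."""
    with open(metrics_csv_path, newline="", encoding="utf-8") as f:
        return [float(row["Val Loss"]) for row in csv.DictReader(f)]


def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch where early stopping would trigger, or None.

    Assumed criterion: stop once the validation loss has failed to improve
    on its best value for `patience` consecutive epochs.
    """
    if patience is None:
        return None
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Usage, given a metrics file written during training:
# print(early_stop_epoch(read_val_losses("metrics.csv"), patience=3))
```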
- Y. Koren, R. Bell, and C. Volinsky, "Matrix Factorization Techniques for Recommender Systems," Computer, vol. 42, no. 8, pp. 30-37, Aug. 2009, doi: 10.1109/MC.2009.263.