RecEngineMF

Data

MovieLens 20M Dataset

Command to Download and Extract Data

Download MovieLens 20M Data

wget --output-document=./ml-20m.zip  https://files.grouplens.org/datasets/movielens/ml-20m.zip

Once the download is complete, extract the dataset
```
unzip ml-20m.zip
```

Usage

Install Dependencies

Install Python: Make sure Python is installed on your system. If not, you can download and install Python from the official Python website: https://www.python.org/downloads/
Create a virtual environment:
```
python -m venv myenv
```
Activate the virtual environment

For Windows CMD Users
```
.\myenv\Scripts\Activate.bat
```
For Windows Powershell Users
```
.\myenv\Scripts\Activate.ps1
```
For macOS/Linux Users
```
source myenv/bin/activate
```
Install the dependencies
```
pip install -r requirements.txt
```
Add wandb API key

Sign in to https://wandb.ai and get your API key.
Create a file secrets.json in the root directory and put your wandb API key.
```
{
	"WANDB_API_KEY": "YOUR_API_KEY"
}
```

Train

python train.py --data_path DATA_PATH [--emb_size EMB_SIZE] [--random_seed RANDOM_SEED] 
                [--batch_size BATCH_SIZE] [--epochs EPOCHS] [--learning_rate LEARNING_RATE] 
                [--weight_decay WEIGHT_DECAY] [--step_size STEP_SIZE] [--gamma GAMMA] 
                [--patience PATIENCE] [--model_name MODEL_NAME] [--metrics_csv_name METRICS_CSV_NAME]
                [--silent] [--log_wandb]

Required Flag

--data_path: Path to the CSV file containing the ratings data.

Optional Flags

--emb_size: Size of the embedding for users and items. Default is 100.
--random_seed: Random seed for reproducibility. Default is 42.
--batch_size: Batch size for training. Default is 64000.
--epochs: Number of epochs for training. Default is 100.
--learning_rate: Learning rate for optimizer. Default is 0.001.
--weight_decay: Weight decay for optimizer. Default is 1e-5.
--step_size: Step size for learning rate scheduler. Default is 10.
--gamma: Gamma value for learning rate scheduler. Default is 0.1.
--patience: Patience for early stopping based on validation loss. Default is 3.
--model_name: Name of the trained model file to be saved. Default is 'mf_model.pth'.
--metrics_csv_name: Name of the CSV file to save the training metrics. Default is 'metrics.csv'.
--silent: Whether to hide verbose output during training.
--log_wandb: Whether to log metrics into weights and bias (wandb.ai).

Test

python test.py  --data_path DATA_PATH --model_path MODEL_PATH [--batch_size BATCH_SIZE] [--random_seed RANDOM_SEED]

Required Flags

--data_path: Path to the CSV file containing the ratings data.
--model_path: Path to the trained model file to be loaded for testing.

Optional Flags

--batch_size: Batch size for testing. Default is 64000.
--random_seed: Random seed for reproducibility. Default is 42.

Run Inference

python inference.py --data_path DATA_PATH --model_path MODEL_PATH --user_id USER_ID [--n_items N_ITEMS]

Required Flags

--data_path: Path to the CSV file containing the ratings data.
--model_path: Path to the trained model file to be loaded for testing.
--user_id: The id of the user for whom item is to be recommended.

Optional Flags

--n_items: The top n number of items to be recommended to the user. Default is 10.

Plot Curve

python plot.py --metrics_csv_path METRICS_CSV_PATH [--patience PATIENCE] [--file_name FILE_NAME]

Required Flags

--metrics_csv_path: Path to the CSV file containing the mertics data. [ CSV file with column names: 'Epoch', 'Train Loss', 'Val Loss' ]

Optional Flags

--patience: Patience for early stopping. Default is None.
--file_name: The name for saving the plot. Default is loss_curve.png.

References

Matrix Factorization Techniques for Recommender Systems (Y. Koren, R. Bell and C. Volinsky, "Matrix Factorization Techniques for Recommender Systems," in Computer, vol. 42, no. 8, pp. 30-37, Aug. 2009, doi: 10.1109/MC.2009.263.)

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.gitignore		.gitignore
README.md		README.md
dataset.py		dataset.py
inference.py		inference.py
models.py		models.py
plot.py		plot.py
requirements.txt		requirements.txt
test.py		test.py
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RecEngineMF

Data

MovieLens 20M Dataset

Command to Download and Extract Data

Usage

Install Dependencies

Train

Required Flag

Optional Flags

Test

Required Flags

Optional Flags

Run Inference

Required Flags

Optional Flags

Plot Curve

Required Flags

Optional Flags

References

About

Releases

Packages

Languages

SunilGolden/RecEngineMF

Folders and files

Latest commit

History

Repository files navigation

RecEngineMF

Data

MovieLens 20M Dataset

Command to Download and Extract Data

Usage

Install Dependencies

Train

Required Flag

Optional Flags

Test

Required Flags

Optional Flags

Run Inference

Required Flags

Optional Flags

Plot Curve

Required Flags

Optional Flags

References

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages