Machine Learning REPA Week 2021 conference
.
├── data
│ ├── processed <- processed data
│ └── raw <- original unmodified/raw data
├── models <- folder for ML models
├── notebooks <- Jupyter Notebooks (ignored by Git)
├── reports <- folder for experiment reports
├── src <- source code for modules & pipelines
└── README.md
Create a virtual environment named dvc-venv (you may use another name)
python3 -m venv dvc-venv
echo "export PYTHONPATH=$PWD" >> dvc-venv/bin/activate
source dvc-venv/bin/activate
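The second command appends an export of PYTHONPATH (the project root) to the activate script, presumably so that modules under src/ can be imported from anywhere in the project. The minimal Python sketch below, run from the project root after activation, checks that the venv is active and that the root is on PYTHONPATH. The helper names are hypothetical and not part of the course code.

```python
import os
import sys


def venv_is_active() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv created with `python3 -m venv`, sys.prefix points at the
    venv directory, while sys.base_prefix still points at the base Python.
    """
    return sys.prefix != sys.base_prefix


def project_on_pythonpath(project_dir: str) -> bool:
    """Return True if project_dir is one of the entries in PYTHONPATH."""
    entries = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.abspath(project_dir)
    return any(os.path.abspath(entry) == target for entry in entries if entry)


if __name__ == "__main__":
    print("venv active:", venv_is_active())
    print("project on PYTHONPATH:", project_on_pythonpath(os.getcwd()))
```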
Install Python libraries
pip install -r requirements.txt
Add the virtual environment to Jupyter Notebook
python -m ipykernel install --user --name=dvc-venv
Configure ToC (table of contents) for Jupyter Notebook (optional)
jupyter contrib nbextension install --user
jupyter nbextension enable toc2/main
Jupyter Notebooks in the notebooks/ directory are examples only.
To remove them from Git version control (recommended), run:
1 - Add the following line to .gitignore
notebooks/*
git add .gitignore
git commit -m "Update .gitignore: add notebooks/*"
2 - Remove notebooks from the Git index and commit changes
git rm --cached notebooks/*
git commit -m "Unstage notebooks"
Note: this removes files from the Git index only! The files won’t be deleted from disk.
IMPORTANT:
- If you remove notebooks from the Git index and commit the changes (see above), then make changes in the notebooks and switch back to the step-1/step-7 branches, all those changes will be lost.
- It's not recommended to version your Jupyter Notebooks at all.
- We recommend treating Jupyter Notebooks as artifacts of your experiments.
3 - Run Jupyter Notebooks
jupyter notebook
- run all in Jupyter Notebooks
- a separate section for each logical stage
- one base section with common configs (random_state)
- human readable format (.yaml)
- i.e. main functions and classes
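The config bullets above (one section per logical stage, one base section with shared settings such as random_state, a human-readable .yaml format) can be sketched as follows. The section names are assumptions; in the project the YAML file would typically be loaded with yaml.safe_load from PyYAML, and a plain dict stands in for the parsed file here so the example stays stdlib-only. The helper stage_params is hypothetical.

```python
import copy
from typing import Any, Dict

# Hypothetical layout mirroring a YAML config such as:
#
#   base:
#     random_state: 42
#   split_train_test:
#     test_size: 0.2
#   train:
#     n_estimators: 100
#
# A plain dict stands in for the result of yaml.safe_load(...).
CONFIG: Dict[str, Dict[str, Any]] = {
    "base": {"random_state": 42},
    "split_train_test": {"test_size": 0.2},
    "train": {"n_estimators": 100},
}


def stage_params(config: Dict[str, Dict[str, Any]], stage: str) -> Dict[str, Any]:
    """Merge the shared `base` section with one stage's section.

    Keys in the stage section override base keys with the same name.
    Deep copies keep the original config unmodified.
    """
    merged = copy.deepcopy(config.get("base", {}))
    merged.update(copy.deepcopy(config[stage]))
    return merged


if __name__ == "__main__":
    print(stage_params(CONFIG, "split_train_test"))
    # {'random_state': 42, 'test_size': 0.2}
```

Keeping random_state only in the base section means every stage that needs it reads the same value, so changing it in one place changes it everywhere.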
Add the pipeline stage code to src/pipelines:
featurize.py - create new features
split_train_test.py - split source dataset into train/test
train.py - train a classifier
evaluate.py - evaluate model and create metrics file
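As an illustration of what a stage such as split_train_test.py does, here is a minimal stdlib-only sketch of a reproducible train/test split over CSV rows. It is not the course implementation, which more likely uses pandas and scikit-learn's train_test_split; the function names split_rows and split_csv are hypothetical. The key point is that the split is seeded with random_state, so re-running the stage produces the same files.

```python
import csv
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def split_rows(rows: Sequence[Row], test_size: float,
               random_state: int) -> Tuple[List[Row], List[Row]]:
    """Shuffle rows with a fixed seed and split them into (train, test)."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)
    # Seeded RNG: the same random_state always yields the same split.
    random.Random(random_state).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def split_csv(input_path: str, train_path: str, test_path: str,
              test_size: float, random_state: int) -> None:
    """Read a CSV with a header row and write train/test CSVs with the same header."""
    with open(input_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    train, test = split_rows(rows, test_size, random_state)
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(part)


if __name__ == "__main__":
    demo = [{"id": str(i)} for i in range(10)]
    train, test = split_rows(demo, test_size=0.2, random_state=42)
    print(len(train), len(test))  # 8 2
```

In a real stage, test_size and random_state would come from the YAML config rather than being hard-coded, which lets DVC track them as parameters.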
- add pipeline dependencies under DVC control
- add models/data/configs under DVC control
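One way to put a stage's dependencies, outputs and params under DVC control is to declare the stage with dvc stage add (DVC 2.x; older releases used dvc run with similar -n/-d/-o/-p flags), which records it in dvc.yaml. The sketch below only builds and prints such a command and does not run it. The stage name, file paths and param keys are hypothetical examples, and -p here assumes the params live in DVC's default params.yaml.

```python
import shlex
from typing import List, Sequence


def dvc_stage_add_cmd(
    name: str,
    command: str,
    deps: Sequence[str] = (),
    outs: Sequence[str] = (),
    params: Sequence[str] = (),
) -> List[str]:
    """Build the argv for `dvc stage add` without executing it.

    -n names the stage, each -d adds a dependency, each -o adds an output
    tracked by DVC, and -p takes a comma-separated list of param keys.
    """
    argv = ["dvc", "stage", "add", "-n", name]
    for dep in deps:
        argv += ["-d", dep]
    for out in outs:
        argv += ["-o", out]
    if params:
        argv += ["-p", ",".join(params)]
    argv.append(command)  # the command DVC will run for this stage
    return argv


if __name__ == "__main__":
    # Hypothetical paths and param keys, for illustration only.
    cmd = dvc_stage_add_cmd(
        name="split_train_test",
        command="python src/pipelines/split_train_test.py",
        deps=["src/pipelines/split_train_test.py", "data/processed/featured.csv"],
        outs=["data/processed/train.csv", "data/processed/test.csv"],
        params=["base.random_state", "split_train_test.test_size"],
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

Listing the stage's own script as a -d dependency means DVC re-runs the stage when the code changes, not only when the data or params change.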