- Create a personal access token.
- Create a new GitHub secret named `PERSONAL_GITHUB_TOKEN` and store the token in it.
You need to grant GitHub access to the DVC remote.
Get the credentials:
`cat ".dvc/tmp/gdrive-user-credentials.json"`
Then create a new GitHub secret called `GDRIVE_CREDENTIALS_DATA` to store them.
With this, GitHub runners will be able to pull and push all the changes generated by the pipeline.
You can create a new GitHub Actions workflow that runs when a new pull request is created.
This workflow will use DVC to reproduce the pipeline and update the large artifacts tracked by DVC.
In addition, it will use CML to post a report with the DVC metrics, params, and plots (`cml send-comment`). It will also update the artifacts tracked by Git (`cml pr`).
Create `.github/workflows/on_pr.yml` and fill it in, using the workflow from the solution repository as a reference:
https://github.com/iterative/workshop-uncool-mlops-solution/blob/main/.github/workflows/on_pr.yaml
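
For orientation, here is a minimal sketch of what such a workflow might look like. It is not a copy of the linked file. The job name, Python version, `requirements.txt`, and the `main` base branch are assumptions made for illustration. Plots are left out of the report for brevity.

```yaml
# Hypothetical sketch of .github/workflows/on_pr.yml (not the linked file).
name: DVC and CML on pull request

on: pull_request

jobs:
  train-and-report:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # Check out the PR head commit with full history so DVC can
          # compare against the base branch.
          ref: ${{ github.event.pull_request.head.sha }}
          fetch-depth: 0
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"  # assumption: use the version your project needs
      - uses: iterative/setup-dvc@v1
      - uses: iterative/setup-cml@v1
      - name: Install project dependencies
        # Assumption: the project lists its dependencies in requirements.txt.
        run: pip install -r requirements.txt
      - name: Reproduce pipeline and report
        env:
          # CML reads REPO_TOKEN to comment on and open pull requests.
          REPO_TOKEN: ${{ secrets.PERSONAL_GITHUB_TOKEN }}
          # DVC reads this variable to authenticate against the Google Drive remote.
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.GDRIVE_CREDENTIALS_DATA }}
        run: |
          # Fetch tracked data, re-run changed stages, upload new artifacts.
          dvc pull
          dvc repro
          dvc push

          # Build a Markdown report comparing this branch with main.
          # Older DVC releases spell the flag --show-md instead of --md.
          echo "## Metrics" >> report.md
          dvc metrics diff origin/main --md >> report.md
          echo "## Params" >> report.md
          dvc params diff origin/main --md >> report.md

          # Post the report and propose the Git-tracked changes.
          cml send-comment report.md
          cml pr .
```

Both secrets created earlier are passed in as environment variables, so they never appear in the repository itself.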
And now you can reproduce the pipeline from the web:
- Edit `params.yaml` from the GitHub interface.
- Change `train.epochs`.
- Select `Create a new branch for this commit and start a pull request`.
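
As a rough illustration of the edit above: DVC addresses nested parameters with dot notation, so `train.epochs` refers to the `epochs` key under `train` in `params.yaml`. The value below is a hypothetical placeholder, not the project's actual setting.

```yaml
# params.yaml (fragment); the value is a hypothetical placeholder
train:
  epochs: 20
```

Committing a change like this on a new branch opens a pull request, which in turn triggers the workflow.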
- Go to https://studio.iterative.ai (It's free)
- Connect your GitHub account.
- Add a new view.
More info: https://dvc.org/doc/studio
- Click on the `Run new experiment` button.
In the above workflow, we are using the default GitHub runners to train our model.
While this is enough for our use case (small dataset, small model), your project will often require more compute resources.
CML self-hosted runners allow you to allocate cloud instances (or on-premise machines) and use them in your GitHub Actions workflow.
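
A minimal sketch of how this could look, assuming AWS as the cloud provider. The job names, the `cml-runner` label, the region, the instance type, and the AWS secret names are all illustrative assumptions. Check the CML documentation for the options your provider and CML version support.

```yaml
# Hypothetical sketch: names, label, region, and instance type are illustrative.
name: Train on a CML self-hosted runner

on: pull_request

jobs:
  launch-runner:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: iterative/setup-cml@v1
      - name: Launch a cloud instance and register it as a runner
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_GITHUB_TOKEN }}
          # Assumption: AWS credentials are stored as repository secrets.
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          # Older CML releases use `cml runner` without the `launch` subcommand.
          cml runner launch \
            --cloud=aws \
            --cloud-region=us-west \
            --cloud-type=g4dn.xlarge \
            --labels=cml-runner

  train:
    needs: launch-runner
    # Runs on the instance registered above, matched by its label.
    runs-on: [self-hosted, cml-runner]
    steps:
      - uses: actions/checkout@v3
      # The DVC and CML steps from the default-runner workflow would go here.
```

The first job runs on a default GitHub runner only long enough to provision the instance. The training job then targets that instance through the shared label.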