In this demo, you'll train a text classification model using PyTorch Lightning, transformers, and datasets!
The full tutorial can be found in the Grid documentation here.
If you haven't already set up the Grid CLI, follow this 1-minute guide on how to install it.
TLDR:
pip install lightning-grid --upgrade
grid login
This example involves three steps:
- Downloading the data
- Uploading the data to Grid using Grid datastores
- Training the train.py script using grid run
For this example we use the Lightning Flash IMDB dataset.
grid datastore create --source https://pl-flash-data.s3.amazonaws.com/imdb.zip --name imdb-ds
Once the upload finishes, check the status of the datastore with grid datastore list.
Wait until the datastore's Status shows Succeeded before moving to the next step.
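If you'd rather not re-run the command by hand, the wait can be scripted. The sketch below is not part of the Grid CLI: `datastore_succeeded` and `wait_for_datastore` are hypothetical helpers, and they assume that `grid datastore list` prints one row per datastore containing both its name and its status. Adjust the parsing if your CLI version prints something different.

```python
import re
import subprocess
import time


def datastore_succeeded(listing: str, name: str) -> bool:
    """Return True if some row of `listing` contains both `name` and `Succeeded`.

    Assumption: each datastore is printed on its own row, with cells
    separated by whitespace and/or table-border characters.
    """
    for line in listing.splitlines():
        # Split the row into cells so "imdb-ds" does not match "imdb-ds-2".
        cells = re.split(r"[\s|│]+", line)
        if name in cells and "Succeeded" in cells:
            return True
    return False


def wait_for_datastore(name: str, poll_seconds: int = 30, max_polls: int = 60) -> bool:
    """Poll `grid datastore list` until `name` succeeds or the polls run out."""
    for _ in range(max_polls):
        result = subprocess.run(
            ["grid", "datastore", "list"],
            capture_output=True,
            text=True,
            check=False,  # keep polling even if one call fails
        )
        if datastore_succeeded(result.stdout, name):
            return True
        time.sleep(poll_seconds)
    return False
```

Calling `wait_for_datastore("imdb-ds")` returns True as soon as the datastore created above is ready, or False if it never shows Succeeded within the polling budget.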
Training Parameters
Here are the parameters we'll pass to grid run:
Grid flags:
- --instance_type: the instance type, which determines the number of GPUs and the amount of memory available
- --gpus: the number of GPUs per experiment
- --datastore_name: the name of the datastore (created above) that you'd like to attach to this training run
- --datastore_version: the version of the datastore to attach to this training run (defaults to 1)
- --disk_size: the disk size in GB to allocate to each node in the cluster
Then we'll specify the script we're using to train our model followed by the script arguments.
Script: src/train.py
These are the arguments defined by the train.py script:
Script arguments:
- train_file
- valid_file
- test_file
- max_epochs
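To make the argument list concrete, here is a minimal sketch of how a script could declare these arguments with argparse. It is not the actual train.py: the types, defaults, and help strings are assumptions, and the real script may build its parser differently. The `gpus` argument is included because the run command in this example also passes `--gpus` to the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the script arguments listed above (types and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Train a text classifier on IMDB")
    parser.add_argument("--train_file", type=str, required=True,
                        help="path to the training CSV")
    parser.add_argument("--valid_file", type=str, required=True,
                        help="path to the validation CSV")
    parser.add_argument("--test_file", type=str, required=True,
                        help="path to the test CSV")
    parser.add_argument("--max_epochs", type=int, default=1,
                        help="number of training epochs")
    parser.add_argument("--gpus", type=int, default=0,
                        help="number of GPUs to train on")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse `argv`, or sys.argv[1:] when `argv` is None."""
    return build_parser().parse_args(argv)
```

Because Grid mounts the datastore under /datastores/, the file arguments are plain paths and the script needs no Grid-specific code to read them.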
Cool! Now we can spin up a Grid run.
Submit the command below to start a training run on a single GPU:
grid run \
--name imdb-demo \
--gpus 1 \
--instance_type p3.2xlarge \
--datastore_name imdb-ds \
--disk_size 500 \
train.py \
--gpus 1 \
--train_file /datastores/imdb-ds/train.csv \
--valid_file /datastores/imdb-ds/valid.csv \
--test_file /datastores/imdb-ds/test.csv \
--max_epochs 1
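Note the ordering in the command above: Grid flags come before the script name, and everything after the script name is passed to the script itself. That is why `--gpus 1` appears twice. The sketch below assembles the same command from its three parts to make that split explicit. `build_grid_run` is a hypothetical helper for illustration, not part of the Grid CLI, and it only builds the command without running it.

```python
import shlex
from typing import Dict, List


def build_grid_run(grid_flags: Dict[str, str], script: str,
                   script_args: Dict[str, str]) -> List[str]:
    """Return `grid run <grid flags> <script> <script args>` as an argument list."""
    cmd = ["grid", "run"]
    for flag, value in grid_flags.items():
        cmd += [flag, str(value)]  # flags consumed by Grid
    cmd.append(script)
    for flag, value in script_args.items():
        cmd += [flag, str(value)]  # flags forwarded to the script
    return cmd


datastore_name = "imdb-ds"
mount = f"/datastores/{datastore_name}"  # where the datastore appears inside the run

command = build_grid_run(
    grid_flags={
        "--name": "imdb-demo",
        "--gpus": "1",
        "--instance_type": "p3.2xlarge",
        "--datastore_name": datastore_name,
        "--disk_size": "500",
    },
    script="train.py",
    script_args={
        "--gpus": "1",
        "--train_file": f"{mount}/train.csv",
        "--valid_file": f"{mount}/valid.csv",
        "--test_file": f"{mount}/test.csv",
        "--max_epochs": "1",
    },
)

# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(command))
```

Keeping the two dictionaries separate makes it harder to put a script argument in front of the script name, where Grid would try to interpret it.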
You can use the grid status command to check on the status of the run. To view progress in the Grid UI, use grid view.