Create a face detection model from scratch on Grid. This project covers:
- creating a dataset from scratch
- creating a Grid AI Datastore
- training a model on Grid AI
- using your trained model for inference
We will be creating a dataset with the following folder structure in order to use this project:
dataset
├── test
│   ├── face_a
│   │   ├── 528bca97-9676-4d4b-866e-67ace4469ffc_output.jpg
│   │   └── e0facb84-beee-4210-a09f-9fd60dd0bc6a_output.jpg
│   └── face_b
│       ├── 0164e6da-f2d3-423c-a3fc-e3469993fb7b_output.jpg
│       └── IMG_8021_output.jpg
├── train
│   ├── face_a
│   │   └── e0facb84-beee-4210-a09f-9fd60dd0bc6a_output.jpg
│   └── face_b
│       ├── 0164e6da-f2d3-423c-a3fc-e3469993fb7b_output.jpg
│       └── IMG_8021_output.jpg
└── val
    ├── face_a
    │   └── e0facb84-beee-4210-a09f-9fd60dd0bc6a_output.jpg
    └── face_b
        ├── 0164e6da-f2d3-423c-a3fc-e3469993fb7b_output.jpg
        └── IMG_8021_output.jpg
Each face you want to detect corresponds to a directory name, and each root-level directory corresponds to a different split of the dataset, i.e. `train`, `test`, and `val`.
Place all your image files in the `raw` directory. Don't organize them into folders yet; just place them in the root of that directory.
We will now be running those images through a face-detection model called MTCNN. We'll then crop each detected face into its own file and store the output in the `./processed` directory.
Let's do that with:
$ python process_raw_data.py
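For reference, here is a minimal sketch of what that cropping step looks like, assuming the `facenet-pytorch` implementation of MTCNN; filenames are illustrative and the actual `process_raw_data.py` may differ:

```python
import uuid
from pathlib import Path

from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(keep_all=True)  # detect every face in each image
Path("processed").mkdir(exist_ok=True)

for path in Path("raw").glob("*.jpg"):
    image = Image.open(path).convert("RGB")
    boxes, _ = mtcnn.detect(image)  # one bounding box per detected face
    if boxes is None:  # MTCNN found no faces in this image
        continue
    for box in boxes:
        left, top, right, bottom = (int(c) for c in box)
        face = image.crop((left, top, right, bottom))  # crop face to its own file
        face.save(Path("processed") / f"{uuid.uuid4()}_output.jpg")
```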
You will find a number of very small images containing only faces. You now need to "annotate" the resulting images by doing two things:
- removing any images that aren't faces (MTCNN makes mistakes sometimes)
- placing each image in a directory named after the person whose face it contains, for example:
# `vera` and `luis` are the two people in this dataset
processed/vera/photo_1.jpg
processed/luis/photo_1.jpg
We will now split your "annotated" dataset into three collections: `train`, `test`, and `val`. You can do that by running the script `create_training_dataset.py`:
$ python create_training_dataset.py
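If you're curious what the script does, here is a rough sketch of the splitting logic, assuming an 80/10/10 split; the ratios and details in the real script may differ:

```python
import json
import random
import shutil
from pathlib import Path

random.seed(1234)
labels = sorted(p.name for p in Path("processed").iterdir() if p.is_dir())

for label in labels:
    images = list(Path("processed", label).glob("*.jpg"))
    random.shuffle(images)
    n = len(images)
    splits = {
        "train": images[: int(n * 0.8)],
        "test": images[int(n * 0.8) : int(n * 0.9)],
        "val": images[int(n * 0.9) :],
    }
    for split, files in splits.items():
        dest = Path("dataset", split, label)
        dest.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, dest / f.name)

# write the index-to-class mapping described below
Path("dataset/labels.json").write_text(json.dumps({"labels": labels}))
```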
You will now find a new directory called `./dataset`. This directory contains the training dataset you need.
This script also generates a file that maps label indices to class names. That file is called `dataset/labels.json` and has the following format:
{"labels": ["label_a", "label_b"]}
We'll be using that file later when using the trained model for predictions.
Grid AI introduces the concept of Datastores. Datastores are high-performance volumes mounted into your training context when using Grid. That means you can create a Datastore once and then use it to create both Sessions and Runs on Grid.
Make sure to install the Grid CLI and login, then:
$ grid datastores create --name face_detection --source dataset/
upload ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0%
✔ Finished uploading datastore.
You can then verify that your datastore is ready to use by listing your datastores and checking the Status column:
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Credential Id ┃ Name           ┃ Version ┃ Size     ┃ Created          ┃ Status    ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ cc-bwhth      │ face_detection │ 1       │ 2.1 MB   │ 2021-05-13 19:55 │ Succeeded │
└───────────────┴────────────────┴─────────┴──────────┴──────────────────┴───────────┘
Whenever your datastore has status `Succeeded`, it is ready to use.
You can train your model by calling the `train.py` script locally (make sure to install your dependencies first):
$ python3.8 -m venv venv && source venv/bin/activate && pip install -r requirements.txt
$ python train.py
Global seed set to 1234
train samples: 341
valid samples: 42
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
  | Name     | Type       | Params
----------------------------------------
0 | metrics  | ModuleDict | 0
1 | backbone | Sequential | 11.2 M
2 | head     | Sequential | 1.5 K
----------------------------------------
11.2 M Trainable params
0 Non-trainable params
11.2 M Total params
44.712 Total estimated model params size (MB)
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
Feel free to run that locally to test that your model works as expected.
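The layer names in the log above (`metrics`, `backbone`, `head`) come from Lightning Flash's `ImageClassifier`, which we'll also use for predictions later. For reference, the core of such a `train.py` looks roughly like this; the exact flags and defaults in this repo's script may differ:

```python
from argparse import ArgumentParser

import flash
from flash.image import ImageClassificationData, ImageClassifier

parser = ArgumentParser()
parser.add_argument("--data_path", default="dataset")
parser.add_argument("--backbone", default="resnet18")
parser.add_argument("--learning_rate", type=float, default=1e-3)
parser.add_argument("--max_epochs", type=int, default=1000)
args = parser.parse_args()

# build the datamodule from the train/val folders created earlier
datamodule = ImageClassificationData.from_folders(
    train_folder=f"{args.data_path}/train",
    val_folder=f"{args.data_path}/val",
)

model = ImageClassifier(
    backbone=args.backbone,
    learning_rate=args.learning_rate,
    num_classes=datamodule.num_classes,
)

trainer = flash.Trainer(max_epochs=args.max_epochs)
trainer.fit(model, datamodule=datamodule)
```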
You are now ready to train your model on Grid. We'll be using the CLI, but you can do the same thing using the web UI. We have placed a configuration file locally (`.grid/config.yml`) that you can use as a reference instead of passing all the parameters to the CLI manually -- or just click on the Grid badge:
$ grid run --grid_instance_type g4dn.xlarge \
--grid_gpus 1 \
--grid_datastore_name face_detection \
--grid_datastore_version 1 \
--grid_datastore_mount_dir /gridai/project/dataset \
train.py --max_epochs 1000 --data_path /gridai/project/dataset
No --grid_name passed, naming your run glossy-manatee-255
Using default cloud credentials cc-bwhth to run on AWS.
Run submitted!
`grid status` to list all runs
`grid status glossy-manatee-255` to see all experiments for this run
----------------------
Submission summary
----------------------
script: train.py
instance_type: g4dn.xlarge
distributed: False
use_spot: True
cloud_provider: aws
cloud_credentials: cc-bwhth
grid_name: glossy-manatee-255
datastore_name: face_detection
datastore_version: 1
datastore_mount_dir: /gridai/project/dataset
Grid AI makes it trivial to run a hyperparameter sweep without having to change anything in your scripts. The model we created supports a number of different backbones, including `resnet18` and `resnet200d`. Let's try different backbones and learning rates to make sure we find the best model:
$ grid run --grid_instance_type g4dn.xlarge \
--grid_gpus 1 \
--grid_datastore_name face_detection \
--grid_datastore_version 1 \
--grid_datastore_mount_dir /gridai/project/dataset \
train.py --max_epochs 1000 --data_path /gridai/project/dataset \
--learning_rate "uniform(0,0.0001,2)" --backbone "['resnet18','resnet200d']"
That will generate 4 experiments, one for each combination of backbone and learning rate: `uniform(0,0.0001,2)` samples two learning rates from the range [0, 0.0001], and the list syntax enumerates the two backbones.
This section covers how to download your latest weights from Grid and use them to run predictions with your trained model.
We'll first download all artifacts from the Run with `grid artifacts`. In this case, my Run was called `glossy-manatee-255`; running `grid artifacts glossy-manatee-255` downloads all the artifacts for the Experiments from that Run.
$ grid artifacts glossy-manatee-255
Downloading artifacts → ('glossy-manatee-255',)
glossy-manatee-255 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00
Artifacts are saved by default in the `grid_artifacts` directory:
$ tree grid_artifacts
grid_artifacts
└── glossy-manatee-255
└── glossy-manatee-255-exp0
└── version_0
├── checkpoints
│ └── epoch=712-step=7129.ckpt
├── events.out.tfevents.1620938447.exp-glossy-manatee-255-exp0.20.0
└── hparams.yaml
4 directories, 3 files
The file we are looking for is `epoch=712-step=7129.ckpt`, which is the latest PyTorch checkpoint file.
Now that we have our weights locally, we want to load them using Lightning Flash and make predictions. You can run the script `predict.py` to test your newly trained model:
$ python predict.py --checkpoint_path grid_artifacts/glossy-manatee-255/glossy-manatee-255-exp0/version_0/checkpoints/epoch=712-step=7129.ckpt \
--image_path test_prediction_image.jpg
Predicted class: person_a
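For reference, a minimal `predict.py` along these lines could look like the sketch below, assuming Flash's `ImageClassifier` and the `dataset/labels.json` file generated earlier; the repo's actual script may differ:

```python
import json
from argparse import ArgumentParser

from flash.image import ImageClassifier

parser = ArgumentParser()
parser.add_argument("--checkpoint_path", required=True)
parser.add_argument("--image_path", required=True)
args = parser.parse_args()

# map predicted class indices back to human-readable names
with open("dataset/labels.json") as f:
    labels = json.load(f)["labels"]

# restore the trained model from the downloaded checkpoint
model = ImageClassifier.load_from_checkpoint(args.checkpoint_path)
predictions = model.predict([args.image_path])  # one class index per image
print(f"Predicted class: {labels[predictions[0]]}")
```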