FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results. It maps face images into a Euclidean embedding space and is trained so that the L2 distance between embeddings reflects face similarity. Our paper notes on FaceNet can be found here.
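For reference, the triplet loss as defined in the FaceNet paper is reproduced below. Here $f$ is the embedding network, $x_i^a$, $x_i^p$ and $x_i^n$ are the anchor, positive and negative images of the $i$-th triplet, $\alpha$ is the margin, and $[\cdot]_+$ denotes $\max(\cdot, 0)$. The paper uses squared L2 distances; whether this implementation squares the distance is not stated here.

$$
L = \sum_{i=1}^{N} \left[ \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \right]_+
$$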
This is a PyTorch implementation of the FaceNet paper with ResNet as the backbone architecture. The implementation was first done on the AT&T Dataset of Faces and then on the LFW dataset. Triplets were selected with online triplet mining.
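Online mining means triplets are built from the embeddings of the current batch rather than fixed in advance. The exact strategy used in this project is not described here (the FaceNet paper itself selects semi-hard negatives), so the following is only a minimal sketch of one common variant, batch-hard mining. The function name `batch_hard_triplet_loss` and its arguments are illustrative and not taken from the repository.

```python
import torch
import torch.nn.functional as F


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Batch-hard online mining (an assumed variant, not necessarily the
    one used in this project): for each anchor, take the farthest positive
    and the closest negative found in the current batch."""
    # Pairwise Euclidean (non-squared) distances, shape (B, B).
    dist = torch.cdist(embeddings, embeddings, p=2)

    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye   # same identity, excluding the anchor itself
    neg_mask = ~same         # different identity

    # Hardest positive: largest distance among the positives.
    hardest_pos = dist.masked_fill(~pos_mask, float("-inf")).max(dim=1).values
    # Hardest negative: smallest distance among the negatives.
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values

    # Only anchors with at least one positive and one negative form a triplet.
    valid = pos_mask.any(dim=1) & neg_mask.any(dim=1)
    if not valid.any():
        return embeddings.sum() * 0.0  # zero loss that stays in the graph
    return F.relu(hardest_pos[valid] - hardest_neg[valid] + margin).mean()
```

For this to produce useful triplets, each batch needs several images per identity, which is usually arranged by the batch sampler.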
Wandb was used throughout this part of the project for metric tracking, hyperparameter tuning, sweeps, visualization, etc.
(a) Metrics of ResNet18 on LFW (b) Sweeps of ResNet18 on ATT
The AT&T dataset (40 subjects) was split into 35 training classes and 5 test classes.
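Splitting by class rather than by image means that no test identity is seen during training. A minimal sketch of such a split is shown below; the helper `split_by_identity`, the seed, and the random choice of test classes are assumptions for illustration, since the source does not say which 5 classes were held out.

```python
import random


def split_by_identity(labels, n_test_classes=5, seed=0):
    """Return (train_indices, test_indices) with disjoint identity sets."""
    classes = sorted(set(labels))
    rng = random.Random(seed)
    rng.shuffle(classes)
    test_classes = set(classes[:n_test_classes])

    train_idx = [i for i, y in enumerate(labels) if y not in test_classes]
    test_idx = [i for i, y in enumerate(labels) if y in test_classes]
    return train_idx, test_idx


# AT&T layout: 40 subjects with 10 images each.
labels = [subject for subject in range(40) for _ in range(10)]
train_idx, test_idx = split_by_identity(labels)
```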
Parameter | Value |
---|---|
Architecture | ResNet18 |
Embeddings Dimension | 64 |
No. of Learnable Parameters | 11,209,344 |
Epochs | 200 |
Learning Rate | 0.0002 |
Optimizer | Adam |
Batch Size | 100 |
Margin | 1 |
Results | Train Set | Test Set |
---|---|---|
Accuracy | 1.0 | 0.984 |
Recall | 1.0 | 0.978 |
Precision | 1.0 | 0.936 |
ROC area under curve | 1.0 | 0.981 |
Euclidean Distance Threshold | 0.91 | 0.89 |
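The metrics above are verification metrics: a pair of faces is predicted to be the same person when the distance between their embeddings falls below the threshold. A minimal sketch of that computation follows. The function name and the strict `<` comparison are assumptions, and the distances in the usage lines are made-up toy values, not project results.

```python
def verification_metrics(distances, is_same, threshold):
    """Accuracy, precision and recall for pair verification at a threshold."""
    tp = fp = tn = fn = 0
    for d, same in zip(distances, is_same):
        predicted_same = d < threshold
        if predicted_same and same:
            tp += 1
        elif predicted_same and not same:
            fp += 1
        elif not predicted_same and same:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy example with invented distances.
toy_distances = [0.3, 0.8, 1.2, 0.7, 1.5, 0.95]
toy_is_same = [True, True, True, False, False, False]
acc, prec, rec = verification_metrics(toy_distances, toy_is_same, threshold=0.91)
```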
(a) Epoch Loss. (b) EER Curve. (c) t-SNE Embeddings.
(d) ROC Curve on train set. (e) ROC Curve on test set
The Deep Funneled set of LFW images was used for training and evaluation.
Faces were extracted with a center crop and then resized to match the input shape. They were then normalized using the mean and standard deviation of the whole dataset.
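import torch
from torchvision import transforms

# Per-channel (RGB) mean and standard deviation of the dataset
MEAN = torch.tensor([0.5929, 0.4496, 0.3654])
STD = torch.tensor([0.2287, 0.1959, 0.1876])

transform = transforms.Compose([
    transforms.CenterCrop((128, 98)),   # (height, width) crop around the face
    transforms.Resize((224, 224)),      # match the network input shape
    transforms.ToTensor(),              # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=MEAN, std=STD),
])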
LFWDataset.py contains the custom dataset classes for loading LFW data in all configurations. This dataset class was later contributed to the Torchvision library.
| | Architecture | Embeddings Dimension | No. of Learnable Parameters | Epochs | Learning Rate | Batch Size |
|---|---|---|---|---|---|---|
| Training | ResNet-18 | 128 | 11,242,176 | 200 | 0.002 (reduced by a factor of 2 every 50 epochs) | 256 |
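The learning-rate schedule in the table can be written as a simple step function. The sketch below is plain Python for illustration; `step_lr` is a hypothetical helper, and the source does not say which scheduler class was actually used. In PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)` produces the same sequence when stepped once per epoch.

```python
def step_lr(epoch, base_lr=0.002, step_size=50, gamma=0.5):
    """Learning rate at a given (0-indexed) epoch: halved every 50 epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate at the start of each 50-epoch stage of a 200-epoch run.
schedule = [step_lr(e) for e in (0, 50, 100, 150)]
```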
To train, run:
train.py --config configs/resnet18lfw.yml --data_dir ../datasets/lfw --wandb true
To resume training from a checkpoint:
train.py --config configs/resnet18lfw.yml --data_dir ../datasets/lfw --wandb true --resume "checkpoints/model_resnet18_triplet_epoch_120_08-Dec 15:57.pt"
Model State Dict
state = {
    'epoch': epoch + 1,
    'embedding_dimension': p.fc_layer_size,
    'batch_size_training': p.batch_size,
    'model_state_dict': model.state_dict(),
    'model_architecture': p.backbone,
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
    'best_distance_threshold': best_threshold,
    'accuracy': accuracy,
}
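A checkpoint with the keys listed above could be restored roughly as follows. This is a sketch, not the repository's resume logic: `load_checkpoint` is a hypothetical helper, and it assumes the file was written with `torch.save(state, path)`.

```python
import torch


def load_checkpoint(path, model, optimizer=None, scheduler=None, device="cpu"):
    """Restore model (and optionally optimizer/scheduler) state from a file."""
    state = torch.load(path, map_location=device)
    model.load_state_dict(state["model_state_dict"])
    if optimizer is not None:
        optimizer.load_state_dict(state["optimizer_state_dict"])
    if scheduler is not None:
        scheduler.load_state_dict(state["scheduler_state_dict"])
    # 'epoch' is stored as epoch + 1, so it can presumably be used directly
    # as the epoch to continue from.
    return state["epoch"], state["best_distance_threshold"]
```

Restoring the optimizer and scheduler along with the model keeps the Adam moment estimates and the current learning-rate stage consistent with the interrupted run.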
Accuracy | Precision | Recall | ROC Area Under Curve | Euclidean Distance | TAR @ FAR=1e-2 |
---|---|---|---|---|---|
88.35% | 88.46% | 88.23% | 0.9508 | 1.104 | 61.07% |
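TAR @ FAR=1e-2 is the true accept rate (fraction of same-person pairs accepted) at the distance threshold where at most 1% of different-person pairs are falsely accepted. The sketch below shows one way to compute it from pair distances. The function name `tar_at_far` and the threshold selection rule are assumptions, and the repository's evaluation code may interpolate along the ROC curve instead.

```python
import math


def tar_at_far(distances, is_same, target_far=1e-2):
    """Return (TAR, achieved FAR, threshold) with achieved FAR <= target_far."""
    pos = [d for d, same in zip(distances, is_same) if same]
    neg = sorted(d for d, same in zip(distances, is_same) if not same)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")

    # Largest number of negative pairs that may be accepted.
    # The small epsilon guards against floating-point round-off.
    max_false_accepts = math.floor(target_far * len(neg) + 1e-9)

    # Accept a pair when its distance is strictly below the threshold, so
    # at most `max_false_accepts` negatives lie below it.
    threshold = neg[max_false_accepts] if max_false_accepts < len(neg) else math.inf

    tar = sum(d < threshold for d in pos) / len(pos)
    far = sum(d < threshold for d in neg) / len(neg)
    return tar, far, threshold
```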