Network Distillation using a Fisher-expanded teacher

This work formed my MSc dissertation at The University of Edinburgh; a copy of the dissertation can be found here. The project investigates how to construct a teacher network for distillation when given a small, non-standard student network, typically one produced by Neural Architecture Search. The method uses Fisher information to decide which blocks of the student network to scale up, growing a teacher network out of the student. The student can then be trained by this new teacher via attention transfer or knowledge distillation.
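As a rough illustration of the block-selection step, the sketch below estimates a Fisher-style sensitivity score per block by accumulating the squared activation-gradient product over a few batches. It is a minimal sketch, not the actual fisher_expand.py API: the function name, the block list, and the hook-based accumulation are assumptions made for this example.

```python
# Illustrative sketch (not this repository's actual API): estimate a per-block
# Fisher score by accumulating (activation * d loss / d activation)^2 via hooks.
import torch
import torch.nn.functional as F

def fisher_scores(model, blocks, loader, device="cpu", n_batches=10):
    """Return one Fisher-style sensitivity score per block in `blocks`."""
    acts, scores = {}, {i: 0.0 for i in range(len(blocks))}
    hooks = []

    def save_act(idx):
        def hook(module, inp, out):
            out.retain_grad()          # keep the gradient of the activation itself
            acts[idx] = out
        return hook

    for i, blk in enumerate(blocks):
        hooks.append(blk.register_forward_hook(save_act(i)))

    model.to(device).train()
    for b, (x, y) in enumerate(loader):
        if b == n_batches:
            break
        model.zero_grad()
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        loss.backward()
        for i, a in acts.items():
            # Fisher approximation: squared activation-gradient product,
            # summed over spatial positions, averaged over the batch.
            scores[i] += (a * a.grad).sum(dim=(2, 3)).pow(2).mean().item()

    for h in hooks:
        h.remove()
    return scores
```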

Files

activations.py --- A helper script for producing visualisations of a model's activations
fisher_expand.py --- Code for expanding a given model using our Fisher expansion algorithm
funcs.py --- Functions used throughout the codebase
main.py --- Main script for performing distillation
model.py --- DARTS model code
operations.py --- DARTS operations code
utils.py --- Extra utility functions

Expanding a Model

python fisher_expand.py cifar10 --data_loc <cifar location> --base_model <model file>
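For intuition, "scaling" a block here means widening it. Below is a minimal, hand-rolled sketch of what widening a single convolution could look like; the real expansion in fisher_expand.py operates on whole DARTS/WRN/DenseNet blocks and also has to adjust the layers that consume the widened output, and widen_conv is an illustrative name, not a function in this repository.

```python
# Illustrative only: widen one convolution by a factor, keeping the original
# filters and leaving the extra ones randomly initialised.
import torch
import torch.nn as nn

def widen_conv(conv: nn.Conv2d, factor: int = 2) -> nn.Conv2d:
    wider = nn.Conv2d(conv.in_channels, conv.out_channels * factor,
                      kernel_size=conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        wider.weight[:conv.out_channels].copy_(conv.weight)  # preserve old filters
        if conv.bias is not None:
            wider.bias[:conv.out_channels].copy_(conv.bias)
    return wider
```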

Training a Teacher

python main.py cifar10 -t <teacher checkpoint> --teach_arch <darts|densenet|wrn> 

Training a Student

python main.py cifar10 -s <student checkpoint> --student_arch <darts|densenet|wrn> --teacher_arch <darts|densenet|wrn> 
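Both distillation objectives mentioned above have standard formulations: spatial attention transfer (Zagoruyko & Komodakis) and Hinton-style knowledge distillation. The sketch below shows them in their usual form; the temperature, weighting, and function names are illustrative assumptions rather than main.py's exact settings.

```python
# Hedged sketch of the two distillation losses, in their standard forms.
import torch
import torch.nn.functional as F

def attention_map(fmap):
    # Spatial attention: channel-wise mean of squared activations,
    # flattened and L2-normalised.
    return F.normalize(fmap.pow(2).mean(dim=1).flatten(1))

def at_loss(student_fmaps, teacher_fmaps):
    # Attention transfer: match normalised attention maps at chosen layers.
    return sum((attention_map(s) - attention_map(t)).pow(2).mean()
               for s, t in zip(student_fmaps, teacher_fmaps))

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Knowledge distillation: soft targets at temperature T plus hard labels.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```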

Acknowledgements

The following repositories provided the basis and inspiration for this work:

https://github.com/BayesWatch/xdistill
https://github.com/quark0/darts
https://github.com/BayesWatch/pytorch-blockswap
https://github.com/szagoruyko/attention-transfer
https://github.com/kuangliu/pytorch-cifar
https://github.com/xternalz/WideResNet-pytorch
https://github.com/ShichenLiu/CondenseNet