Network Distillation using a Fisher-expanded teacher

This work formed my MSc dissertation at The University of Edinburgh; a copy of the dissertation can be found here. The project investigates how to construct a teacher network for distillation when given a small, non-standard student network, typically one produced by Neural Architecture Search. The method uses Fisher information to decide which blocks of the student network to scale up, growing a teacher network out of the student. The student can then be trained by this new teacher via attention transfer or knowledge distillation.
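As a rough illustration of the block-selection step, the sketch below estimates a Fisher-style sensitivity score per block by accumulating the squared activation-gradient product over a few batches. It is a minimal sketch, not the actual fisher_expand.py API: the function name, the block list, and the hook-based accumulation are assumptions made for this example.

```python
# Illustrative sketch (not this repository's actual API): estimate a per-block
# Fisher score by accumulating (activation * d loss / d activation)^2 via hooks.
import torch
import torch.nn.functional as F

def fisher_scores(model, blocks, loader, device="cpu", n_batches=10):
    """Return one Fisher-style sensitivity score per block in `blocks`."""
    acts, scores = {}, {i: 0.0 for i in range(len(blocks))}
    hooks = []

    def save_act(idx):
        def hook(module, inp, out):
            out.retain_grad()          # keep the gradient of the activation itself
            acts[idx] = out
        return hook

    for i, blk in enumerate(blocks):
        hooks.append(blk.register_forward_hook(save_act(i)))

    model.to(device).train()
    for b, (x, y) in enumerate(loader):
        if b == n_batches:
            break
        model.zero_grad()
        loss = F.cross_entropy(model(x.to(device)), y.to(device))
        loss.backward()
        for i, a in acts.items():
            # Fisher approximation: squared activation-gradient product,
            # summed over spatial positions, averaged over the batch.
            scores[i] += (a * a.grad).sum(dim=(2, 3)).pow(2).mean().item()

    for h in hooks:
        h.remove()
    return scores
```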

Files

activations.py --- A helper script for producing visualisations of a model's activations
fisher_expand.py --- Code for expanding a given model using our Fisher expansion algorithm
funcs.py --- Functions used throughout the codebase
main.py --- Main script for performing distillation
model.py --- DARTS model code
operations.py --- DARTS operations code
utils.py --- Extra utility functions

Expanding a Model

python fisher_expand.py cifar10 --data_loc <cifar location> --base_model <model file>
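For intuition, "scaling" a block here means widening it. Below is a minimal, hand-rolled sketch of what widening a single convolution could look like; the real expansion in fisher_expand.py operates on whole DARTS/WRN/DenseNet blocks and also has to adjust the layers that consume the widened output, and widen_conv is an illustrative name, not a function in this repository.

```python
# Illustrative only: widen one convolution by a factor, keeping the original
# filters and leaving the extra ones randomly initialised.
import torch
import torch.nn as nn

def widen_conv(conv: nn.Conv2d, factor: int = 2) -> nn.Conv2d:
    wider = nn.Conv2d(conv.in_channels, conv.out_channels * factor,
                      kernel_size=conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        wider.weight[:conv.out_channels].copy_(conv.weight)  # preserve old filters
        if conv.bias is not None:
            wider.bias[:conv.out_channels].copy_(conv.bias)
    return wider
```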

Training a Teacher

python main.py cifar10 -t <teacher checkpoint> --teach_arch <darts|densenet|wrn> 

Training a Student

python main.py cifar10 -s <student checkpoint> --student_arch <darts|densenet|wrn> --teacher_arch <darts|densenet|wrn> 
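Both distillation objectives mentioned above have standard formulations: spatial attention transfer (Zagoruyko & Komodakis) and Hinton-style knowledge distillation. The sketch below shows them in their usual form; the temperature, weighting, and function names are illustrative assumptions rather than main.py's exact settings.

```python
# Hedged sketch of the two distillation losses, in their standard forms.
import torch
import torch.nn.functional as F

def attention_map(fmap):
    # Spatial attention: channel-wise mean of squared activations,
    # flattened and L2-normalised.
    return F.normalize(fmap.pow(2).mean(dim=1).flatten(1))

def at_loss(student_fmaps, teacher_fmaps):
    # Attention transfer: match normalised attention maps at chosen layers.
    return sum((attention_map(s) - attention_map(t)).pow(2).mean()
               for s, t in zip(student_fmaps, teacher_fmaps))

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Knowledge distillation: soft targets at temperature T plus hard labels.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```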

Acknowledgements

The following repositories provided the basis and inspiration for this work:

https://github.com/BayesWatch/xdistill
https://github.com/quark0/darts
https://github.com/BayesWatch/pytorch-blockswap
https://github.com/szagoruyko/attention-transfer
https://github.com/kuangliu/pytorch-cifar
https://github.com/xternalz/WideResNet-pytorch
https://github.com/ShichenLiu/CondenseNet