Knowledge Distillation for Embedded Machine Learning
Most high-performance neural networks for datasets such as ImageNet have millions or billions of parameters and are trained and executed on several GPUs at once. Networks of this size cannot be deployed directly to resource-constrained devices such as microcontrollers. With suitable training techniques, however, we can leverage these well-trained networks and transfer their knowledge to smaller networks that can be deployed to embedded devices.
Knowledge distillation is a training approach for deep neural networks that uses well-trained large networks or ensembles of specialized models to train smaller, more efficient networks. The technique is promising for deploying models to embedded devices, particularly when combined with well-established quantization techniques. The goal of this project is to develop a knowledge distillation algorithm, evaluate it for training networks for embedded devices, and compare it to traditional training methods.
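To make the idea concrete, the following is a minimal sketch of one common form of the distillation loss, assuming a single sample and plain Python lists of logits. The function names, the temperature of 4.0 and the weighting factor alpha are illustrative assumptions, not part of the project specification. The soft term compares the temperature-softened outputs of the large and the small network, and the hard term is the usual cross-entropy with the true label.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft (teacher) term and a hard (label) term.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # Soft term: KL divergence from the softened teacher distribution to the
    # softened student distribution, scaled by T^2 so that its gradient
    # magnitude stays comparable across temperatures.
    soft = sum(math.exp(lt) * (lt - ls)
               for lt, ls in zip(log_p_teacher, log_p_student))
    soft *= temperature ** 2

    # Hard term: cross-entropy with the ground-truth label at temperature 1.
    hard = -log_softmax(student_logits)[label]

    return alpha * soft + (1.0 - alpha) * hard


# Example: a three-class problem where the true class is index 2.
teacher = [1.0, 2.0, 5.0]
student = [0.5, 1.5, 2.0]
loss = distillation_loss(student, teacher, label=2)
```

In a PyTorch framework the same quantity would typically be computed on batched tensors, for example with `torch.nn.functional.log_softmax` and `torch.nn.functional.kl_div`, so that gradients flow only through the smaller network.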
The main goals of the project are listed below; not all of them have to be met in a single semester project:
- Develop a framework for distillation-based training in PyTorch
- Combine knowledge distillation with quantization to optimize model size
- Evaluate knowledge distillation as a method for deploying networks to embedded devices
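As an illustration of the quantization goal, the sketch below shows symmetric 8-bit quantization of a list of weights together with a rough estimate of the storage saved. It is a simplified stand-in under stated assumptions (one scale per tensor, plain Python lists, hypothetical function names) and not the scheme the project prescribes. A real implementation would more likely build on PyTorch's own quantization tooling.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Fall back to a scale of 1.0 for an all-zero tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


def model_size_bytes(num_params, bits_per_param):
    """Storage needed for the parameters alone, ignoring scales and metadata."""
    return num_params * bits_per_param // 8


# Example: quantize a few weights and compare storage for one million parameters.
weights = [0.5, -1.27, 0.0, 0.3]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
float32_size = model_size_bytes(1_000_000, 32)
int8_size = model_size_bytes(1_000_000, 8)
```

Going from 32-bit floats to 8-bit integers cuts parameter storage by a factor of four, at the cost of a rounding error of at most half the scale per weight.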
Looking for a student for a semester project.
- Machine Learning
- 20% Theory
- 80% Implementation
- G. Hinton et al., Distilling the Knowledge in a Neural Network
- T. Furlanello et al., Born Again Neural Networks