Training and Deploying Next-Generation Quantized Neural Networks on Microcontrollers
The design and deployment of highly efficient neural networks (NNs) for microcontroller-class systems (MCUs) has recently received intense attention from the research community, with MCUNet representing the current state of the art (see references). Because of the architectural limitations of commodity MCUs, exploiting sub-byte formats for a better tradeoff between model size and accuracy has received only limited attention: while many approaches to network design and quantization have been proposed, only a few publications present a flow for mapping these networks to real systems.
The PULP family of MCUs developed at IIS has hardware support for ultra-low-precision (down to 2 bits) SIMD arithmetic, which was introduced with the express goal of supporting such networks. Furthermore, we have been developing QuantLab, a framework for training quantized NNs, and have recently created a prototype flow for automatically integerizing arbitrary-precision networks.
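To give a feel for what sub-byte SIMD arithmetic operates on, the following minimal sketch packs 2-bit values four to a byte and computes a dot product lane by lane. It is a plain-Python emulation for illustration only: the function names (`pack_2bit`, `unpack_2bit`, `packed_dot`), the unsigned value range, and the LSB-first lane order are assumptions made here, not the PULP instruction set or any QuantLab API.

```python
# Illustrative only: emulates in plain Python what a sub-byte SIMD
# dot product does in hardware. Not PULP's actual ISA or API.

def pack_2bit(values):
    """Pack unsigned 2-bit values (0..3) into bytes, four per byte, LSB first."""
    if any(not 0 <= v <= 3 for v in values):
        raise ValueError("values must fit in 2 bits (0..3)")
    padded = list(values) + [0] * (-len(values) % 4)  # zero-pad to a multiple of 4
    packed = bytearray()
    for i in range(0, len(padded), 4):
        byte = 0
        for lane, v in enumerate(padded[i:i + 4]):
            byte |= v << (2 * lane)  # each lane occupies 2 bits
        packed.append(byte)
    return bytes(packed)


def unpack_2bit(packed, count):
    """Inverse of pack_2bit: recover the first `count` 2-bit values."""
    values = [(byte >> (2 * lane)) & 0b11 for byte in packed for lane in range(4)]
    return values[:count]


def packed_dot(a_packed, b_packed):
    """Dot product of two packed 2-bit vectors, one byte (four lanes) at a time."""
    acc = 0
    for a_byte, b_byte in zip(a_packed, b_packed):
        for lane in range(4):
            a = (a_byte >> (2 * lane)) & 0b11
            b = (b_byte >> (2 * lane)) & 0b11
            acc += a * b  # accumulate in full precision
    return acc


# Hypothetical example data: six 2-bit weights and activations.
weights = [1, 3, 0, 2, 2, 1]
activations = [3, 1, 2, 2, 0, 3]
w_packed = pack_2bit(weights)
a_packed = pack_2bit(activations)
result = packed_dot(w_packed, a_packed)
```

Six 2-bit values occupy two bytes here instead of six, which is the storage saving that motivates sub-byte formats; the zero padding does not change the dot product.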
The goal of this project is to leverage these existing tools to train and deploy mixed-precision networks on PULP-based systems. More precisely, in this project, you will:
- Select one or more suitable state-of-the-art networks (e.g., MCUNet) to target a given PULP platform specification
- (Re)train this network with different per-layer precisions, using approaches from the literature or of your own design to determine the precision for each layer
- Integerize the mixed-precision network for execution on PULP using our newly developed pipeline
- Compare the accuracy-latency-model size tradeoff to the baseline 8-bit model.
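As a rough illustration of the steps above, the sketch below quantizes a weight list to a chosen bit width and compares the weight storage of a mixed-precision configuration against an 8-bit baseline. Everything in it is an assumption made for illustration: the uniform symmetric quantizer, the helper names `quantize_symmetric` and `model_size_bytes`, and the layer sizes and bit widths. It does not reflect how QuantLab or the integerization pipeline work, and it does not model accuracy or latency.

```python
def quantize_symmetric(weights, bits):
    """Uniform symmetric quantization of floats to signed `bits`-bit integer codes.

    Returns (codes, scale) such that codes[i] * scale approximates weights[i].
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    q_max = 2 ** (bits - 1) - 1               # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    codes = [max(-q_max, min(q_max, round(w / scale))) for w in weights]
    return codes, scale


def model_size_bytes(param_counts, bits_per_layer):
    """Weight storage in bytes for the given per-layer parameter counts and bit widths."""
    total_bits = sum(n * b for n, b in zip(param_counts, bits_per_layer))
    return (total_bits + 7) // 8              # round up to whole bytes


# Hypothetical three-layer network (parameter counts are made up).
param_counts = [1000, 4000, 2000]
baseline_size = model_size_bytes(param_counts, [8, 8, 8])
mixed_size = model_size_bytes(param_counts, [8, 4, 2])

# Quantize a small hypothetical weight vector at two precisions.
example_weights = [-0.5, 0.1, 0.3, 0.5]
codes_8bit, scale_8bit = quantize_symmetric(example_weights, 8)
codes_2bit, scale_2bit = quantize_symmetric(example_weights, 2)
```

In this made-up configuration the mixed-precision weights take half the storage of the 8-bit baseline; whether accuracy and latency hold up is exactly what the comparison in the last step has to establish on real hardware.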
A detailed task description and project plan will be uploaded soon. If you are interested in this project or have any questions, please do not hesitate to contact us!
We are looking for 1-2 students for a semester project, or potentially a single student for a Master's thesis.
- Machine Learning
- 25% Theory
- 75% Implementation
References
- J. Lin et al., MCUNet: Tiny Deep Learning on IoT Devices
- M. Rusci et al., Memory-Driven Mixed Low Precision Quantization for Enabling Deep Network Inference on Microcontrollers