Bridging QuantLab with LPDNN
Deep neural networks (DNNs) usually require substantial computational resources to deliver their statistical power. However, in many applications where latency and data privacy are essential constraints, these models may have to run on resource-constrained computing systems such as embedded, mobile, or edge devices. The research field of TinyML has approached the problem of making DNNs more efficient from several angles: optimising network topologies in terms of accuracy-per-parameter or accuracy-per-operation; reducing model size via techniques such as parameter pruning; and replacing high-bit-width floating-point operands with low-bit-width integer operands, which yields so-called quantised neural networks (QNNs). QNNs have the double benefit of reducing model size and replacing energy-costly floating-point arithmetic with energy-efficient integer arithmetic. These properties make QNNs an ideal fit for resource-constrained hardware.
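To make the idea of replacing floating-point operands with integer operands concrete, the following is a minimal sketch of symmetric uniform quantisation in plain Python. It is an illustration only, not the scheme used by QuantLab or LPDNN; the function names, the 8-bit width, and the example values are assumptions chosen for this sketch.

```python
def quantise(values, num_bits=8):
    """Map floats to signed integers with one shared scale (symmetric scheme)."""
    q_max = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    # Round to the nearest grid point and clip to the representable range.
    codes = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return codes, scale


def dequantise(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [scale * c for c in codes]


def int_dot(codes_a, codes_b):
    """Dot product computed with integer arithmetic only."""
    return sum(a * b for a, b in zip(codes_a, codes_b))


# Hypothetical weights and inputs of a single neuron.
weights = [0.42, -1.30, 0.07, 0.88]
inputs = [1.5, 0.25, -2.0, 0.5]

q_w, s_w = quantise(weights)
q_x, s_x = quantise(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
# The integer accumulator is rescaled once, using the product of both scales.
approx = s_w * s_x * int_dot(q_w, q_x)

print("float result:    ", exact)
print("quantised result:", approx)
```

The multiply-accumulate loop operates purely on integers; floating-point arithmetic is needed only for the single rescaling at the end, which is where the energy savings mentioned above come from.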
Deploying QNNs to embedded platforms and edge devices is a complex task. Researchers have proposed numerous techniques to quantise DNNs, and different target platforms require different toolchains to optimise networks, generate code, and execute the corresponding programs. The TinyML research community has developed several tools to train and deploy QNNs. Some of these tools focus on supporting many algorithms but few back-end platforms; others take the complementary approach of supporting a few selected algorithms but a diversified range of back-ends.
The Low-Power Deep Neural Network (LPDNN) framework exploits the ONNX standard for DNN representation and an internal intermediate representation to generate and optimise code for heterogeneous embedded platforms featuring CPUs, GPUs, FPGAs, DSPs, and even custom ASICs. In its current state, LPDNN focusses on deployment-oriented optimisation; with specific regard to network quantisation, it supports a single post-training quantisation (PTQ) algorithm. LPDNN would benefit from a front-end that enables algorithmic exploration and network tuning, extending its flexibility from low-level deployment to high-level algorithm design. QuantLab is a PyTorch-based software tool that aims to enable the exploration and comparison of different quantisation-aware training (QAT) algorithms, allowing application developers to choose the best quantisation algorithm for their network. QuantLab emits abstract ONNX-based intermediate representations and remains agnostic of the target back-end platform. Therefore, QuantLab is a reasonable front-end candidate for LPDNN.
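The difference between PTQ and QAT is that QAT keeps the quantiser inside the training loop. One common way to do this is fake quantisation combined with a straight-through estimator (STE). The sketch below shows that mechanism on a one-parameter model in plain Python. It is an assumption-laden toy, not one of the algorithms implemented in QuantLab: the function names, the 4-bit grid, the step size, and the data are all hypothetical.

```python
def fake_quantise(w, scale, q_max=7):
    """Round w onto a signed integer grid; return the code and its float value."""
    code = max(-q_max, min(q_max, round(w / scale)))
    return code, code * scale


def train_qat(xs, ys, w=0.0, scale=0.1, lr=0.05, epochs=200):
    """Fit y ~ w_q * x, where w_q is the fake-quantised copy of the latent w."""
    for _ in range(epochs):
        _, w_q = fake_quantise(w, scale)       # forward pass uses w_q
        # Gradient of the mean squared error with respect to w_q.
        grad = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # STE: treat d(w_q)/d(w) as 1 and update the latent float weight.
        w -= lr * grad
    code, _ = fake_quantise(w, scale)
    return w, code


# Toy data generated by a "true" weight of 0.33, which is not on the 0.1 grid.
xs = [-1.0, -0.5, 0.5, 1.0]
ys = [0.33 * x for x in xs]

latent_w, final_code = train_qat(xs, ys)
print("latent float weight:", latent_w)
print("integer code:       ", final_code)
```

Because the target lies between two grid points, the latent weight settles near the rounding boundary between them, so the final integer code is one of the two neighbouring values. Only the integer code and the scale would be handed to a deployment back-end; the latent float weight exists solely during training.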
In this project, you will develop a prototype flow connecting QuantLab with LPDNN. The prototype will be built around a facial landmark recognition application and target a Raspberry Pi 4.
Skills and project character
Skills:
- Fundamental concepts of deep learning (convolutional neural networks, backpropagation, computational graphs)
- Numerical representation formats (integer, floating-point)
- C/C++ programming
- Python programming
- Familiarity with the PyTorch deep learning framework
- Familiarity with computer vision tasks
- Familiarity with the ARM Compute Library (ARM-CL)
Project character:
- 40% Deep learning
- 60% C/C++ and Python coding
The student and the advisors will meet on a weekly basis to check the progress of the project, clarify doubts, and decide the next steps. The student and the advisors will also have bi-weekly code reviews to ensure the software contributions are properly aligned with both QuantLab and LPDNN, streamlining code integration. The schedule of these meetings will be agreed at the beginning of the project by both parties. Of course, additional meetings can be organised to address urgent issues.
At the end of the project, you will present your work in a 15-minute talk (20 minutes if carried out as a Master Thesis) in front of the IIS team and defend it during the subsequent 5-minute discussion.
We are looking for one Master's student. The project can be completed either as a Semester Project or as a Master Thesis.