Bridging QuantLab with LPDNN

Revision as of 12:23, 24 February 2022 by Spmatteo

Introduction

Deep neural networks (DNNs) usually require huge computational resources to deliver their statistical power. However, in many applications where latency and data privacy are important constraints, it might be necessary to execute these models on resource-constrained computing systems such as embedded, mobile, or edge devices. The research field of TinyML has approached the problem of making DNNs more efficient from several angles: optimising network topologies in terms of accuracy-per-parameter or accuracy-per-operation; reducing model size via techniques such as parameter pruning; and replacing high-bit-width floating-point operands with low-bit-width integer operands, that is, training so-called quantised neural networks (QNNs). QNNs have the double benefit of reducing model size and replacing energy-costly floating-point arithmetic with energy-efficient integer arithmetic, properties that make them an ideal fit for resource-constrained hardware.
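As a minimal sketch of what replacing floating-point operands with integer operands means in practice, the following Python snippet maps a handful of floating-point weights to 8-bit signed integers and back. The symmetric, per-tensor scheme and the names `quantise_symmetric` and `dequantise` are assumptions chosen for illustration; they are not taken from QuantLab or LPDNN.

```python
def quantise_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor.

    Hypothetical helper for illustration only.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantise(q, scale):
    """Recover approximate floating-point values from integer levels."""
    return [qi * scale for qi in q]


weights = [0.50, -1.27, 0.03, 0.90]
q, scale = quantise_symmetric(weights)
restored = dequantise(q, scale)

# The rounding error of each weight is bounded by half a quantisation step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is now stored in one byte instead of four (for 32-bit floating-point), at the cost of a rounding error of at most half the scale.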

Deploying QNNs to embedded platforms and edge devices is a complex task. Researchers have proposed numerous techniques to quantise DNNs, and different target platforms require different toolchains to optimise networks, generate code, and execute the corresponding programs. Several tools to train and deploy QNNs have been developed by the TinyML research community, some focussing on supporting many algorithms but few backend platforms, others taking the complementary approach of supporting a few selected algorithms but a diversified range of backends.

The Low-Power Deep Neural Network (LPDNN) framework exploits the ONNX standard for DNN representation as well as an internal intermediate representation to generate and optimise code for heterogeneous embedded platforms featuring CPUs, GPUs, FPGAs, DSPs, and even custom ASICs. In its current state, LPDNN focusses on deployment-oriented optimisations. With specific regard to network quantisation, it only supports a post-training quantisation (PTQ) algorithm; hence, LPDNN would benefit from a more flexible front-end to enable algorithmic explorations and network tuning. QuantLab is a PyTorch-based software tool that supports several quantisation-aware training (QAT) algorithms, so that application developers can explore and compare them and choose the best quantisation algorithm for their network. QuantLab emits abstract ONNX-based representations and aims at remaining agnostic of the target backend platforms.
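To illustrate the difference between PTQ and QAT, the sketch below shows the mechanism that many QAT algorithms share: the forward pass uses a quantised copy of the weight, while the gradient is passed back to the underlying floating-point weight as if rounding were the identity (the so-called straight-through estimator). This is a generic textbook scheme written in plain Python, not one of the algorithms implemented in QuantLab; the function names, the scalar model, and all numeric values are assumptions made for the example.

```python
def fake_quantise(w, step, qmax=127):
    """Forward pass: snap w to the nearest level of a symmetric integer grid."""
    level = max(-qmax, min(qmax, round(w / step)))
    return level * step


def ste_gradient(grad_wrt_quantised, w, step, qmax=127):
    """Backward pass (straight-through estimator).

    Rounding is treated as the identity inside the representable range;
    outside that range the gradient is blocked.
    """
    return grad_wrt_quantised if abs(w) <= qmax * step else 0.0


def qat_step(w, x, y, step, lr):
    """One SGD step on the loss (fake_quantise(w) * x - y) ** 2."""
    w_q = fake_quantise(w, step)
    grad_wrt_quantised = 2.0 * (w_q * x - y) * x
    return w - lr * ste_gradient(grad_wrt_quantised, w, step)


# The floating-point weight is updated even though its quantised copy is 0.
w = 0.10
w_next = qat_step(w, x=1.0, y=0.37, step=0.25, lr=0.1)
print(fake_quantise(w, 0.25), w_next)
```

PTQ, by contrast, would apply a function like `fake_quantise` only once, to the weights of an already trained network, without any further gradient updates.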

In this project, you will develop a prototype flow connecting QuantLab with LPDNN. As an example application, the prototype will be built around a facial landmark recognition network.


Project description

[.pdf Project Description]


Skills and project character

Skills

Required:

  • Fundamental concepts of deep learning (convolutional neural networks, backpropagation, computational graphs)
  • Numerical representation formats (integer, floating-point)
  • C/C++ programming
  • Python programming
  • Familiarity with the PyTorch deep learning framework

Optional:

  • Familiarity with computer vision tasks
  • Familiarity with the ARM Compute Library (ARM-CL)

Project character

  • 40% Deep Learning
  • 60% C/C++ and Python coding


Logistics

The student and the advisor will meet on a weekly basis to check the progress of the project, clarify doubts, and decide the next steps. The student and the advisor will also have bi-weekly code reviews to ensure the software contributions are properly aligned with both QuantLab and LPDNN, streamlining code integration. The schedule of these meetings will be agreed at the beginning of the project by both parties. Of course, additional meetings can be organised to address urgent issues.

At the end of the project, you will have to present your work during a 15-minute talk (20 minutes if carried out as a Master Thesis) in front of the IIS team and defend it during the following 5-minute discussion.


Professor

Luca Benini


Status: Available

We are looking for one Master student. The project can be completed either as a Semester Project or as a Master Thesis.

Supervisors: Matteo Spallanzani (spmatteo@iis.ee.ethz.ch), Cristian Cioflan, Miguel de Prado (Bonseyes Community Association)