Probing the limits of fake-quantised neural networks

Introduction

Deep neural networks usually require huge computational resources to deliver their statistical power. However, in many applications where latency and data privacy are important constraints, it might be necessary to execute these models on resource-constrained computing systems such as embedded, mobile, or edge devices. The research field of TinyML has approached the problem of making DNNs more efficient from several angles: optimising network topologies in terms of accuracy-per-parameter or accuracy-per-operation; reducing model size via techniques such as parameter pruning; replacing high-bit-width floating-point operands with low-bit-width integer operands, training so-called quantised neural networks (QNNs). QNNs have the double benefit or reducing model size and replacing the energy-costly floating-point arithmetic with the energy-efficient integer arithmetic, properties that make them an ideal fit for resource-constrained hardware.

At training time, QNNs use floating-point operands to leverage the optimised floating-point software kernels provided by the chosen deep learning frameworks. However, these floating-point parameters are constrained in such a way that the application of elementary arithmetic properties (e.g., associative, distributive) allow to return fully integerised programs that can be deployed to the target hardware. Unfortunately, this conversion process is a lossy one, and the techniques that can reduce the errors it introduces are crucial to make QNNs useful in practice.

In this project, you will explore the impact of different floating-point formats on what we call the fake-to-true conversion process.

Project description

Project Description

Skills and project character

Skills

Required:

Fundamental concepts of deep learning (convolutional neural networks, backpropagation, computational graphs)
Numerical representation formats (integer, floating-point)
Numerical analysis
Python programming
C/C++ programming

Optional:

Knowledge of the PyTorch deep learning framework
Knowledge of digital arithmetic (e.g., two's complement, overflow, wraparound)

Project character

20% Theory
40% C/C++ and Python coding
40% Deep learning

Logistics

The student and the advisor will meet on a weekly basis to check the progress of the project, clarify doubts, and decide the next steps. The schedule of this weekly update meeting will be agreed at the beginning of the project by both parties. Of course, additional meetings can be organised to address urgent issues.

At the end of the project, you will have to present your work during a 15 minutes (20 minutes if carried out as a Master Thesis) talk in front of the IIS team and defend it during the following 5 minutes discussion.

Professor

Luca Benini

Status: Available

We are looking for 1 Master student. It is possible to complete the project either as a Semester Project or a Master Thesis.

Supervisors: Matteo Spallanzani spmatteo@iis.ee.ethz.ch, Renzo Andri (Huawei RC Zurich)

Personal tools

Probing the limits of fake-quantised neural networks - iis-projects

Search

Navigation

Tools