Probing the limits of fake-quantised neural networks

Introduction

Deep neural networks usually require substantial computational resources to deliver their statistical power. However, in many applications where latency and data privacy are essential constraints, it might be necessary to execute these models on resource-constrained computing systems such as embedded, mobile, or edge devices. The research field of TinyML has approached the problem of making DNNs more efficient from several angles: optimising network topologies in terms of accuracy-per-parameter or accuracy-per-operation; reducing model size via techniques such as parameter pruning; and replacing high-bit-width floating-point operands with low-bit-width integer operands, i.e., training so-called quantised neural networks (QNNs). QNNs have the double benefit of reducing model size and replacing energy-costly floating-point arithmetic with energy-efficient integer arithmetic. These properties make QNNs an ideal fit for resource-constrained hardware.
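To make the quantisation step concrete, here is a minimal Python sketch of uniform symmetric quantisation, mapping a floating-point tensor onto a signed integer grid; the 8-bit width and the single per-tensor scale are illustrative assumptions, not choices prescribed by the project.

  import numpy as np

  def quantise(x, num_bits=8):
      # One symmetric scale for the whole tensor (an assumed scheme).
      qmax = 2 ** (num_bits - 1) - 1
      scale = np.max(np.abs(x)) / qmax
      q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
      return q, scale

  def dequantise(q, scale):
      # Recover a float approximation of the original tensor.
      return q.astype(np.float32) * scale

  w = np.random.randn(4, 4).astype(np.float32)
  q, scale = quantise(w)
  print("worst-case rounding error:", np.max(np.abs(w - dequantise(q, scale))))

The dequantised tensor differs from the original by at most half a quantisation step per element, which is the price paid for the smaller integer representation.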

At training time, QNNs use floating-point operands to leverage the optimised floating-point software kernels provided by the chosen deep learning framework. However, these floating-point parameters are constrained so that, by applying elementary arithmetic properties (e.g., associativity, distributivity), the trained network can be transformed into a fully integerised program that can be deployed to the target hardware. Unfortunately, this conversion process is a lossy one, and techniques that reduce the errors it introduces are crucial to making QNNs useful in practice.
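As a sketch of what this constraint buys, the snippet below computes one dot product twice: once in the fake-quantised (training-time) form, where the operands are floats lying on an integer grid, and once in the true-quantised (deployment) form, where the float rescaling factor is replaced by a fixed-point multiplier. The scales, bit-widths, and shift amount are assumed for illustration, not taken from the project materials.

  import numpy as np

  np.random.seed(0)
  s_w, s_x, s_y = 0.02, 0.1, 0.15                # float quantisation scales
  w_q = np.random.randint(-128, 128, size=16)    # integer weight codes
  x_q = np.random.randint(0, 256, size=16)       # integer activation codes

  # Fake-quantised view: float arithmetic on grid values, rounded to the
  # output grid.
  y_fake = int(np.round((w_q * s_w) @ (x_q * s_x) / s_y))

  # True-quantised view: exact integer accumulation, then requantisation by
  # a fixed-point approximation m / 2**shift of s_w * s_x / s_y.
  acc = int(w_q @ x_q)
  shift = 24
  m = int(round(s_w * s_x / s_y * 2 ** shift))
  y_true = (acc * m + (1 << (shift - 1))) >> shift   # round-to-nearest

  print(y_fake, y_true)   # usually equal; any mismatch is conversion error

The fixed-point multiplier only approximates the float rescaling factor, which is one of the places where the lossy behaviour mentioned above originates.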

In this project, you will explore the impact of different floating-point formats on what we call the fake-to-true conversion process.
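As a toy preview, under wholly assumed values, the snippet below requantises the same integer accumulator with the rescaling factor stored in float64, float32, and float16; the numbers are chosen to sit near a rounding boundary, where the lower-precision format can tip the result to a different output code.

  import numpy as np

  scale = 1e-4       # requantisation scale (illustrative)
  acc = 124_980      # integer accumulator value (illustrative)

  for dtype in (np.float64, np.float32, np.float16):
      s = np.float64(dtype(scale))        # the scale as stored in each format
      code = int(np.round(s * acc))       # requantised output code
      print(f"{dtype.__name__:>8}: scale = {s:.10e} -> code {code}")

Here float64 and float32 both yield code 12, while float16's relative representation error (roughly 1.7e-4 for this scale) pushes the product past 12.5 and yields 13.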

Project description

Project Description (PDF): https://iis-projects.ee.ethz.ch/images/6/68/Probing_the_limits_of_fake-quantised_neural_networks.pdf


Skills and project character

Skills

Required:

  • Fundamental concepts of deep learning (convolutional neural networks, backpropagation, computational graphs)
  • Numerical representation formats (integer, floating-point)
  • Numerical analysis
  • Python programming
  • C/C++ programming

Optional:

  • Knowledge of the PyTorch deep learning framework
  • Knowledge of digital arithmetic (e.g., two's complement, overflow, wraparound)

Project character

  • 20% Theory
  • 40% C/C++ and Python coding
  • 40% Deep learning


Logistics

The student and the advisors will meet on a weekly basis to check the progress of the project, clarify doubts, and decide the next steps. The schedule of this weekly update meeting will be agreed at the beginning of the project by both parties. Of course, additional meetings can be organised to address urgent issues.

At the end of the project, you will have to present your work in a 15-minute talk (20 minutes if carried out as a Master Thesis) in front of the IIS team and defend it during the following 5-minute discussion.


Professor

Luca Benini


Status: Available

We are looking for 1 Master student. It is possible to complete the project either as a Semester Project or a Master Thesis.

Supervisors: Matteo Spallanzani (spmatteo@iis.ee.ethz.ch), Renzo Andri (Huawei RC Zurich)