Evaluating State-of-the-Art Post-Training Quantization Algorithms
Quantizing neural networks to low-bitwidth integer arithmetic is an essential step in their deployment to embedded and edge computing platforms: it simultaneously reduces the memory footprint (activation quantization), the storage required for the network parameters (weight quantization), the energy required to perform each operation, and the total inference energy (provided the platform natively supports the chosen bitwidth). At IIS, we have been performing quantization using a class of procedures known as quantization-aware training, where the network is modified to reflect the changes introduced by quantization and (re-)trained in this altered form. The downside of this approach is the large amount of resources (compute power and time) it takes to retrain a neural network. Recently, post-training quantization algorithms such as AdaRound and BRECQ have been gaining traction, closing the accuracy gap to quantization-aware training even at low bitwidths. This class of algorithms takes a full-precision neural network and converts it directly (without retraining) to a quantized one, applying various techniques to determine the optimal quantized weight values and/or activation quantization parameters. In this thesis, you will implement one or more of these algorithms and evaluate their performance on a selection of benchmark applications, comparing it to that of the quantization-aware training procedures we have been using so far.
The main goals of the project are:
- Select and implement a state-of-the-art post-training quantization algorithm
- Apply your implementation of the algorithm to a selection of reference tasks and networks
- Compare your results to our currently available quantization-aware training solutions
Looking for a student for a Semester project.
- Machine Learning
- 20% Theory
- 80% Implementation
- Y. Li, et al., BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction (this is the current state of the art in post-training quantization)
- M. Nagel, et al., Up or Down? Adaptive Rounding for Post-Training Quantization (this is the "predecessor" of BRECQ)
-  Inside Quantization Aware Training (a blog explanation of the basics of quantization-aware training)