Evaluating State-of-the-Art Post-Training Quantization Algorithms
Quantizing neural networks to low-bitwidth integer arithmetic is an essential step in their deployment to embedded and edge computing platforms: it simultaneously reduces the memory footprint (activation quantization), the storage required for the network parameters (weight quantization), the energy required to perform each operation, and the total inference energy (provided the platform natively supports the chosen bitwidth). At IIS, we have been performing quantization using a class of procedures known as quantization-aware training, where the network is modified to reflect the changes introduced by quantization and (re-)trained in this altered form. The downside of this approach is the large amount of resources (compute power and time) it takes to retrain a neural network. Recently, post-training quantization algorithms such as AdaRound and BRECQ have been gaining traction, closing the accuracy gap to quantization-aware training even at low bitwidths. This class of algorithms takes a full-precision neural network and converts it directly (without retraining) to a quantized one, applying various techniques to determine the optimal quantized weight values and/or activation quantization parameters. In this thesis, you will implement one or more of these algorithms and evaluate their performance on a selection of benchmark applications, comparing it to that of the quantization-aware training procedures we have been using so far.
The main goals of the project are:
- Select and implement a state-of-the-art post-training quantization algorithm
- Apply your implementation of the algorithm to a selection of reference tasks and networks
- Compare your results to our currently available quantization-aware training solutions
Looking for a student for a Semester project.
- Machine Learning
- 20% Theory
- 80% Implementation
- Y. Li, et al., BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction (this is the current state of the art in post-training quantization)
- M. Nagel, et al., Up or Down? Adaptive Rounding for Post-Training Quantization (this is the "predecessor" of BRECQ)
-  Inside Quantization Aware Training (a blog explanation of the basics of quantization-aware training)