# Probabilistic training algorithms for quantized neural networks

## Introduction

In recent theoretical work, we identified principled ways to compute gradients with respect to the parameters of deep neural networks (DNNs) that use quantized activation functions [1]. Nevertheless, how to update the parameters of a quantized neural network (QNN) remains an open problem. The parameters of classical DNNs are assumed to live in Euclidean spaces, where gradients have a natural interpretation as update directions. In contrast, the parameters of QNNs live in discrete spaces, where the meaning of gradient-based update rules is less clear.

A recent work [2] presented an interesting probabilistic update rule for training ternary QNNs: first, the gradients with respect to the quantized parameters are used to define the probabilities of state transitions in the discrete parameter space; then, the updates are sampled according to a Metropolis-like algorithm [3]. Unfortunately, although the results on the toy CIFAR-10 data set are promising, this solution does not have an efficient software implementation. Hence, its performance on more challenging benchmarks, such as the ILSVRC2012 data set (also known as ImageNet), has not been assessed.

Moreover, it might be possible to generalise this approach and to improve its performance by modifying the probabilistic model on which the algorithm is grounded.
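To make the idea of a probabilistic update rule concrete, the following is a minimal pure-Python sketch of a gradient-driven stochastic update for a single ternary weight. It is not the exact rule of [2]: the `tanh` squashing function, the `lr` and `sharpness` parameters, and the function names are assumptions chosen for illustration only.

```python
import math
import random

STATES = (-1, 0, 1)  # ternary parameter space


def transition_probability(grad, lr=0.1, sharpness=3.0):
    """Map a gradient magnitude to a probability in [0, 1].

    The tanh squashing and both hyperparameters are illustrative assumptions.
    """
    return math.tanh(sharpness * lr * abs(grad))


def sample_ternary_update(weight, grad, rng, lr=0.1, sharpness=3.0):
    """Move `weight` one state against the gradient with a sampled probability."""
    if grad == 0.0:
        return weight  # no gradient signal, no transition
    direction = -1 if grad > 0 else 1  # descent direction
    if rng.random() < transition_probability(grad, lr, sharpness):
        # Clip so that the weight never leaves the ternary set.
        return max(-1, min(1, weight + direction))
    return weight


# Usage: update a small weight vector given hypothetical gradients.
rng = random.Random(0)
weights = [-1, 0, 1, 0]
grads = [0.8, -2.5, 0.0, 0.1]
weights = [sample_ternary_update(w, g, rng) for w, g in zip(weights, grads)]
```

Under these assumptions, large gradients make a state transition almost certain, while small gradients only occasionally flip a weight; the weights themselves never need a full-precision copy.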

## Project description

In this project, you will first implement a CUDA-accelerated version of the Gated-XNOR (GXNOR) algorithm [2] and test it on the ILSVRC2012 data set. Then, we will try to improve the algorithm to achieve better final model accuracy. To this end, we will design and evaluate more general probabilistic training algorithms for QNNs; for example, we could replace the baseline update algorithm with a more general Metropolis scheme and apply simulated annealing principles [4] on top of the selected approach.
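As a rough illustration of what a Metropolis scheme combined with simulated annealing could look like in a ternary parameter space, here is a minimal sketch on a toy problem. The loss function, the single-coordinate proposal, the geometric cooling schedule, and all names and default values are assumptions made for this example; they are not taken from [2] or [4].

```python
import math
import random

STATES = (-1, 0, 1)  # ternary parameter space


def toy_loss(weights, target):
    """Hypothetical loss: squared distance to a hidden ternary target."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))


def metropolis_accept(delta_loss, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_loss <= 0:
        return True
    return rng.random() < math.exp(-delta_loss / temperature)


def anneal(weights, target, rng, steps=2000, t_start=2.0, cooling=0.995):
    """Simulated annealing over ternary weights with a symmetric proposal."""
    weights = list(weights)
    current = toy_loss(weights, target)
    temperature = t_start
    for _ in range(steps):
        # Propose changing one coordinate to one of its two other states.
        i = rng.randrange(len(weights))
        old = weights[i]
        weights[i] = rng.choice([s for s in STATES if s != old])
        candidate = toy_loss(weights, target)
        if metropolis_accept(candidate - current, temperature, rng):
            current = candidate
        else:
            weights[i] = old  # reject: restore the previous state
        temperature *= cooling  # geometric cooling schedule
    return weights, current


# Usage: start from all zeros and search for the hidden target.
rng = random.Random(1)
target = [1, -1, 0, 1, 0, -1]
final_weights, final_loss = anneal([0] * len(target), target, rng)
```

At high temperature the chain explores freely; as the temperature decays, uphill moves become increasingly unlikely and the search settles into a low-loss configuration. In an actual QNN, the loss difference would have to be estimated from mini-batches or gradients rather than evaluated exactly, which is part of what the project would investigate.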

If time remains, we could also consider deploying some of the models trained with the improved algorithms on ternary network accelerators developed by other members of the IIS team.

## References

[1] G. P. Leonardi and M. Spallanzani, “Analytical aspects of non-differentiable neural networks,” arXiv:2011.01858, 2020.

[2] L. Deng, P. Jiao, J. Pei, Z. Wu, and G. Li, “GXNOR-Net: training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework,” Neural Networks, vol. 100, pp. 49–58, 2018.

[3] D. B. Hitchcock, “A history of the Metropolis-Hastings algorithm,” The American Statistician, vol. 57, pp. 254–257, 2003.

[4] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, pp. 671–680, 1983.

## Competences

Required:

- Algorithms & data structures
- Python programming
- C/C++ programming
- Fundamentals of CUDA programming
- Fundamental concepts of deep learning (convolutional neural networks, backpropagation, computational graphs)

Optional:

- Knowledge of the PyTorch deep learning framework
- Basic knowledge of stochastic processes and stochastic optimisation (e.g., Markov chains, Metropolis algorithm, simulated annealing)

## Professor

##### Status: Available

It is possible to complete the project as a Master Thesis.

Supervisor: Matteo Spallanzani spmatteo@iis.ee.ethz.ch

## Logistics

The student and the advisor will meet on a weekly basis to check the progress of the project, clarify doubts and decide the next steps. The schedule of this weekly update meeting will be agreed at the beginning of the project by both parties. Of course, additional meetings can be organised to address urgent issues.

At the end of the project, you will present your work in a 20-minute talk in front of the IIS team and defend it during the 5-minute discussion that follows.