Probabilistic training algorithms for quantized neural networks

Introduction

In recent theoretical work, we have identified principled ways to compute gradients with respect to the parameters of DNNs that use quantized activation functions [Analytical aspects of non-differentiable neural networks]. Nevertheless, how to update the parameters of a QNN remains an open problem. Indeed, the parameters of classical DNNs are assumed to live in Euclidean spaces, where gradients have a natural interpretation as update directions. In contrast, the parameters of QNNs live in discrete spaces, where the meaning of gradient-based update rules is less clear.
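
To make the issue concrete, consider a ternary parameter taking values in {-1, 0, +1}. The following toy snippet (illustrative values, not taken from the cited work) shows that a plain SGD step immediately leaves the discrete space:

  # Toy example (made-up numbers, not from the cited paper): a plain SGD
  # step applied to a ternary parameter leaves the set {-1, 0, +1}.
  w = 1.0        # current parameter value, an element of {-1, 0, +1}
  grad = 0.7     # gradient, e.g., computed through a quantized activation
  lr = 0.1       # learning rate
  w_new = w - lr * grad   # 0.93: no longer an element of {-1, 0, +1}
  # Rounding w_new back to the nearest state (1.0) would discard the
  # update entirely, which is why discrete update rules need rethinking.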

A recent work [Gated-XNOR networks] has presented an interesting probabilistic update rule to train ternary QNNs: first, the gradients with respect to the quantized parameters are used to define probabilities of state transitions in the discrete parameter space; then, the updates are sampled according to a Metropolis-Hastings-like algorithm. Unfortunately, although the results on the toy CIFAR-10 data set are promising, this solution lacks an efficient software implementation; hence, its performance on more challenging benchmarks, such as the ILSVRC2012 data set (also known as ImageNet), has not been assessed.
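
The following Python sketch conveys the flavour of such a rule for ternary weights in {-1, 0, +1}. It is a minimal sketch, not the exact rule of the Gated-XNOR paper: the function name, the learning rate, and the choice of defining transition probabilities as clipped step magnitudes are all assumptions made for illustration.

  import numpy as np

  def ternary_probabilistic_update(w, grad, lr, rng):
      # The sign of the (negative) gradient selects the neighbouring state,
      # and its magnitude defines the transition probability. This mimics
      # the spirit of the Gated-XNOR rule; the exact probability model used
      # there differs (this one is an assumption for illustration).
      step = -lr * grad                        # continuous candidate step
      direction = np.sign(step)                # neighbour to move towards
      prob = np.clip(np.abs(step), 0.0, 1.0)   # transition probabilities
      accept = rng.random(w.shape) < prob      # sample the transitions
      w_new = w + accept * direction           # move by at most one state
      return np.clip(w_new, -1.0, 1.0)         # stay inside {-1, 0, +1}

  # Usage sketch:
  rng = np.random.default_rng(0)
  w = rng.choice([-1.0, 0.0, 1.0], size=8)
  grad = rng.normal(size=8)
  w = ternary_probabilistic_update(w, grad, lr=0.5, rng=rng)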

Moreover, it might be possible to generalise this approach by improving the probabilistic model on which the algorithm is grounded.

Project description

In this project, you will first be in charge of implementing a CUDA-accelerated version of the Gated-XNOR algorithm and testing it on the ILSVRC2012 data set. Then, we will design and investigate the performance of more general probabilistic training algorithms for QNNs.
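
One plausible starting point (an assumption, not a prescribed design) is a custom PyTorch optimizer whose step performs the probabilistic transition sampling; the per-element sampling is the natural candidate for a fused CUDA kernel. The class name and hyperparameters below are illustrative.

  import torch

  class ProbabilisticTernarySGD(torch.optim.Optimizer):
      # Sketch of an optimizer for ternary parameters in {-1, 0, +1}.
      # The probability model matches the NumPy sketch above and is an
      # assumption; the body of step() is what a CUDA kernel would fuse.
      def __init__(self, params, lr=0.1):
          super().__init__(params, dict(lr=lr))

      @torch.no_grad()
      def step(self):
          for group in self.param_groups:
              for p in group["params"]:
                  if p.grad is None:
                      continue
                  step = -group["lr"] * p.grad
                  prob = step.abs().clamp(max=1.0)    # transition probs
                  accept = torch.rand_like(p) < prob  # sampled mask
                  p.add_(accept * step.sign())        # one-state moves
                  p.clamp_(-1.0, 1.0)                 # stay ternary

  # Usage sketch: opt = ProbabilisticTernarySGD(model.parameters(), lr=0.5)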

If time remains, you will also export your trained networks to formats that can be executed on existing QNN accelerators designed by members of our team.
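
The target formats are not specified here; one plausible route (an assumption) is to first serialise the trained PyTorch model to ONNX and then convert it with an accelerator-specific toolchain. The model below is a stand-in:

  import torch
  import torch.nn as nn

  # Stand-in model (assumption): a trained ternary network would go here.
  model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                        nn.Linear(8, 10))
  model.eval()
  dummy_input = torch.randn(1, 3, 224, 224)   # ILSVRC2012-sized input
  torch.onnx.export(model, dummy_input, "ternary_qnn.onnx", opset_version=11)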


References

  • Analytical aspects of non-differentiable neural networks
  • Gated-XNOR networks

Competences

Required:

  • Algorithms & data structures
  • Python programming
  • Basic knowledge of deep learning (convolutional neural networks, backpropagation)

Useful:

  • Knowledge of the PyTorch deep learning framework
  • C/C++ programming
  • Knowledge of stochastic processes (e.g., Markov chains)


Professor

Luca Benini


Status: Available

It is possible to complete this project as a Master's thesis.

Supervisor: Matteo Spallanzani spmatteo@iis.ee.ethz.ch


Logistics

The student and the advisor will meet on a weekly basis to check the progress of the project, clarify doubts, and decide on the next steps. The schedule of this weekly update meeting will be agreed upon by both parties at the beginning of the project. Of course, additional meetings can be organised to address urgent issues.

At the end of the project, you will present your work in a 15-minute talk in front of the IIS team and defend it in the 5-minute discussion that follows.