Probabilistic training algorithms for quantized neural networks

Introduction

In recent theoretical research work, we have identified principled ways to compute gradients with respect to the parameters of DNNs that use quantized activation functions [Analytical aspects of non-differentiable neural networks]. Nevertheless, understanding how to update the parameters of a QNN is still an open problem. Indeed, the parameters of classical DNNs are assumed to lie in Euclidean spaces, where gradients have a natural interpretation as update directions. In contrast, the parameters of QNNs live in discrete spaces, where the meaning of gradient-based update rules is less clear.
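As a point of reference, a widely used (heuristic) way to propagate gradients through a quantized activation is the straight-through estimator. The PyTorch sketch below illustrates this generic technique for a ternary activation only; it is not the method of the cited paper, and the threshold value is an arbitrary illustrative choice.

  import torch

  class TernaryActivation(torch.autograd.Function):
      """Ternarise inputs to {-1, 0, +1}; backpropagate with a straight-through estimator."""

      @staticmethod
      def forward(ctx, x, threshold=0.5):
          ctx.save_for_backward(x)
          y = torch.zeros_like(x)
          y[x > threshold] = 1.0
          y[x < -threshold] = -1.0
          return y

      @staticmethod
      def backward(ctx, grad_output):
          (x,) = ctx.saved_tensors
          # STE: pass the gradient through where |x| <= 1, zero it elsewhere.
          return grad_output * (x.abs() <= 1.0).to(grad_output.dtype), None

  # Usage: y = TernaryActivation.apply(x); gradients still reach x despite the staircase.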

A recent work [Gated-XNOR networks] has presented an interesting probabilistic update rule to train ternary QNNs: first, the gradients with respect to the quantized parameters are used to define probabilities of state transitions in the discrete parameter space; then, the updates are sampled according to a Metropolis-like algorithm. Unfortunately, although the results on the toy CIFAR-10 data set are promising, this solution lacks an efficient software implementation, so its performance on more challenging benchmarks, such as the ILSVRC2012 data set (also known as ImageNet), has not been assessed.
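To make the idea concrete, the sketch below shows one possible probabilistic update of ternary weights: the gradient magnitude is mapped to a transition probability and the gradient sign selects the direction of the jump in the discrete state space. The tanh mapping and the learning rate are illustrative assumptions, not the exact Gated-XNOR rule.

  import torch

  def probabilistic_ternary_update(w, grad, lr=0.1):
      """Sample a stochastic transition for each ternary weight in {-1, 0, +1}.

      The probability of moving grows with the gradient magnitude (here
      squashed with tanh); the move goes against the gradient sign.
      """
      p_move = torch.tanh(lr * grad.abs())
      direction = -torch.sign(grad)
      move = (torch.rand_like(w) < p_move).to(w.dtype) * direction
      return torch.clamp(w + move, -1.0, 1.0)

  # Example (hypothetical tensors):
  # w = torch.randint(-1, 2, (4, 4)).float()
  # w = probabilistic_ternary_update(w, torch.randn(4, 4))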

Moreover, it might be possible to generalise this approach and to improve its performance by modifying the probabilistic model on which the algorithm is grounded.

Project description

In this project, you will first be in charge of implementing a CUDA-accelerated version of the Gated-XNOR algorithm and testing it on the ILSVRC2012 data set. Then, we will try to improve the algorithm to achieve better final model accuracy. To this end, we will design and evaluate more general probabilistic training algorithms for QNNs; for example, we could replace the baseline update rule with a more general Metropolis scheme and apply simulated annealing principles on top of the selected approach.
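As a rough illustration of the kind of generalisation mentioned above, the sketch below combines a Metropolis acceptance test with a geometric cooling schedule over a discrete parameter configuration. The propose and energy callables, as well as the temperature bounds, are hypothetical placeholders; they do not correspond to a specific design already chosen for the project.

  import math
  import random

  def metropolis_step(state, propose, energy, temperature):
      """One Metropolis step: accept a candidate with probability min(1, exp(-dE / T))."""
      candidate = propose(state)           # e.g., flip one ternary weight
      delta = energy(candidate) - energy(state)
      if delta <= 0 or random.random() < math.exp(-delta / temperature):
          return candidate
      return state

  def anneal(state, propose, energy, t_start=1.0, t_end=0.01, steps=1000):
      """Simulated annealing: run Metropolis steps while geometrically cooling the temperature."""
      for k in range(steps):
          t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
          state = metropolis_step(state, propose, energy, t)
      return state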

If time remains, we could also consider deploying some of the models trained with the improved algorithms on ternary network accelerators developed by other members of the IIS team.

References

Competences

Required:

  • Algorithms & data structures
  • Python programming
  • C/C++ & CUDA programming
  • Basic knowledge in deep learning (convolutional neural networks, backpropagation)

Optional:

  • Knowledge of the PyTorch deep learning framework
  • Basic knowledge in stochastic processes and stochastic optimisation (e.g., Markov chains, Metropolis algorithm, simulated annealing)

Professor

Luca Benini


Status: Available

Possible to complete as a Master's thesis.

Supervisor: Matteo Spallanzani spmatteo@iis.ee.ethz.ch


Logistics

The student and the advisor will meet on a weekly basis to check the progress of the project, clarify doubts, and decide on the next steps. The schedule of this weekly update meeting will be agreed upon by both parties at the beginning of the project. Of course, additional meetings can be organised to address urgent issues.

At the end of the project, you will have to present your work in a 15-minute talk in front of the IIS team and defend it during the following 5-minute discussion.