
On-Device Training Sparse Sub-Tensor Update Scheme Optimization for CNN-based tasks (SA or MA)



Status: Available


The rapid development of the Internet of Things (IoT) brings a growing need for smart end-node devices that can execute Deep Learning networks locally. Processing the data on device has many advantages: it drastically reduces latency and communication energy cost, and it is a step towards autonomous IoT end-nodes. Most current research efforts focus on inference, under the train-then-deploy paradigm. However, this leaves the device unable to cope with real-life phenomena such as data distribution shifts or class increments.

To adapt to these phenomena, several On-Device Training techniques have been proposed in the last few years. However, training on device still requires a considerable amount of memory, which is a significant challenge for tightly memory-constrained devices such as microcontrollers (MCUs).

In this project, we explore new methods to reduce the memory footprint for on-device training by pruning sub-tensors based on their gradients' contribution to the accuracy, followed by extrapolating the findings to inference.
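To make the idea of a sparse sub-tensor update concrete, the following is a minimal sketch in plain Python, not the scheme of any particular paper. It assumes a toy layer stored as one row of weights per output channel; the helper names `sparse_sgd_step` and `gradient_memory` and all numeric values are hypothetical and chosen only for illustration.

```python
def sparse_sgd_step(weights, grads, update_channels, lr=0.1):
    """Apply one SGD step to the selected output channels only.

    weights, grads: lists of rows, one row per output channel.
    update_channels: indices of the channels (sub-tensors) to update.
    Channels outside the set are copied unchanged, so their gradients
    are never read by this step.
    """
    selected = set(update_channels)
    new_weights = []
    for ch, (w_row, g_row) in enumerate(zip(weights, grads)):
        if ch in selected:
            new_weights.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            new_weights.append(list(w_row))
    return new_weights


def gradient_memory(weights, update_channels, bytes_per_value=4):
    """Bytes needed to hold weight gradients for the selected channels only."""
    return sum(len(weights[ch]) for ch in set(update_channels)) * bytes_per_value


# Toy layer: 4 output channels with 3 weights each (hypothetical values).
weights = [[1.0, 1.0, 1.0] for _ in range(4)]
grads = [[0.5, 0.5, 0.5] for _ in range(4)]

updated = sparse_sgd_step(weights, grads, update_channels=[0, 2], lr=0.1)
dense_bytes = gradient_memory(weights, range(4))
sparse_bytes = gradient_memory(weights, [0, 2])
print(updated)
print(dense_bytes, sparse_bytes)
```

In this toy case, updating two of the four channels halves the memory needed for weight gradients. In a real network the saving also depends on which activations must be kept for the backward pass, which this sketch does not model.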


Character

  • 15% Literature research
  • 50% Sparse Update Implementation
  • 35% Benchmarking


Prerequisites

  • Experience with Python and PyTorch
  • Knowledge of Deep Learning

Project Goals

The main tasks of this project are:

  • T1: Python implementation and evaluation setup

    You will implement the sparse update scheme proposed by Lin et al.[1], evaluating it on MobileNetV2 [2] at a channel-level granularity on an image classification task [3].

  • T2: Sparse Scheme Optimizer and benchmarking

    You will leverage Evolutionary Search to quickly reach a good sparse scheme. The type of evolutionary algorithm and its implementation will be carefully studied to fit this particular optimization problem well. The optimizer will then be benchmarked against a random search baseline.

  • T3: Sparse Scheme Optimizer and benchmarking

    Following the optimization and evaluation on a layer level developed in T1-T2, increase the granularity to sub-tensor level. You will evaluate the implementation over different datasets [3] [4] and/or tasks [5] [6].

  • Optional T1: Extend the method to describe inference sparsity

    Using the results obtained in T2, you will evaluate whether the gradient contribution to the accuracy can also describe the inference sparsity.
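The optimization problem behind T2 can be sketched on a toy example. The block below is only an illustration under stated assumptions: the per-layer memory costs, the per-layer accuracy contributions, the memory budget, the square-root scaling of contribution with the update ratio, and the assumption that contributions add up across layers are all hypothetical placeholders, not measured values. It uses a simple elitist evolutionary algorithm with uniform crossover and random mutation, which is one possible choice among many, and compares it with a random search baseline.

```python
import random

RATIOS = (0.0, 0.25, 0.5, 1.0)               # candidate channel-update ratios per layer
LAYER_MEM = [8, 16, 32, 32, 64, 64]          # hypothetical KB cost of fully updating each layer
LAYER_GAIN = [0.2, 0.5, 0.4, 0.9, 0.7, 1.0]  # hypothetical accuracy contribution per layer
BUDGET = 96                                  # hypothetical memory budget in KB


def memory(scheme):
    """Memory cost of a scheme (one update ratio per layer)."""
    return sum(r * m for r, m in zip(scheme, LAYER_MEM))


def fitness(scheme):
    """Summed contribution (assumed additive); infeasible schemes score negative."""
    if memory(scheme) > BUDGET:
        return -memory(scheme)
    return sum(g * r ** 0.5 for r, g in zip(scheme, LAYER_GAIN))


def random_scheme(rng):
    return tuple(rng.choice(RATIOS) for _ in LAYER_MEM)


def mutate(scheme, rng, rate=0.3):
    """Resample each layer's ratio with probability `rate`."""
    return tuple(rng.choice(RATIOS) if rng.random() < rate else r for r in scheme)


def crossover(a, b, rng):
    """Uniform crossover: each layer's ratio comes from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def random_search(evaluations, rng):
    return max((random_scheme(rng) for _ in range(evaluations)), key=fitness)


def evolutionary_search(population_size, generations, rng):
    # Start from the always-feasible "update nothing" scheme plus random schemes.
    population = [tuple(0.0 for _ in LAYER_MEM)]
    population += [random_scheme(rng) for _ in range(population_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 4]  # elitism: keep the top quarter
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


rng = random.Random(0)
es_best = evolutionary_search(population_size=20, generations=10, rng=rng)
rs_best = random_search(evaluations=220, rng=rng)
print("evolutionary:", es_best, fitness(es_best), memory(es_best))
print("random:      ", rs_best, fitness(rs_best), memory(rs_best))
```

With six layers and four ratios the search space holds only 4096 schemes, so random search can be competitive here. The comparison is expected to become more informative on a network of MobileNetV2 size and at sub-tensor granularity, where the space grows far beyond what can be sampled exhaustively.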

Project Organization

Weekly Meetings

The student shall meet with the advisor(s) every week to discuss any issues or problems that persisted during the previous week and to suggest next steps. These meetings provide a guaranteed time slot for mutual exchange of information on how to proceed, to clear up any questions from either side, and to ensure the student’s progress.


Report

Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any word processing software may be used for writing the report; nevertheless, the use of LaTeX with Tgif or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.

Final Report

A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.


Presentation

At the end of the project, the outcome of the thesis will be presented in a 15-minute talk followed by 5 minutes of discussion in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.


References

[1] Lin, Ji and Zhu, Ligeng and Chen, Wei-Ming and Wang, Wei-Chen and Gan, Chuang and Han, Song. On-Device Training Under 256KB Memory. 2022.

[2] Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh. MobileNetV2: Inverted Residuals and Linear Bottlenecks. 2018.

[3] Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li. ImageNet: A large-scale hierarchical image database. 2009.

[4] Krizhevsky, Alex. Learning multiple layers of features from tiny images. 2009.

[5] Warden, Pete. Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition. 2018.

[6] Mazumder, Mark and Chitlangia, Sharad and Banbury, Colby and Kang, Yiping and Ciro, Juan and Achorn, Keith and Galvez, Daniel and Sabini, Mark and Mattson, Peter and Kanter, David and Diamos, Greg and Warden, Pete and Meyer, Josh and Janapa Reddi, Vijay. Multilingual Spoken Words Corpus. 2021.