
Learning at the Edge with Hardware-Aware Algorithms



Overview

Status: Available

Introduction

The rapid rise of the Internet of Things (IoT) underscores the increasing demand for intelligent end-node devices that can run Deep Learning networks locally. Processing data on-device has many advantages: it not only drastically reduces the latency and communication energy cost, but also takes one step towards autonomous IoT end-nodes. Presently, the majority of research works, as well as existing industrial tools, focus on inference, following the train-then-deploy approach. This leaves the device unable to face real-life phenomena such as data distribution shifts or changes in the target classes.

To address these challenges, a few On-Device Training methodologies have emerged recently. Yet on-device training is memory-intensive, a notable bottleneck for memory-constrained devices like Microcontrollers (MCUs). Techniques like Sparse Update [1] are gaining ground due to their minimal memory requirements and robust performance.
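The core idea behind Sparse Update can be sketched in a few lines of PyTorch: freeze all parameters, then re-enable gradients only for a small, pre-selected subset of layers, so the backward pass stores activations and gradients only where they are needed. The toy network and the choice of layers below are illustrative assumptions, not the exact scheme of [1].

```python
import torch
import torch.nn as nn

# Toy backbone standing in for a real network (hypothetical example).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 4),
)

# Sparse update: freeze everything, then unfreeze only a selected subset
# (here the last conv and the classifier, chosen arbitrarily for the sketch).
for p in model.parameters():
    p.requires_grad = False
for layer in (model[2], model[6]):
    for p in layer.parameters():
        p.requires_grad = True

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

x = torch.randn(2, 3, 16, 16)
loss = nn.functional.cross_entropy(model(x), torch.tensor([0, 1]))
loss.backward()
optimizer.step()

# Frozen layers hold no gradients -> smaller training memory footprint.
print(all(p.grad is None for p in model[0].parameters()))  # True
```

Because de-selected parameters never receive gradients, both the optimizer state and the gradient buffers shrink accordingly, which is what makes the approach attractive on MCU-class memory budgets.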

Most techniques are demonstrated on MCUs with straightforward memory hierarchies, such as the ARM Cortex-M4. In contrast, cutting-edge edge processors such as GAP9 [2] possess a layered memory structure with a compact L1, necessitating tiling [3] for crucial DNN workloads like GEMM or convolutions.
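The tiling idea can be illustrated with a plain NumPy sketch: the GEMM is decomposed into tile-sized sub-products, where on a device with a software-managed L1 each tile would be DMA-copied into the fast memory before the inner computation. The tile size and matrix shapes below are arbitrary illustrative values.

```python
import numpy as np

def tiled_gemm(A, B, tile=32):
    """Compute C = A @ B by accumulating tile-sized sub-products.

    On a processor with a software-managed L1 (e.g. a GAP9-class cluster
    memory), each (tile x tile) block of A, B, and C would be copied into
    L1 before the inner product runs; here NumPy slices stand in for that
    explicit data movement.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # In a real kernel these slices are the L1-resident tiles.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

A = np.random.rand(96, 64).astype(np.float32)
B = np.random.rand(64, 80).astype(np.float32)
print(np.allclose(tiled_gemm(A, B, tile=32), A @ B, atol=1e-4))  # True
```

Choosing the tile size is the crux: tiles must be small enough that the working set of all three operands fits the L1 budget, yet large enough to amortize the data-movement overhead.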

In this project, the objective is to tailor Sparse Update to edge platforms with software-managed caches, ensuring the algorithm produces optimal tiles while boosting the models' accuracy through on-device training.

Character

  • 15%: Literature review to familiarize with SotA on-device learning techniques and the architecture and specificities of the edge SoC GAP9.
  • 55%: Hands-on Python coding where the student will implement and test the Sparse Update algorithm at various granularities (e.g., per layer, per channel).
  • 30%: Evaluation and benchmarking of kernel performances and retraining policy.

Prerequisites

  • Experience with Python and PyTorch.
  • Knowledge of Deep Learning.

Project Goals

The main tasks of this project are:

  • T1: PyTorch implementation of Sparse Update and benchmark on MobileNet-V2.

    In the first task, in the context of image classification or keyword spotting, you will implement the Sparse Update Algorithm from [1] and benchmark its efficiency for different granularity levels (i.e., layer, channel).

  • T2: Characterisation of the PULP-TrainLib kernels on GAP9.

    You will measure the efficiency of the PULP-TrainLib [4] kernels on the edge processor GAP9 for different input sizes, so that you can estimate the efficiency of a kernel from the input size of the problem alone.

  • T3: Exploration of constraint schemes to apply to the Sparse Update algorithm.

    In this step, you will combine the work done in the two previous steps to explore the trade-off between the accuracy gain and the latency induced by constraining the Sparse Update.
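For T1, per-channel granularity can be prototyped by masking the gradients of de-selected output channels after the backward pass. The helper below is a hypothetical illustration of the mechanism, not the exact selection method of [1].

```python
import torch
import torch.nn as nn

def mask_channel_grads(conv, keep_mask):
    """Zero the gradients of de-selected output channels of a conv layer.

    `keep_mask` is a boolean tensor of shape [out_channels]; True means
    the channel is updated. Hypothetical helper for illustration only.
    """
    if conv.weight.grad is not None:
        conv.weight.grad[~keep_mask] = 0.0
    if conv.bias is not None and conv.bias.grad is not None:
        conv.bias.grad[~keep_mask] = 0.0

conv = nn.Conv2d(3, 8, 3, padding=1)
x = torch.randn(1, 3, 8, 8)
conv(x).sum().backward()

# Keep only the first half of the output channels trainable.
mask = torch.zeros(8, dtype=torch.bool)
mask[:4] = True
mask_channel_grads(conv, mask)

print(torch.all(conv.weight.grad[4:] == 0).item())  # True
```

Calling the optimizer after such masking updates only the selected channels, which emulates channel-level Sparse Update without changing the model definition.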
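For T2, one simple way to turn per-size measurements into a predictor is to fit a linear cycles-versus-MACs model per kernel. The numbers below are made-up placeholders, not real GAP9 or PULP-TrainLib measurements.

```python
import numpy as np

# Hypothetical (MAC count, measured cycle count) pairs for one kernel;
# the values are illustrative placeholders, NOT real measurements.
macs = np.array([1e4, 5e4, 1e5, 5e5, 1e6])
cycles = np.array([2.1e4, 9.8e4, 1.9e5, 9.4e5, 1.9e6])

# Fit cycles ~ a * MACs + b: the slope `a` is the inverse of the
# effective MACs/cycle throughput, the intercept `b` captures fixed
# per-call overhead (setup, DMA programming, etc.).
a, b = np.polyfit(macs, cycles, 1)

def predict_cycles(n_macs):
    """Estimate kernel latency from the problem size alone."""
    return a * n_macs + b
```

Once such a model is fitted for each kernel, the latency of a candidate update configuration can be estimated analytically, without running it on the hardware; a richer model (e.g. piecewise by tiling regime) may be needed where the linear fit breaks down.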
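For T3, the accuracy/latency trade-off can be prototyped as a budgeted selection over per-layer (gain, cost) estimates, where the gain comes from the T1 benchmarks and the cost from the T2 latency model. All values below and the greedy gain-per-cost heuristic are illustrative assumptions, not the search method of [1].

```python
# Hypothetical per-layer statistics: estimated accuracy gain when the
# layer joins the update set, and its retraining latency cost
# (e.g. predicted cycles). Values are illustrative placeholders.
layer_stats = {
    "conv1": (0.2, 50),
    "conv5": (0.9, 120),
    "conv9": (1.4, 300),
    "fc":    (0.6, 40),
}

def select_layers(stats, budget):
    """Greedily pick layers by gain-per-cost ratio under a latency budget."""
    chosen, spent = [], 0
    for name, (gain, cost) in sorted(
            stats.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

chosen, spent = select_layers(layer_stats, budget=200)
print(chosen, spent)  # ['fc', 'conv5'] 160
```

Sweeping the budget then traces out the accuracy-versus-latency curve that this task is meant to explore; the greedy rule is only a baseline against which more involved search strategies can be compared.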

Project Organization

Weekly Meetings

The student shall meet with the advisor(s) every week to discuss any issues or problems that may have arisen during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up questions from either side, and for ensuring the student's progress.

Report

Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif (see: http://bourbon.usc.edu:8001/tgif/index.html and http://www.dz.ee.ethz.ch/en/information/how-to/drawing-schematics.html) or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.

Final Report

A digital copy of the report, the presentation, the developed software, build scripts/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.

Presentation

At the end of the project, the outcome of the thesis will be presented in a 15-minute talk followed by 5 minutes of discussion in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.

References

[1] J. Lin, L. Zhu, W.-M. Chen, W.-C. Wang, C. Gan, and S. Han, "On-Device Training Under 256KB Memory," 2022.

[2] GreenWaves Technologies, "GAP9 Processor," https://greenwaves-technologies.com/gap9_processor/

[3] A. Burrello, A. Garofalo, N. Bruschi, G. Tagliavini, D. Rossi, and F. Conti, "DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs," 2021.

[4] D. Nadalini, M. Rusci, G. Tagliavini, L. Ravaglia, L. Benini, and F. Conti, "PULP-TrainLib: Enabling On-Device Training For RISC-V Multi-Core MCUs Through Performance-Driven Autotuning," 2022.