Transformer Deployment on Heterogeneous Many-Core Systems

Overview

Status: Completed

Introduction

The demand for high performance under a limited power budget has led to the development of heterogeneous platforms such as the Apple M2 and Qualcomm Snapdragon, which combine multi-core processors and accelerators. While general-purpose processors execute sequential code, highly parallel, specialized tasks such as encryption, signal processing, and machine learning are offloaded to accelerators, improving both performance and energy efficiency.

The Transformer is a deep learning model proposed in 2017 that provides higher accuracy than recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in natural language processing (NLP) and computer vision tasks. The self-attention mechanism is a key component of the model, but it results in a high computational load and a complicated operational scheme. For this reason, we developed the Integer Transformer Accelerator (ITA), which can efficiently perform self-attention, and integrated it into a many-core system called MemPool.
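
As a reference for the operation that ITA accelerates, the following is a minimal sketch of single-head scaled dot-product self-attention in PyTorch. The sequence length and head dimension are illustrative, and the multi-head batching, masking, and integer arithmetic used in practice are omitted.

    import torch

    def self_attention(x, w_q, w_k, w_v):
        """Single-head scaled dot-product self-attention.

        x: (S, d) input sequence; w_q, w_k, w_v: (d, d) projection weights.
        """
        q = x @ w_q                                  # queries, (S, d)
        k = x @ w_k                                  # keys,    (S, d)
        v = x @ w_v                                  # values,  (S, d)
        scores = (q @ k.T) / (q.shape[-1] ** 0.5)    # (S, S): quadratic in sequence length
        attn = torch.softmax(scores, dim=-1)         # attention weights
        return attn @ v                              # (S, d) output

    S, d = 128, 64                                   # illustrative sequence length / head size
    x = torch.randn(S, d)
    w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)    # torch.Size([128, 64])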

MemPool is a many-core system designed at ETH. In its original configuration, it features 256 lightweight 32-bit Snitch cores, also developed at ETH, which implement the RISC-V instruction set architecture (ISA), a modular and open ISA. The MemPool configuration with ITA includes 192 Snitch cores and 4 ITA accelerators that speed up the execution of attention for transformer workloads.

The goal of this thesis is to deploy transformer models to MemPool with ITA using the DumpO deployment framework and to benchmark their performance.


Character

  • 15% Literature research
  • 50% Implementation
  • 35% Benchmarking

Prerequisites

  • Experience with Python and PyTorch
  • Knowledge of deep learning

Project Goals

The main tasks of this project are:

  • T1: Deployment of a Dense layer to MemPool

    In the first task, we aim to set up a basic deployment pipeline and test it with a dense layer. The student will add MemPool as a new target platform for DumpO, generate the code for the execution of the dense layer, and simulate it on MemPool (a minimal export sketch follows this task list).

  • T2: Quantization of Transformer model

    In the second task, we select and quantize an I-BERT Encoder for MemPool with ITA. The student will use QuantLab for quantization and prepare the model for deployment (an illustrative quantization sketch follows this task list).

  • T3: Deployment of I-BERT Encoder on ITA with MemPool

    The student will then extend the deployment pipeline built in T1 to generate the accelerator code that controls ITA. Additionally, they will have to parallelize and tile the execution according to ITA's constraints.

  • T4: Deployment of I-BERT Encoder on MemPool only

    The student will extend the platform support to run the I-BERT Encoder using only Snitch cores. We will provide the optimized kernels; the student will have to decide on a parallelization and tiling strategy for the execution of the MHSA (a toy tiling sketch follows this task list).

  • T5: Measuring and benchmarking

    The student will evaluate the performance of the model on MemPool and ITA.

  • T6 (Additional): Deployment of End-to-End Transformers to MemPool with ITA

    The student will again extend the pipeline to support the full deployment of an End-to-End Transformer and will evaluate its performance.
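
As referenced in T1, the sketch below shows one way the dense test layer could be prepared for the deployment pipeline, assuming PyTorch's ONNX export as the hand-off point. It does not show DumpO's actual MemPool target interface, which is part of the task itself; the layer sizes and the file name are illustrative.

    import torch
    import torch.nn as nn

    # Minimal dense (fully connected) layer to exercise the pipeline; sizes are illustrative.
    dense = nn.Linear(in_features=64, out_features=64, bias=True)
    dense.eval()

    # Export the graph to ONNX so a deployment tool can lower it to MemPool kernels.
    dummy_input = torch.randn(1, 64)
    torch.onnx.export(
        dense,
        dummy_input,
        "dense.onnx",              # illustrative output file name
        input_names=["input"],
        output_names=["output"],
    )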
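
For T2, QuantLab drives the actual quantization flow; the snippet below only illustrates the symmetric 8-bit weight quantization that I-BERT-style integer-only inference builds on. The function is an illustrative stand-in, not QuantLab's API.

    import torch

    def quantize_symmetric(w: torch.Tensor, n_bits: int = 8):
        """Symmetric per-tensor quantization: w is approximated by scale * w_int."""
        q_max = 2 ** (n_bits - 1) - 1                # 127 for 8 bits
        scale = w.abs().max() / q_max                # per-tensor scale factor
        w_int = torch.clamp(torch.round(w / scale), -q_max - 1, q_max).to(torch.int8)
        return w_int, scale

    w = torch.randn(64, 64)                          # illustrative dense-layer weights
    w_int, scale = quantize_symmetric(w)
    # The reconstruction error stays within about half a quantization step (scale / 2).
    print(w_int.dtype, float((w - scale * w_int.float()).abs().max()), float(scale / 2))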
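
As referenced in T4, parallelization and tiling come down to splitting the MHSA matrices into tiles and assigning them to cores. Below is a toy, functional sketch of a row-tiled projection matmul with a round-robin tile-to-core mapping; the core count and tile size are assumptions for illustration and do not reflect MemPool's or ITA's actual constraints.

    import torch

    NUM_CORES = 4        # assumed worker count, for illustration only
    TILE_ROWS = 32       # assumed tile height; the real choice follows ITA / memory constraints

    def tile_schedule(num_rows: int):
        """Assign row tiles of the output matrix to cores in round-robin order."""
        tiles = [(r, min(r + TILE_ROWS, num_rows)) for r in range(0, num_rows, TILE_ROWS)]
        return [(tile_id % NUM_CORES, lo, hi) for tile_id, (lo, hi) in enumerate(tiles)]

    def tiled_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        """Row-tiled matmul: on MemPool each tile would run on its assigned Snitch core."""
        out = torch.zeros(a.shape[0], b.shape[1], dtype=a.dtype)
        for core_id, lo, hi in tile_schedule(a.shape[0]):
            # core_id marks the core that would compute this tile; here tiles run sequentially.
            out[lo:hi] = a[lo:hi] @ b
        return out

    x = torch.randn(128, 64)              # illustrative (S, d) activations entering one projection
    w_q = torch.randn(64, 64)             # illustrative query projection weights
    print(tiled_matmul(x, w_q).shape)     # torch.Size([128, 64])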

Project Organization

Weekly Meetings

The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to plan the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up any questions from either side, and for ensuring the student's progress.

Report

Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif (see: http://bourbon.usc.edu:8001/tgif/index.html and http://www.dz.ee.ethz.ch/en/information/how-to/drawing-schematics.html) or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.

Final Report

A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.

Presentation

At the end of the project, the outcome of the thesis will be presented in a 15-minute talk followed by 5 minutes of discussion in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.
