Advanced Data Movers for Modern Neural Networks

Introduction

In the current era of ubiquitous AI, efficient neural network inference has become essential for virtually all edge computing systems, including our very own PULPissimo SoC. The key to unlocking machine learning deployment on memory-constrained devices is dividing large, deep networks into manageable pieces that fit within the last-level cache constraints of the target system, a process typically called tiling. While highly efficient accelerators and kernel libraries exist for most neural network operators, energy-efficient and performant data movement remains a "blind spot" of most embedded systems. To push the envelope of efficient data movement, we developed a performant, reconfigurable data movement accelerator (iDMA) for embedded edge computing systems.
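
To make the idea of tiling concrete, the short C sketch below splits an example feature map into row-band tiles small enough to fit an assumed on-chip memory budget. All sizes and names (L1_BUDGET_BYTES, the layer dimensions) are illustrative assumptions, not actual PULPissimo or iDMA parameters.

#include <stdint.h>
#include <stdio.h>

/* Illustrative tiling sketch: split an H x W x C int8 feature map into
 * row-band tiles that each fit an assumed on-chip memory budget. The budget
 * and layer dimensions are placeholders, not real PULPissimo parameters. */

#define L1_BUDGET_BYTES (64 * 1024)  /* assumed on-chip memory budget */

int main(void) {
    const uint32_t H = 112, W = 112, C = 64;  /* example layer: 112x112x64, int8 */
    const uint32_t bytes_per_row = W * C;     /* one spatial row of the feature map */

    /* Largest number of rows per tile that still fits the budget. */
    uint32_t rows_per_tile = L1_BUDGET_BYTES / bytes_per_row;
    if (rows_per_tile == 0) rows_per_tile = 1;  /* fall back to finer tiling if needed */

    uint32_t num_tiles = (H + rows_per_tile - 1) / rows_per_tile;

    printf("tile size: %u rows (%u bytes), %u tiles total\n",
           rows_per_tile, rows_per_tile * bytes_per_row, num_tiles);
    return 0;
}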

Project description

This project will enhance our tried-and-true PULPissimo SoC architecture by integrating the iDMA data movement accelerator. To validate the integration and to provide a concrete, large-scale use case, non-trivial neural networks that require on-the-fly data transposition will be mapped onto the enhanced SoC using the Deeploy code generation framework.
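
"On-the-fly data transposition" here means a copy in which source and destination are traversed with different strides, so the data arrives at its destination already transposed. The C reference loop below spells out that access pattern; it is only a software model of the behaviour, and the function name and parameters are illustrative, not part of the iDMA programming interface.

#include <stdint.h>

/* Reference model of a transposing 2D copy: element (r, c) of the row-major
 * source lands at (c, r) in the destination. A strided DMA engine realizes
 * the same pattern in hardware by walking the source row-wise while advancing
 * the destination pointer by a full row per element. This is only a software
 * reference; the actual iDMA programming interface is not shown here. */
static void transpose_copy_2d(int8_t *dst, const int8_t *src,
                              uint32_t rows, uint32_t cols) {
    for (uint32_t r = 0; r < rows; ++r) {
        for (uint32_t c = 0; c < cols; ++c) {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
}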

In this uniquely vertical project, the student will have the opportunity to:

  1. Integrate & verify the iDMA engine in the PULPissimo SoC
  2. Implement a software driver to leverage the enhanced capabilities of the iDMA (see the register-level sketch after this list)
  3. Model and optimize iDMA code generation in Deeploy for efficient data movement between cache hierarchy levels
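
As a rough illustration of what the driver work in point 2 involves, the sketch below programs a memory-mapped DMA engine for a simple 1D transfer and polls for completion. The base address, register offsets, and bit fields are hypothetical placeholders; the actual iDMA register map must be taken from its documentation and RTL.

#include <stdint.h>

/* Hypothetical register map -- placeholders only, NOT the real iDMA layout. */
#define DMA_BASE        0x10000000u        /* assumed memory-mapped base address */
#define DMA_SRC_ADDR    (DMA_BASE + 0x00)
#define DMA_DST_ADDR    (DMA_BASE + 0x04)
#define DMA_NUM_BYTES   (DMA_BASE + 0x08)
#define DMA_CTRL        (DMA_BASE + 0x0C)  /* bit 0: start */
#define DMA_STATUS      (DMA_BASE + 0x10)  /* bit 0: busy  */

static inline void reg_write(uintptr_t addr, uint32_t val) {
    *(volatile uint32_t *)addr = val;
}
static inline uint32_t reg_read(uintptr_t addr) {
    return *(volatile uint32_t *)addr;
}

/* Launch a 1D copy and busy-wait until the engine reports completion. */
void dma_memcpy_blocking(uintptr_t dst, uintptr_t src, uint32_t num_bytes) {
    reg_write(DMA_SRC_ADDR, (uint32_t)src);
    reg_write(DMA_DST_ADDR, (uint32_t)dst);
    reg_write(DMA_NUM_BYTES, num_bytes);
    reg_write(DMA_CTRL, 0x1u);             /* start the transfer          */
    while (reg_read(DMA_STATUS) & 0x1u) {  /* poll the assumed busy flag  */
        /* spin */
    }
}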

Depending on progress, the following points may be tackled:

  • Implementation of the SoC in an FPGA
  • End-to-end deployment of tiled neural networks

Required Skills

  • Basic knowledge of and familiarity with the SystemVerilog language
  • Basic knowledge of and familiarity with the C programming language

Useful Skills

  • Basic knowledge of and familiarity with Computer Architecture concepts, including SoCs and DMAs
  • Basic knowledge of and familiarity with Neural Network architectures like CNNs and Transformers


Professor

Luca Benini
Status: Available

Possible to complete as a Semester or Bachelor's Thesis

Supervision: Moritz Scherer scheremo@iis.ee.ethz.ch, Thomas Benz tbenz@iis.ee.ethz.ch

Meetings & Presentations

The student and advisor(s) agree on weekly meetings to discuss all relevant decisions and how to proceed. Additional meetings can of course be organized to address urgent issues. At the end of the project, you will present and defend your work in a 15-minute presentation followed by 5 minutes of discussion as part of the IIS colloquium.