Scalable Heterogeneous L1 Memory Interconnect for Smart Accelerator Coupling in Ultra-Low Power Multicores

Introduction

At the Integrated Systems Laboratory (IIS) we have been working for several years on ultra-low-power smart analytics HW in the context of the PULP (Parallel Ultra-Low Power) project. One of the main technologies for extracting high-level, semantically rich information from raw data is deep learning, in particular deep convolutional neural networks (CNNs). The focus here is on inference, i.e., running a CNN that has been trained offline. This project aims at the next level of energy efficiency for deep inference on a PULP-based platform by means of innovative HW and SW techniques, such as the heterogeneous integration of HW accelerators. Several PULP chips have already employed HW acceleration techniques to accelerate CNNs (Mia Wallace, Fulmine [Conti2017]), and stand-alone ASICs have been built for aggressively quantized CNNs (YodaNN [Andri2017]). In the Ergo project, we want to design an entire PULP-based computing cluster around a set of fast, low-power deep learning engines.

Project description

Cluster-coupled Hardware Processing Engines (HWPEs) constitute an innovative class of hardware accelerators. Compared with traditional loosely coupled hardware accelerators, they cooperate better with software, achieve high energy efficiency and offer much greater flexibility. These accelerators are coupled with memory through the same interconnect used by the software cores, and thus enjoy the same flexibility of access. However, emerging applications such as deep learning would greatly benefit from very high-bandwidth memory access, while at the same time showing very regular access patterns that could be exploited to design more efficient hardware accelerators. Moreover, these applications access distinct data with different patterns, which makes heterogeneous memory (with different power/area characteristics) intrinsically attractive. In this thesis, you will design a novel heterogeneous interconnect for the PULP system that connects high-throughput hardware accelerators to memory in a much more scalable and efficient way than is currently possible. The new L1 interconnect will be tested together with a state-of-the-art accelerator for Binary Neural Networks, and it will constitute a very important component of forthcoming PULP-based heterogeneous computing chips.

Required Skills

To work on this project, you will need:

to have worked in the past with at least one RTL language (SystemVerilog, Verilog or VHDL); having followed the VLSI1 / VLSI2 courses is recommended
to have prior knowledge of hardware design and computer architecture; having followed the Advanced System-on-Chip Design course is recommended

Other skills that you might find useful include:

familiarity with embedded C
familiarity with a scripting language (e.g. Python)
to be strongly motivated for a difficult but super-cool project

If you want to work on this project but think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill the gap.

Status: Available

Supervision: Francesco Conti

Professor: Luca Benini

Practical Details

Meetings & Presentations

The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide how to proceed. Of course, additional meetings can be organized to address urgent issues.

Around the middle of the project there is a design review, in which senior members of the lab review your work (bring all the relevant information, such as preliminary specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and to decide whether further support is necessary. They also make the definitive decision on whether the chip is actually manufactured (no reason to worry if the project is on track) and whether more chip area, a different package, ... is provided. For more details, refer to [1].

At the end of the project, you have to present and defend your work in a 15-minute presentation followed by 5 minutes of discussion as part of the IIS colloquium.

Links

The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [2]
The IIS/DZ coding guidelines [3]