Hardware Accelerators for Lossless Quantized Deep Neural Networks

Introduction

At the Integrated Systems Laboratory (IIS) we have been working for several years on ultra-low-power smart analytics HW in the context of the PULP (Parallel Ultra-Low Power) project. One of the main technologies for extracting high-level, semantically rich information from raw data is deep learning, and in particular deep convolutional neural networks (CNNs). This project focuses on the task of inference from a CNN trained offline: it aims at the next level of energy efficiency for deep inference on a PULP-based platform by means of innovative HW and SW techniques such as heterogeneous integration of HW accelerators. Several PULP chips have already employed HW acceleration for CNNs (Mia Wallace, Fulmine [Conti2017]), and stand-alone ASICs have been built for aggressively quantized CNNs (YodaNN [Andri2017]). In the Ergo project we want to design an entire PULP-based computation cluster around a set of fast and low-power deep learning engines.

Project description

The primary purpose of this project is to contribute to the Ergo deep inference System-on-Chip by designing HW/SW techniques for the acceleration of aggressively quantized non-binary deep neural networks. A SystemVerilog framework for the development of cluster-coupled processing engines is already available and in use for the XNOR Neural Engine (XNE), a coprocessor for binarized neural networks.
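As background for the binarized case, the following is a minimal Python sketch of the XNOR-and-popcount dot product commonly used for binarized neural networks. It is an illustration of the general technique only, not a description of the XNE datapath; the function names `pack_bits` and `xnor_dot` and the example vectors are made up for this sketch.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int: bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_dot(a_bits, w_bits, n):
    """Dot product of two n-element {-1, +1} vectors stored as bit masks."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR: 1 wherever the two signs agree
    matches = bin(agree).count("1")     # popcount of the agreement mask
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


# Hypothetical example data.
activations = [1, -1, -1, 1, 1, -1, 1, 1]
weights = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(a * w for a, w in zip(activations, weights))
result = xnor_dot(pack_bits(activations), pack_bits(weights), len(weights))
print(result, reference)
```

The multiply-accumulate collapses into one bitwise operation and one population count per word, which is what makes binarized networks attractive for dedicated hardware. Non-binary quantized networks, the target of this project, cannot use this shortcut directly.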

In this thesis, the first objective is to create a flexible and efficient processing engine for traditional non-binary (but aggressively quantized) neural networks: a Quantized Neural Engine (QNE). After an initial design space exploration, it will be decided which specific kinds of quantization (e.g. [Intel2017]) should be targeted for the design, and whether the QNE should be an extension of the existing XNE or a novel design based on the same building blocks. A full Ergo cluster integrating the XNE, the QNE and the PULP Riscy cores will then be designed and evaluated by deploying a quantized version of a state-of-the-art CNN such as AlexNet [Krizhevsky2012].
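To give a feeling for one candidate kind of quantization, the sketch below constrains weights to zero or signed powers of two, the weight format targeted by [Intel2017], so that each multiplication becomes a shift. This is a simplified illustration under stated assumptions: it rounds each weight independently and does not reproduce the incremental partition, quantize and retrain procedure of the paper. The exponent range, the `frac_bits` fixed-point format and all function names are hypothetical choices for this example, not QNE design decisions.

```python
import math


def quantize_pow2(w, min_exp=-4, max_exp=0):
    """Map w to (sign, exp) meaning sign * 2**exp, or (0, 0) for zero.

    The exponent is rounded to the nearest integer in the log domain and
    clamped to [min_exp, max_exp]; very small weights are flushed to zero.
    """
    mag = abs(w)
    if mag < 2.0 ** (min_exp - 1):
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = round(math.log2(mag))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp


def shift_mac(activations, qweights, frac_bits=4):
    """Accumulate sum(a * w) using only shifts and adds.

    Activations are integers; the result is fixed point with frac_bits
    fractional bits. Assumes every exponent is >= -frac_bits.
    """
    acc = 0
    for a, (sign, exp) in zip(activations, qweights):
        if sign == 0:
            continue                      # zero weight: nothing to add
        acc += sign * (a << (exp + frac_bits))
    return acc


# Hypothetical example data.
weights = [0.5, -0.24, 0.01, 1.0]
activations = [3, 8, 5, 2]

qweights = [quantize_pow2(w) for w in weights]
acc = shift_mac(activations, qweights)
print(qweights, acc / 2 ** 4)
```

Whether such a shift-based datapath, a low-bit-width integer multiplier, or another scheme is the better fit is exactly the kind of question the design space exploration is meant to answer.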

Required Skills

To work on this project, you will need:

to have worked in the past with at least one RTL language (SystemVerilog, Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended
to have prior knowledge of hardware design and computer architecture - having followed the Advanced System-on-Chip Design course is recommended

Other skills that you might find useful include:

familiarity with a scripting language for numerical simulation (Python or Matlab or Lua…)
to be strongly motivated for a difficult but super-cool project

If you want to work on this project but think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill in the gaps.

Status: Completed

Paul Scheffler, Luca Colagrande

Supervision: Francesco Conti, Pasquale Davide Schiavone, Fabian Schuiki

Date: Spring 2019

Professor

Luca Benini

Practical Details

Meetings & Presentations

The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.

Around the middle of the project there is a design review, where senior members of the lab review your work to make sure everything is on track and decide whether further support is necessary. Bring all the relevant information, such as preliminary specifications, block diagrams, synthesis reports, testing strategy, ... The reviewers also make the final decision on whether the chip is actually manufactured (no reason to worry if the project is on track) and whether more chip area, a different package, ... is provided. For more details, refer to [1].

At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.

Literature

[Andri2017] R. Andri et al., YodaNN: an Architecture for Ultra-Low Power Binary-Weight CNN Acceleration, [2]
[Conti2017] F. Conti et al., An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics, [3]
[Intel2017] A. Zhou et al., Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights, [4]
[Krizhevsky2012] A. Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, [5]

Links

The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [6]
The IIS/DZ coding guidelines [7]
