Bandwidth Efficient NEureka
- Type: Semester Thesis
- Professor: Prof. Dr. L. Benini
In recent years, Artificial Neural Networks (ANNs) have revolutionized many applications, enabled by GPU acceleration of machine learning algorithms. To bring machine learning to ever-smaller devices with highly constrained power and compute budgets, task-specific acceleration has proven indispensable. In a heterogeneous subsystem with shared memory, the accelerators compete for the same memory ports, so it is essential to use the bandwidth offered by the memory subsystem efficiently.
NEureka is a hardware processing engine meant to be integrated within the PULP cluster to accelerate the execution of Deep Learning inference kernels such as convolutional, depthwise, and fully connected kernels. It performs bit-serial computation over the weight-bit dimension to seamlessly support weights from 2 to 8 bits wide, including “unusual” bit widths such as 3, 5, 6, and 7 bits.
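The idea of bit-serial computation over the weight bits can be illustrated with a small software model. The sketch below is not NEureka's actual datapath: it assumes unsigned weights, and the function name `bit_serial_dot` and its parameters are hypothetical. It only shows why any weight width from 2 to 8 bits can be handled by the same loop, one weight bit-plane at a time.

```python
def bit_serial_dot(activations, weights, weight_bits):
    """Dot product computed one weight bit-plane at a time.

    Assumption for illustration: weights are unsigned integers that fit
    in `weight_bits` bits. Signed weights would need extra handling.
    """
    if not 2 <= weight_bits <= 8:
        raise ValueError("weight_bits must be between 2 and 8")
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    if any(not 0 <= w < (1 << weight_bits) for w in weights):
        raise ValueError("weight does not fit in weight_bits")

    acc = 0
    for b in range(weight_bits):
        # Sum the activations whose weight has bit b set (a 1-bit multiply).
        plane_sum = sum(x for x, w in zip(activations, weights) if (w >> b) & 1)
        # Scale the partial sum by the significance of bit b and accumulate.
        acc += plane_sum << b
    return acc


if __name__ == "__main__":
    acts = [3, 1, 4, 1, 5]
    wgts = [5, 0, 7, 2, 6]  # all representable with 3 bits
    print(bit_serial_dot(acts, wgts, weight_bits=3))  # prints 75
```

In this model the number of loop iterations, and hence the number of weight bit-planes that must be fetched, scales with the weight bit width rather than being fixed at 8.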
In this project, we will explore the bandwidth used by NEureka when executing a variety of Deep Learning networks and assess how efficiently it uses that bandwidth. We will then make architectural changes to improve its bandwidth efficiency with minimal impact on area and power.
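As a starting point for the workload exploration, a first-order model of memory traffic can be useful. The sketch below is only an assumption-laden illustration: the function names, the notion of "utilization" as bytes moved divided by the bytes the port could have moved, and the example numbers are all hypothetical and are not taken from NEureka's specification.

```python
def conv_weight_bytes(k_out, k_in, kernel_h, kernel_w, weight_bits):
    """Bytes needed to hold the weights of one convolutional layer,
    assuming (hypothetically) that weights are densely bit-packed."""
    total_bits = k_out * k_in * kernel_h * kernel_w * weight_bits
    return (total_bits + 7) // 8  # round up to whole bytes


def bandwidth_utilization(bytes_transferred, cycles, port_bytes_per_cycle):
    """Fraction of the available port bandwidth that was actually used."""
    if cycles <= 0 or port_bytes_per_cycle <= 0:
        raise ValueError("cycles and port_bytes_per_cycle must be positive")
    return bytes_transferred / (cycles * port_bytes_per_cycle)


if __name__ == "__main__":
    # Hypothetical 3x3 layer with 32 input and 32 output channels.
    for bits in (2, 3, 8):
        print(bits, conv_weight_bytes(32, 32, 3, 3, bits))
    # Hypothetical measurement: 3456 bytes moved in 216 cycles on a
    # port that can deliver 32 bytes per cycle.
    print(bandwidth_utilization(3456, 216, 32))
```

A model of this kind only bounds the weight traffic; activation traffic, memory contention, and the accelerator's real access pattern would have to come from simulation.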
- 15% Literature research
- 20% Workload exploration on NEureka
- 25% RTL coding
- 40% Verification
To work on this project, you will need:
- to have worked in the past with at least one RTL language (SystemVerilog, Verilog, or VHDL); having followed (or actively following during the project) the VLSI1 / VLSI2 courses is strongly recommended
- to have some prior knowledge of hardware design and architectures
Other skills that you might find useful include:
- familiarity with Deep Learning algorithms
The student shall meet with the advisor(s) every week to discuss any issues or problems that persisted during the previous week and to suggest next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up questions from either side, and for ensuring the student’s progress.
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any word processing software may be used to write the report; however, the use of LaTeX with Tgif (see http://bourbon.usc.edu:8001/tgif/index.html and http://www.dz.ee.ethz.ch/en/information/how-to/drawing-schematics.html) or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.
A digital copy of the report, the presentation, the developed software, build scripts/project files, drawings/illustrations, acquired data, etc. must be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.
At the end of the project, the outcome of the thesis will be presented in a 15-minute talk, followed by 5 minutes of discussion, in front of interested people at the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.
- XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference, https://ieeexplore.ieee.org/abstract/document/8412533
- The IIS/DZ coding guidelines