Mapping Networks on Reconfigurable Binary Engine Accelerator
We have recently designed an accelerator called the Reconfigurable Binary Engine (RBE). The RBE architecture exploits two computational concepts, combined below under the name Binary Based Quantization (BBQ). BBQ allows the RBE to perform convolutions with configurable arithmetic precision in a flexible and power-scalable way. In this project, you will make use of our in-house frameworks NEMO (or Quantlab) and DORY [4,5] to map networks onto the RBE accelerator and evaluate their performance and energy efficiency on real networks.
Computational Concept: Binary Based Quantization (BBQ)
The RBE aims to offer freely configurable arithmetic precision, so that accuracy can be traded against power and performance. The design is inspired by ABC-Net, which is based on the following two innovations:
- Linear combination of multiple binary weight bases.
- Employing multiple binary activations to alleviate the information loss.
The RBE architecture uses these two innovations to emulate quantized NNs by choosing the binary weights to correspond to the individual bits of the quantized weights. One quantized NN can therefore be emulated by a superposition of QA×QW binary NNs, each weighted by a power of two, where QW is the number of quantization bits of the weights and QA the number of quantization bits of the activations. From now on we call this concept Binary Based Quantization (BBQ). It allows the RBE to perform convolutions with configurable arithmetic precision in a flexible and power-scalable way. BBQ can be applied both to complete NNs and to single layers.
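The superposition described above can be illustrated with a small NumPy sketch. It is not the RBE implementation: it assumes unsigned weights and activations (signed values would need extra handling of the sign bit), and the names `bit_planes` and `bbq_dot` are hypothetical helpers introduced only for this illustration. It shows that a QW-bit by QA-bit dot product equals a power-of-two weighted sum of QW×QA binary dot products.

```python
import numpy as np


def bit_planes(x, n_bits):
    """Split unsigned integers into n_bits binary planes, LSB first."""
    return [(x >> b) & 1 for b in range(n_bits)]


def bbq_dot(weights, activations, qw, qa):
    """Dot product of unsigned qw-bit weights and qa-bit activations,
    computed as a power-of-two weighted sum of qw * qa binary dot products.
    Assumption: both operands are unsigned integers."""
    acc = 0
    for i, w_bit in enumerate(bit_planes(weights, qw)):
        for j, a_bit in enumerate(bit_planes(activations, qa)):
            # 1-bit x 1-bit multiply-accumulate: AND followed by a popcount
            binary_dot = int(np.sum(w_bit & a_bit))
            # Scale the binary result by 2^(i + j) and accumulate
            acc += binary_dot << (i + j)
    return acc


rng = np.random.default_rng(0)
QW, QA = 4, 3
w = rng.integers(0, 2**QW, size=32)
a = rng.integers(0, 2**QA, size=32)

# The bit-plane result matches the ordinary integer dot product
assert bbq_dot(w, a, QW, QA) == int(np.dot(w, a))
```

Lowering QW or QA reduces the number of binary dot products proportionally, which is where the precision versus power and performance tradeoff comes from.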
The RBE accelerator consists of three parts:
- Control Unit - contains all control-related logic:
  - A whole tensor tile is handled in a single job, with the help of the HWPE uloop (a tiny microcoded loop processor)
  - Classic HWPE programming interface + hardwired controller
- Streamer Unit - handles all requests:
  - Source - includes the address and request generation for reading data from the TCDM memory
  - Sink - includes the address and request generation for writing data back to the TCDM memory
- Engine Unit - performs all computation. The unit includes the following modules:
  - A grid of 9x9=81 Block units (9 columns of 9 Block units each)
  - Each Block includes 4 Binary Convolution Engine modules (Binconv for short)
  - Each Binconv performs a QW x 1-bit 32x32 matrix-vector product in QW x 32 cycles (32 bMAC/cycle)
  - The reduced Binconv results are scaled by a power of two and accumulated
  - The accumulated results of all Blocks in each of the 9 columns are accumulated again and stored in the Accumulator Banks
  - After the full accumulation, the values are quantized by the Quantization module and streamed out
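The figures in the list above allow a back-of-the-envelope estimate of the peak throughput of the Engine Unit. The sketch below only multiplies out the numbers given in the description. The function names are hypothetical, and `ideal_macs_per_cycle` rests on an assumption that is not stated above: each QW-bit by QA-bit MAC is taken to cost QW×QA binary MACs (as in BBQ), with every Binconv fully utilized. Real networks will reach less than this bound.

```python
# Figures taken from the architecture description above
COLUMNS = 9
BLOCKS_PER_COLUMN = 9
BINCONVS_PER_BLOCK = 4
BMACS_PER_BINCONV_PER_CYCLE = 32


def peak_binary_macs_per_cycle():
    """Peak number of binary MACs per cycle across the whole engine."""
    return (COLUMNS * BLOCKS_PER_COLUMN
            * BINCONVS_PER_BLOCK * BMACS_PER_BINCONV_PER_CYCLE)


def binconv_cycles(qw):
    """Cycles for one QW x 1-bit 32x32 matrix-vector product in a Binconv."""
    return qw * 32


def ideal_macs_per_cycle(qw, qa):
    """Upper bound on QW x QA-bit MACs per cycle.
    Assumption: one such MAC costs qw * qa binary MACs and the
    engine is fully utilized."""
    return peak_binary_macs_per_cycle() / (qw * qa)


print(peak_binary_macs_per_cycle())   # 10368
print(binconv_cycles(4))              # 128
print(ideal_macs_per_cycle(8, 8))     # 162.0
print(ideal_macs_per_cycle(2, 2))     # 2592.0
```

Comparing such an ideal bound with the cycle counts measured for a mapped network gives a first indication of how well a given layer utilizes the accelerator.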
- RBE Github (have a look at the documentation)
- Dory Github
- Dory Examples Github
- Nemo Github
- Quantlab Github
- X. Lin, C. Zhao and W. Pan. "Towards Accurate Binary Convolutional Neural Network." Advances in Neural Information Processing Systems, 2017.
- VLSI I
- C coding
- Python coding (ideally with PyTorch)
- 20% Theory
- 20% HW understanding
- 40% ML Tools: NEMO, DORY, PyTorch
- 20% Embedded C programming
The student shall meet with the advisor(s) every week to discuss any issues or problems that came up during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for exchanging information on how to proceed, clearing up questions from either side, and ensuring the student's progress.
Report / Presentation
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX together with Tgif, draw.io or any other vector drawing software (for block diagrams).
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested members of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.