Difference between revisions of "Mapping Networks on Reconfigurable Binary Engine Accelerator"

From iis-projects

(Status: Available)
(9 intermediate revisions by the same user not shown)
Line 1: Line 1:
<!--  
+
[[File:rbe-arch.png|thumb|400px|'''RBE''' architecture]]
[[File:Variation Tolerant.jpg|thumb]]
+
[[File:rbe-bbq.png|thumb|400px|'''BBQ''' computational concept]]
--->
+
 
 
==Short Description==
 
==Short Description==
We have recently designed an accelerator called Reconfigurable Binary Engine (RBE). The RBE architecture uses these two innovations to emulate quantized NNs by choosing the binary weights to correspond to each bit of the quantized weights. One quantized NN can therefore be emulated by a superposition of power-of-2 weighted QA × QW binary NN, whereas QW corresponds to the quantization level of the weights and QA quantization level of the activations. We call this concept from now on Binary Based Quantization (BBQ) which allows the RBE to perform convolutions with configurable arithmetic precisions in a flexible and power-scalable way. In this project we make use of our in-house developed frameworks NEMO and DORY to map networks onto the RBE accelerator and evaluate its performance and energy-efficiency for real networks.
+
We have recently designed an accelerator called the Reconfigurable Binary Engine (RBE) [1]. The RBE architecture exploits two computational concepts, which are combined and explained below as Binary Based Quantization (BBQ). BBQ allows the RBE to perform convolutions with configurable arithmetic precision in a flexible and power-scalable way. In this project, you will make use of our in-house frameworks NEMO [4] (or Quantlab [5]) and DORY [2,3] to map real networks onto the RBE accelerator and evaluate the resulting performance and energy efficiency.
 +
 
 +
== Computational Concept: Binary Based Quantization (BBQ) ==
 +
 
 +
The RBE aims to offer freely configurable arithmetic precision, so that power and performance can be traded off against accuracy. The design is inspired by ABC-Net [6], which is based on the following two innovations:
 +
 
 +
# Linear combination of multiple binary weight bases.
 +
# Employing multiple binary activations to alleviate the information loss.
  
 +
The RBE architecture uses these two innovations to emulate quantized NNs by choosing the binary weights to correspond to the individual bits of the quantized weights. One quantized NN can therefore be emulated by a superposition of QA×QW binary NNs, each weighted by a power of two, where QW is the quantization level (bit width) of the weights and QA the quantization level (bit width) of the activations. We call this concept Binary Based Quantization (BBQ); it allows the RBE to perform convolutions with configurable arithmetic precision in a flexible and power-scalable way. BBQ can be applied to complete NNs as well as to single layers.
 +
 +
== Architecture ==
 +
The RBE accelerator consists of three parts:
 +
 +
* Control Unit - contains all control related logic:
 +
** Whole tensor tile is handled in a single job, helped by HWPE uloop (tiny microcoded loop processor)
 +
** Classic HWPE programming interface + hardwired controller
 +
* Streamer Unit - handles all requests:
 +
** Source - includes the address and request generation for reading data from the TCDM memory
 +
** Sink - includes the address and request generation for writing data back to the TCDM memory
 +
* Engine Unit - performs all computation. The unit includes the following modules:
 +
** A grid of 9x9=81 Block units (9 columns of 9 Block units each)
 +
** Each Block includes 4 Binary Convolution Engine (Binconv) modules
 +
** Each Binconv performs a QW x 1bit 32x32 Matrix-Vector product in QW x 32 cycles (32 bMAC/cycle)
 +
** The reduced Binconv results are scaled by a power-of-two and accumulated
 +
** The accumulated results of all Blocks in each of the 9 columns are accumulated again and stored in the Accumulator Banks
 +
** After the full accumulation, the values are quantized by the Quantization module and streamed out
 +
 +
 +
===Literature===
 +
* [https://github.com/pulp-platform/rbe] RBE Github (see the documentation)
 +
* [https://github.com/pulp-platform/dory] Dory Github
 +
* [https://github.com/pulp-platform/dory_examples] Dory Examples Github
 +
* [https://github.com/pulp-platform/nemo] Nemo Github
 +
* [https://github.com/pulp-platform/quantlab] Quantlab Github
 +
* [6] X. Lin, C. Zhao and W. Pan. "Towards Accurate Binary Convolutional Neural Network." Advances in Neural Information Processing Systems, 2017.
  
 
===Status: Available ===
 
===Status: Available ===
: Looking for 1-2 Semester/Master students
+
* Looking for 1-2 Semester/Master students
: Contact: [[:User:Paulin| Gianna Paulin]], [[:User:Thoriri| Thorir Mar Ingolfsson]]
+
* Contact: [[:User:Paulin| Gianna Paulin]], [[:User:Thoriri| Thorir Mar Ingolfsson]]
  
 
===Prerequisites===
 
===Prerequisites===
Line 26: Line 60:
 
--->
 
--->
 
===Character===
 
===Character===
: 20% Theory
+
* 20% Theory
: 20% HW understanding
+
* 20% HW understanding
: 40% ML Tools: Nemo, Dory, Pytorch
+
* 40% ML Tools: Nemo, Dory, Pytorch
: 20% Embedded C programming
+
* 20% Embedded C programming
  
 
===Professor===
 
===Professor===
[http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]
+
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]
<!-- : [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=78758 Qiuting Huang] --->
 
<!--: [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=80923 Mathieu Luisier] --->
 
<!--: [https://ee.ethz.ch/the-department/people-a-z/person-detail.MjUwODc0.TGlzdC8zMjc5LC0xNjUwNTg5ODIw.html Taekwang Jang] --->
 
<!--: [https://ee.ethz.ch/the-department/faculty/professors/person-detail.OTY5ODg=.TGlzdC80MTEsMTA1ODA0MjU5.html Christoph Studer] --->
 
<!-- : [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=79172 Andreas Schenk] --->
 
 
 
[[#top|↑ top]]
 
  
 
== Project Organization ==
 
== Project Organization ==
Line 49: Line 76:
 
==== Report / Presentation ====
 
==== Report / Presentation ====
  
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word processing software is allowed for writing the reports, nevertheless, the use of LaTeX with Tgif, drawoio (See: http://bourbon.usc.edu:8001/tgif/index.html and http://www.dz.ee.ethz.ch/en/information/how-to/drawing-schematics.html) or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.
+
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any word processing software may be used to write the report; however, the IIS staff strongly encourage the use of LaTeX together with Tgif, drawio, or any other vector drawing software (for block diagrams).
  
 
====== Final Report ======
 
====== Final Report ======
Line 59: Line 86:
 
At the end of the project, the outcome of the thesis will be presented in a 15 (SA) or 20-minutes (MA) talk and 5 minutes of discussion in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.
 
At the end of the project, the outcome of the thesis will be presented in a 15 (SA) or 20-minutes (MA) talk and 5 minutes of discussion in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.
  
 
==Results==
 
 
==Links==
 
  
 
[[Category:Deep Learning Acceleration]]
 
[[Category:Deep Learning Acceleration]]
Line 74: Line 97:
  
 
[[#top|↑ top]]
 
[[#top|↑ top]]

Revision as of 16:45, 19 November 2021

[Figure: RBE architecture]
[Figure: BBQ computational concept]

Short Description

We have recently designed an accelerator called the Reconfigurable Binary Engine (RBE) [1]. The RBE architecture exploits two computational concepts, which are combined and explained below as Binary Based Quantization (BBQ). BBQ allows the RBE to perform convolutions with configurable arithmetic precision in a flexible and power-scalable way. In this project, you will make use of our in-house frameworks NEMO [4] (or Quantlab [5]) and DORY [2,3] to map real networks onto the RBE accelerator and evaluate the resulting performance and energy efficiency.

Computational Concept: Binary Based Quantization (BBQ)

The RBE aims to offer freely configurable arithmetic precision, so that power and performance can be traded off against accuracy. The design is inspired by ABC-Net [6], which is based on the following two innovations:

  1. Linear combination of multiple binary weight bases.
  2. Employing multiple binary activations to alleviate the information loss.
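
For orientation, the two innovations can be written compactly as follows. This is a sketch in our own notation (W for real-valued weights, R for real-valued activations, B_m and A_n for the binary bases, α_m and β_n for their scaling factors, M and N for the number of bases); the exact definitions are given in [6].

```latex
% Innovation 1: weights as a linear combination of M binary bases
W \approx \sum_{m=1}^{M} \alpha_m B_m, \qquad B_m \in \{-1, +1\}^{\text{shape}(W)}

% Innovation 2: activations as a combination of N binary activations
R \approx \sum_{n=1}^{N} \beta_n A_n

% Consequence: one convolution becomes M x N binary convolutions
\mathrm{Conv}(W, R) \approx \sum_{m=1}^{M} \sum_{n=1}^{N} \alpha_m \beta_n \, \mathrm{Conv}(B_m, A_n)
```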

The RBE architecture uses these two innovations to emulate quantized NNs by choosing the binary weights to correspond to the individual bits of the quantized weights. One quantized NN can therefore be emulated by a superposition of QA×QW binary NNs, each weighted by a power of two, where QW is the quantization level (bit width) of the weights and QA the quantization level (bit width) of the activations. We call this concept Binary Based Quantization (BBQ); it allows the RBE to perform convolutions with configurable arithmetic precision in a flexible and power-scalable way. BBQ can be applied to complete NNs as well as to single layers.
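
The superposition described above can be illustrated with a small Python sketch. It is not the RBE implementation: it assumes unsigned QA-bit activations and QW-bit weights, and the helper names (bit_planes, binary_dot, bbq_dot) are hypothetical. It only shows that QA×QW binary dot products, each scaled by a power of two, reproduce the ordinary integer dot product.

```python
import random


def bit_planes(values, bits):
    """Split unsigned integers into `bits` binary vectors, least significant bit first."""
    return [[(v >> b) & 1 for v in values] for b in range(bits)]


def binary_dot(a_bits, w_bits):
    """Dot product of two 0/1 vectors: bitwise AND followed by a population count."""
    return sum(a & w for a, w in zip(a_bits, w_bits))


def bbq_dot(activations, weights, qa, qw):
    """Emulate a QA x QW-bit dot product with QA * QW binary dot products.

    Assumption for this sketch: unsigned operands. How the RBE treats signs
    or offsets is not covered here.
    """
    a_planes = bit_planes(activations, qa)
    w_planes = bit_planes(weights, qw)
    total = 0
    for i, a_bits in enumerate(a_planes):
        for j, w_bits in enumerate(w_planes):
            # Each binary result is scaled by 2**(i + j) and accumulated.
            total += binary_dot(a_bits, w_bits) << (i + j)
    return total


# Compare against the plain integer dot product on random data.
random.seed(0)
QA, QW = 4, 3
acts = [random.randrange(2 ** QA) for _ in range(32)]
wts = [random.randrange(2 ** QW) for _ in range(32)]
reference = sum(a * w for a, w in zip(acts, wts))
assert bbq_dot(acts, wts, QA, QW) == reference
```

Changing QA and QW in the sketch changes only the number of binary dot products, which is the sense in which the precision is configurable.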

Architecture

The RBE accelerator consists of three parts:

  • Control Unit - contains all control related logic:
    • Whole tensor tile is handled in a single job, helped by HWPE uloop (tiny microcoded loop processor)
    • Classic HWPE programming interface + hardwired controller
  • Streamer Unit - handles all requests:
    • Source - includes the address and request generation for reading data from the TCDM memory
    • Sink - includes the address and request generation for writing data back to the TCDM memory
  • Engine Unit - performs all computation. The unit includes the following modules:
    • A grid of 9x9=81 Block units (9 columns of 9 Block units each)
    • Each Block includes 4 Binary Convolution Engine (Binconv) modules
    • Each Binconv performs a QW x 1-bit 32x32 matrix-vector product in QW x 32 cycles (32 bMAC/cycle)
    • The reduced Binconv results are scaled by a power of two and accumulated
    • The accumulated results of all Blocks in each of the 9 columns are accumulated again and stored in the Accumulator Banks
    • After the full accumulation, the values are quantized by the Quantization module and streamed out
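
The figures in the list above allow a back-of-the-envelope estimate of the Engine's peak throughput. The Python sketch below is an idealized calculation, not a performance model: it assumes every Binconv is busy in every cycle and ignores streaming, control, and quantization overhead. The class and function names are hypothetical, and the assumption that one QA×QW-bit MAC costs QA*QW binary MACs follows from the BBQ concept rather than from measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RbeGeometry:
    """Engine dimensions as listed in the architecture description."""
    columns: int = 9
    blocks_per_column: int = 9
    binconvs_per_block: int = 4
    bmacs_per_binconv_cycle: int = 32

    @property
    def binconvs(self):
        # Total number of Binconv modules in the Block grid.
        return self.columns * self.blocks_per_column * self.binconvs_per_block

    @property
    def peak_bmacs_per_cycle(self):
        # Idealized peak: all Binconvs active in the same cycle.
        return self.binconvs * self.bmacs_per_binconv_cycle


def binconv_cycles(qw, rows=32, cols=32, bmacs_per_cycle=32):
    """Cycles for one QW x 1-bit rows x cols matrix-vector product."""
    return (qw * rows * cols) // bmacs_per_cycle


def equivalent_macs_per_cycle(geom, qa, qw):
    """Peak QA x QW-bit MACs per cycle, assuming QA * QW binary MACs each."""
    return geom.peak_bmacs_per_cycle / (qa * qw)


rbe = RbeGeometry()
print(rbe.binconvs)                          # 324
print(rbe.peak_bmacs_per_cycle)              # 10368
print(binconv_cycles(qw=4))                  # 128, i.e. QW x 32
print(equivalent_macs_per_cycle(rbe, 4, 4))  # 648.0
```

Lower bit widths raise the equivalent MAC rate in this estimate, which reflects the precision versus performance trade-off that BBQ is meant to expose.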


Literature

  • [1] RBE GitHub (see the documentation)
  • [2] DORY GitHub
  • [3] DORY Examples GitHub
  • [4] NEMO GitHub
  • [5] Quantlab GitHub
  • [6] X. Lin, C. Zhao and W. Pan. "Towards Accurate Binary Convolutional Neural Network." Advances in Neural Information Processing Systems, 2017.

Status: Available

Prerequisites

  • VLSI I
  • C coding
  • Python coding (ideally with PyTorch)

Character

  • 20% Theory
  • 20% HW understanding
  • 40% ML Tools: NEMO, DORY, PyTorch
  • 20% Embedded C programming

Professor

  • Luca Benini

Project Organization

Weekly Meetings

The student shall meet with the advisor(s) every week to discuss any issues that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for exchanging information on how to proceed, clearing up questions from either side, and ensuring the student's progress.

Report / Presentation

Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any word processing software may be used to write the report; however, the IIS staff strongly encourage the use of LaTeX together with Tgif, drawio, or any other vector drawing software (for block diagrams).

Final Report

A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.

Presentation

At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested members of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.