<!-- Integrating Hardware Accelerators into Snitch (1S) -->
 
 
[[Category:Digital]]
 
[[Category:Acceleration_and_Transprecision]]
 
[[Category:High Performance SoCs]]
 
[[Category:Computer Architecture]]
 
[[Category:2021]]
 
[[Category:2022]]
 
[[Category:Semester Thesis]]
 
[[Category:Available]]
 
[[Category:Lbertaccini]]
 
[[Category:Prasadar]]
 
 
 
= Overview =
 
 
 
== Status: Available ==
 
 
 
* Type: Semester Thesis
 
* Professor: Prof. Dr. L. Benini
 
* Supervisors:
 
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]
 
** [[:User:Prasadar | Arpan Suravi Prasad]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]
 
 
 
= Introduction =
 
 
 
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster [1] couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]
 
 
 
[[File:cluster_hwpe.png|thumb|350px|The ''PULP'' cluster including an HWPE [3]]]
 
 
 
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. In addition, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which allow the system to achieve FPU utilization above 90%.
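
To make the role of these extensions concrete, the sketch below shows a plain C dot product, the kind of regular, FP-intensive kernel Snitch targets. The mapping described in the comments (loads turned into reads of the ft0/ft1 stream registers, the multiply-add repeated by the FREP hardware loop) follows the descriptions in [1] and [2] and is only an illustration, not part of this project description.

<syntaxhighlight lang="c">
/* Plain-C dot product: the kind of FP-intensive, regular-access kernel
 * that Snitch targets. On a conventional single-issue core, the loads,
 * address increments, and loop branch dominate the instruction stream.
 * With SSRs [2], the two array loads become implicit reads of stream
 * registers (ft0/ft1 in the Snitch implementation), and the FREP hardware
 * loop [1] repeats the fused multiply-add without re-fetching it, so the
 * FPU can issue useful work nearly every cycle. */
double dot(const double *a, const double *b, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        acc += a[i] * b[i];   /* one fmadd.d per element */
    }
    return acc;
}
</syntaxhighlight>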
 
 
 
With the slowdown of Moore's law, increased functionality and lower cost are increasingly achieved through domain specialization and heterogeneity. This has led to growing interest in domain-specific accelerators, which provide higher energy efficiency at a low area cost.
 
 
 
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are software-programmed by those cores. A wide range of HWPEs has been developed in our group, spanning from machine-learning engines [4] to accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.
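
To illustrate the memory-mapped, core-driven programming model, the sketch below shows how a core could configure and trigger a generic HWPE. The base address, register offsets, and register names are hypothetical placeholders and do not reflect the actual hwpe-ctrl register map; real engines expose a similar, but not identical, peripheral interface.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Hypothetical register map of a generic HWPE (placeholder values). */
#define HWPE_BASE        0x10200000u   /* placeholder peripheral address  */
#define HWPE_REG_SRC     0x00u         /* input buffer pointer (in L1)    */
#define HWPE_REG_DST     0x04u         /* output buffer pointer           */
#define HWPE_REG_LEN     0x08u         /* number of elements              */
#define HWPE_REG_TRIGGER 0x0Cu         /* write 1 to start a job          */
#define HWPE_REG_STATUS  0x10u         /* 0 = idle, nonzero = busy        */

static inline void hwpe_write(uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(HWPE_BASE + off) = val;
}

static inline uint32_t hwpe_read(uint32_t off)
{
    return *(volatile uint32_t *)(HWPE_BASE + off);
}

/* The core programs pointers into the shared L1 memory, triggers the job,
 * and waits; the accelerator then fetches its operands itself through the
 * same memory the cores use. */
void hwpe_run(uint32_t *src, uint32_t *dst, uint32_t len)
{
    hwpe_write(HWPE_REG_SRC, (uint32_t)(uintptr_t)src);
    hwpe_write(HWPE_REG_DST, (uint32_t)(uintptr_t)dst);
    hwpe_write(HWPE_REG_LEN, len);
    hwpe_write(HWPE_REG_TRIGGER, 1);
    while (hwpe_read(HWPE_REG_STATUS) != 0) {
        /* busy-wait; a real system would typically sleep on an event */
    }
}
</syntaxhighlight>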
 
 
 
The goal of this project is to implement the architectural modifications needed to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the resulting architecture.
 
 
 
= Project =
 
 
 
* '''Integrate support for HWPEs into Snitch'''. This will require you to:
 
** Investigate how the PULP cluster provides support for HWPEs, focusing in particular on how the HWPEs connect to the memory and to the cores that program them
 
** Integrate support for HWPEs into Snitch, implementing the necessary modifications
 
** Verify the functionality of your extensions.
 
* '''Evaluate your extensions''' by
 
** Adding one HWPE already developed at IIS to Snitch
 
** Determining the achieved speed-up for some target applications (a measurement sketch follows this list)
 
** Determining the area and timing impact in synthesis
 
** Comparing the results against the existing PULP cluster enhanced with the same HWPE.
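
As a minimal sketch of how the speed-up could be measured from bare-metal C, the code below times a software baseline and an HWPE-offloaded run by reading the RISC-V mcycle CSR; run_software_kernel and run_hwpe_kernel are hypothetical placeholders for the two implementations under comparison, and the measurement method is only a suggestion.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Read the low 32 bits of the machine cycle counter (bare-metal, M-mode).
 * For long runs on RV32, mcycleh would also have to be sampled. */
static inline uint32_t read_mcycle(void)
{
    uint32_t c;
    asm volatile("csrr %0, mcycle" : "=r"(c));
    return c;
}

/* Hypothetical placeholders for the two variants being compared. */
void run_software_kernel(void);
void run_hwpe_kernel(void);

uint32_t cycles_sw, cycles_hwpe;

void measure_speedup(void)
{
    uint32_t t0 = read_mcycle();
    run_software_kernel();
    cycles_sw = read_mcycle() - t0;

    t0 = read_mcycle();
    run_hwpe_kernel();
    cycles_hwpe = read_mcycle() - t0;

    /* speed-up = cycles_sw / cycles_hwpe (report as a ratio) */
}
</syntaxhighlight>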
 
 
 
== Character ==
 
 
 
* 15% Literature / architecture review
 
* 40% RTL implementation
 
* 15% Bare-metal C programming
 
* 30% Evaluation
 
 
 
== Prerequisites ==
 
 
 
* Strong interest in computer architecture
 
* Experience with digital design in SystemVerilog as taught in VLSI I
 
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II
 
 
 
= References =
 
 
 
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]
 
 
 
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]
 
 
 
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform]
 
 
 
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]
 
 
 
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]
 
 
 
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]
 
 
 