Integrating Hardware Accelerators into Snitch (1S)
Latest revision as of 12:57, 7 September 2022


Overview

Status: Not Available

Introduction

Figure: The Snitch cluster [1] couples tiny RISC-V Snitch cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.

Figure: The PULP cluster including an HWPE [3]

The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which allow the system to achieve FPU utilization above 90%.
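To illustrate what SSRs and FREP accelerate, the sketch below shows the kind of FPU-bound kernel they target, written in plain C. This is an illustration only, not Snitch runtime code: on Snitch, the two array reads would become reads of stream semantic registers and the loop bookkeeping would be absorbed by the FREP hardware loop, leaving essentially one fused multiply-add per cycle.

```c
#include <stddef.h>

/* Plain-C view of an FPU-bound kernel. Each iteration performs one
 * multiply-add plus two loads and loop overhead. With SSRs the loads of
 * a[i] and b[i] turn into register reads (e.g. ft0/ft1), and with FREP
 * the branch/increment overhead disappears, so the FPU stays busy. */
double dot(const double *a, const double *b, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}
```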

With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This has led to an increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost.

HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are software-programmed by those cores. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.

The goal of this project is to implement the necessary architectural modifications to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the architectural improvements.
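The "software-programmed" part of the HWPE model can be made concrete with a minimal sketch of a driver flow: the core writes job parameters into the accelerator's memory-mapped register file, triggers the job, and waits for completion. All register names, offsets, and the function below are hypothetical illustrations, not the interface of any specific IIS accelerator; real HWPEs expose a similar control register file on the cluster's peripheral interconnect.

```c
#include <stdint.h>

/* Hypothetical HWPE control-port layout (illustrative only). A real HWPE
 * exposes a comparable memory-mapped register file that the cores program
 * before triggering a job on data in the shared (TCDM) memory. */
typedef struct {
    volatile uint32_t trigger;   /* write to start a job            */
    volatile uint32_t status;    /* non-zero while the engine is busy */
    volatile uint32_t src_addr;  /* input buffer in shared memory    */
    volatile uint32_t dst_addr;  /* output buffer in shared memory   */
    volatile uint32_t len;       /* number of elements to process    */
} hwpe_regs_t;

/* Program one job and busy-wait for completion. On real hardware `regs`
 * would be a fixed address in the peripheral region; it is passed in here
 * so the flow can be exercised against a stub in simulation or tests. */
void hwpe_run(hwpe_regs_t *regs, uint32_t src, uint32_t dst, uint32_t len) {
    regs->src_addr = src;
    regs->dst_addr = dst;
    regs->len      = len;
    regs->trigger  = 1;            /* kick off the job */
    while (regs->status != 0)
        ;                          /* poll; a real driver may sleep on an event */
}
```

A useful design question for the project is whether the Snitch cores should poll as above or suspend on the cluster's event/interrupt infrastructure while the HWPE runs.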

Project

  • Integrate support for HWPEs in Snitch. This will require you to
    • Investigate how the PULP cluster provides support for HWPEs, in particular how the HWPEs connect to the memory and to the cores that program them
    • Integrate support for HWPEs in Snitch, implementing the necessary modifications
    • Verify the functionality of your extensions
  • Evaluate your extensions by
    • Adding one HWPE already developed at IIS to Snitch
    • Determining the achieved speed-up for some target applications
    • Determining the area and timing impact in synthesis
    • Comparing the results to the existing PULP cluster enhanced with the same HWPE
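For the speed-up evaluation, a common approach is to run the same workload once in pure software on the Snitch cores and once offloaded to the HWPE, and report the ratio of cycle counts (on RISC-V, both counts would typically come from the mcycle CSR). The helper below is a minimal sketch of that metric; the cycle-count sources are an assumption, not a prescribed measurement setup.

```c
#include <stdint.h>

/* Speed-up as the ratio of cycle counts for the same workload:
 * software-only execution vs. execution offloaded to the HWPE.
 * Values above 1.0 mean the accelerator is faster. */
double speedup(uint64_t cycles_sw, uint64_t cycles_hwpe) {
    return (double)cycles_sw / (double)cycles_hwpe;
}
```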

Character

  • 15% Literature / architecture review
  • 40% RTL implementation
  • 15% Bare-metal C programming
  • 30% Evaluation

Prerequisites

  • Strong interest in computer architecture
  • Experience with digital design in SystemVerilog as taught in VLSI I
  • Experience with ASIC implementation flow (synthesis) as taught in VLSI II

References

[1] Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads

[2] Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores

[3] He-P2012: Architectural heterogeneity exploration on a scalable many-core platform

[4] XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference

[5] To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters

[6] An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics