Streaming Integer Extensions for Snitch (M)


Revision as of 18:01, 17 November 2021


Overview

Status: Available

Introduction

The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.

Currently, Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together enable almost continuous FPU utilization in many data-oblivious problems.
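As a rough illustration of why SSRs and FREP help, the following C sketch models a read stream in software. It is not Snitch code or an existing API: `stream_t`, `stream_pop`, `dot_baseline`, and `dot_streamed` are hypothetical names introduced here. The assumption is that reading a stream register delivers the next element of an affine memory stream, so the loop body shrinks to the multiply-accumulate itself, and that FREP would then repeat this body without a software loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software model of one read stream (not a Snitch API). */
typedef struct {
    const double *base;  /* start of the data in memory */
    size_t idx;          /* index of the next element to deliver */
    size_t stride;       /* step between elements, in elements */
    size_t remaining;    /* elements left in the stream */
} stream_t;

/* Models a read of a stream register: return the next element and
 * advance the index, as an address generator would do in hardware. */
static double stream_pop(stream_t *s) {
    assert(s->remaining > 0);
    double v = s->base[s->idx];
    s->idx += s->stride;
    s->remaining--;
    return v;
}

/* Baseline: each iteration needs explicit loads and index updates
 * in addition to the one useful multiply-accumulate. */
double dot_baseline(const double *a, const double *b, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Streamed: the body is a single multiply-accumulate on two streams.
 * In this model the for loop stands in for what FREP would repeat. */
double dot_streamed(stream_t *a, stream_t *b, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += stream_pop(a) * stream_pop(b);
    return acc;
}
```

Both functions compute the same result; the difference the model is meant to show is where the address bookkeeping lives, not how the hardware is implemented.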

Recently, we explored two new accelerator-based extensions for Snitch, both of which aim to boost performance and energy efficiency in integer-based workloads such as signal processing and low-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.

Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.

Project

  • Integrate the current partial Xpulpv2 implementation [3] in the mainline Snitch version. This will require you to
    • Adapt to the changes in the Snitch codebase and parameterize the existing code
    • Possibly switch to a standardized accelerator interface such as X-interface
    • Verify the functionality of your extensions.
  • Implement parametric support for integer SSRs which
    • Are shared between floating-point and integer datapaths when both are available
    • Support configurable data widths (8, 16, 32, or 64 bits).
  • Implement additional instructions of interest, which could include
    • A complete implementation of Xpulp or a closed subset of partitions
    • The proposed draft Bitmanip extension [4]
    • A simple integer hardware loop.
  • Evaluate your extensions by
    • Determining the performance impact on representative integer workloads
    • Determining the area and timing impact in synthesis
    • Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [5].
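To give a feel for what an integer stream with a configurable element width might have to do, here is a minimal software model in C. It is a sketch under assumptions, not a proposed design: `int_stream_t`, `int_stream_pop`, and `int_dot` are hypothetical names, the stream is assumed to be contiguous, and elements are assumed to be signed and sign-extended to 64 bits when read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an integer read stream with configurable width. */
typedef struct {
    const uint8_t *base;  /* start of the buffer in memory */
    size_t offset;        /* byte offset of the next element */
    size_t width;         /* element width in bytes: 1, 2, 4, or 8 */
    size_t remaining;     /* elements left in the stream */
} int_stream_t;

/* Pop one element of the configured width and sign-extend it to 64 bits. */
static int64_t int_stream_pop(int_stream_t *s) {
    assert(s->remaining > 0);
    const uint8_t *p = s->base + s->offset;
    int64_t v = 0;
    switch (s->width) {
    case 1: { int8_t  t; memcpy(&t, p, sizeof t); v = t; break; }
    case 2: { int16_t t; memcpy(&t, p, sizeof t); v = t; break; }
    case 4: { int32_t t; memcpy(&t, p, sizeof t); v = t; break; }
    case 8: { int64_t t; memcpy(&t, p, sizeof t); v = t; break; }
    default: assert(0 && "unsupported element width");
    }
    s->offset += s->width;  /* advance by one element */
    s->remaining--;
    return v;
}

/* Mixed-width dot product: the two streams may use different widths. */
int64_t int_dot(int_stream_t *a, int_stream_t *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += int_stream_pop(a) * int_stream_pop(b);
    return acc;
}
```

A model like this could also serve as a golden reference when verifying the RTL, since it makes the expected element order and sign extension explicit for each width.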

Character

  • 20% Literature / architecture review
  • 40% RTL implementation
  • 20% Bare-metal C programming
  • 20% Evaluation

Prerequisites

  • Strong interest in computer architecture and memory systems
  • Experience with digital design in SystemVerilog as taught in VLSI I
  • Experience with ASIC implementation flow (synthesis) as taught in VLSI II
  • SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent
  • Preferred: Knowledge of or prior experience with RISC-V or ISA extension design

References

[1] https://ieeexplore.ieee.org/document/9216552

[2] https://ieeexplore.ieee.org/document/9068465

[3] https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv

[4] https://github.com/riscv/riscv-bitmanip/

[5] https://ieeexplore.ieee.org/abstract/document/9406333

[6] https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/

[7] https://github.com/openhwgroup/cv32e40p