Streaming Integer Extensions for Snitch (M)

From iis-projects

Revision as of 18:35, 17 November 2021


Overview

Status: Available

Introduction

The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 gate equivalents in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.

Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Together, these lightweight extensions largely remove the usual trade-off between control area overhead and FPU utilization: Snitch achieves almost 100% FPU utilization on many data-oblivious problems with regular access patterns.
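To make the streaming idea concrete, the following is a minimal software sketch of what a read stream does: an affine address generator walks memory, and each read of the "register" returns the next element. It is a plain C model written for illustration only. The names stream_t, stream_init, stream_pop, and dot_streamed are hypothetical, the two loop levels are an assumption, and none of this is the actual Snitch SSR configuration interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a read stream: a two-level affine
 * address generator over an int32_t array. This is NOT the Snitch
 * SSR hardware interface, only an illustration of the concept. */
typedef struct {
    const int32_t *base;  /* first element of the stream */
    size_t bound[2];      /* iterations per loop level (inner, outer) */
    ptrdiff_t stride[2];  /* element stride per loop level */
    size_t idx[2];        /* current loop indices */
} stream_t;

static stream_t stream_init(const int32_t *base,
                            size_t b0, ptrdiff_t s0,
                            size_t b1, ptrdiff_t s1) {
    assert(b0 > 0 && b1 > 0);
    stream_t s = { base, { b0, b1 }, { s0, s1 }, { 0, 0 } };
    return s;
}

/* Reading the stream returns the next element and advances the
 * address generator, so the caller issues no explicit load or
 * index arithmetic. */
static int32_t stream_pop(stream_t *s) {
    assert(s->idx[1] < s->bound[1]);  /* stream must not be exhausted */
    ptrdiff_t off = (ptrdiff_t)s->idx[0] * s->stride[0]
                  + (ptrdiff_t)s->idx[1] * s->stride[1];
    int32_t v = s->base[off];
    if (++s->idx[0] == s->bound[0]) {
        s->idx[0] = 0;
        ++s->idx[1];
    }
    return v;
}

/* Dot product over two streams: the loop body contains only the
 * multiply-accumulate, which is the part a hardware loop would repeat. */
static int32_t dot_streamed(stream_t *a, stream_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += stream_pop(a) * stream_pop(b);
    return acc;
}
```

In this model, a contiguous vector of length n would be set up as stream_init(x, n, 1, 1, 0), while a column-wise walk over a row-major 2x2 matrix would use stream_init(m, 2, 2, 2, 1). In hardware, the address generation runs alongside the datapath, which is why the compute unit can stay busy on every cycle.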

Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of integer-based workloads such as signal processing and low-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.

Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting the same near-full utilization of the integer units that the floating-point subsystem achieves for the FPU. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.

Project

  • Integrate the current partial Xpulpv2 implementation [3][4] into the mainline Snitch version. This will require you to
    • Adapt to the changes in the mainline Snitch codebase and parameterize the existing code
    • Possibly switch to a standardized accelerator interface such as X-interface
    • Verify the functionality of your extensions.
  • Implement parametric support for integer SSRs which
    • Are shared between floating-point and integer datapaths when both are available
    • Support configurable data widths (8, 16, 32, and 64 bits).
  • Implement additional instructions of interest, which could include
    • A complete implementation of Xpulp [5] or a closed subset of its partitions
    • The proposed draft Bitmanip extension [6]
    • A simple integer hardware loop [5].
  • Evaluate your extensions by
    • Determining the performance impact on representative integer workloads
    • Determining the area and timing impact in synthesis
    • Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].
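As an illustration of what configurable data widths mean for an integer stream, the sketch below fetches the element at a given index from a byte buffer for any of the four listed widths and sign-extends it to 64 bits. It is a software reference model under stated assumptions, not a proposed RTL design: the name load_elem is hypothetical, and the little-endian layout and signed interpretation are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: fetch element `index` of a stream whose
 * elements are `width_bits` wide (8, 16, 32, or 64), assumed to be
 * stored little-endian, and sign-extend the result to 64 bits. */
static int64_t load_elem(const uint8_t *mem, size_t index,
                         unsigned width_bits) {
    assert(width_bits == 8 || width_bits == 16 ||
           width_bits == 32 || width_bits == 64);
    size_t nbytes = width_bits / 8;
    uint64_t raw = 0;
    /* Assemble the element byte by byte, least significant first. */
    for (size_t i = 0; i < nbytes; ++i)
        raw |= (uint64_t)mem[index * nbytes + i] << (8 * i);
    if (width_bits < 64) {
        /* Sign-extend: flip the sign bit, then subtract its weight. */
        uint64_t sign = (uint64_t)1 << (width_bits - 1);
        raw = (raw ^ sign) - sign;
    }
    return (int64_t)raw;
}
```

A model like this can serve as a golden reference when verifying that a parametric stream delivers the same values as scalar loads at each width.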

Character

  • 20% Literature / architecture review
  • 40% RTL implementation
  • 20% Bare-metal C programming
  • 20% Evaluation

Prerequisites

  • Strong interest in computer architecture and memory systems
  • Experience with digital design in SystemVerilog as taught in VLSI I
  • Experience with ASIC implementation flow (synthesis) as taught in VLSI II
  • SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent
  • Preferred: Knowledge of or prior experience with RISC-V or ISA extension design

References

[1] https://ieeexplore.ieee.org/document/9216552

[2] https://ieeexplore.ieee.org/document/9068465

[3] https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M)

[4] https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv

[5] https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions

[6] https://github.com/riscv/riscv-bitmanip

[7] https://ieeexplore.ieee.org/abstract/document/9406333

[8] https://github.com/openhwgroup/cv32e40p