Streaming Integer Extensions for Snitch (M/1-2S)
Status: In Progress
- Type: Master or Semester Thesis
- Professor: Prof. Dr. L. Benini
The Snitch ecosystem targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 gate equivalents in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.
Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) and the floating-point repetition (FREP) hardware loop. Together, these lightweight extensions largely remove the usual trade-off between control area overhead and FPU utilization: Snitch achieves close to 100% FPU utilization on many data-oblivious problems with regular access patterns.
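To illustrate why these extensions matter, consider a dot product, a typical data-oblivious kernel with regular accesses. The following C sketch is only a baseline for discussion and does not use any Snitch-specific API; the function name `dot_baseline` is hypothetical. On a single-issue core, each iteration spends issue slots on two loads, index updates, and a branch, so only a fraction of the issued instructions keep the FPU busy. Conceptually, SSRs turn the two loads into implicit register reads from memory streams, and FREP repeats the multiply-add without per-iteration loop instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline dot product (hypothetical helper, plain C, no Snitch API).
 * Per iteration the core issues:
 *   - two loads (a[i], b[i])      -> implicit with SSRs
 *   - index update and a branch   -> implicit with FREP
 *   - one multiply-add            -> the only useful FPU work
 */
double dot_baseline(const double *a, const double *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

With both extensions active, the loop body would ideally reduce to the multiply-add alone, which is what allows FPU utilization to approach 100%.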
Recently, we explored two new accelerator-based extensions for Snitch, both of which aim to boost performance and energy efficiency of integer-based workloads such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.
- Integrate the current partial Xpulpv2 implementation in the mainline Snitch version. This will require you to
- Adapt to the changes in the mainline Snitch codebase and parameterize the existing code
- Possibly switch to a standardized accelerator interface such as the X-interface
- Verify the functionality of your extensions.
- Implement parametric support for integer SSRs which
- Are shared between floating-point and integer datapaths when both are available
- Support configurable data widths (8, 16, 32, 64 bits).
- Implement additional instructions of interest, which could include
- A complete implementation of Xpulp or a closed subset of its partitions
- The proposed draft Bitmanip extension
- A simple integer hardware loop.
- Evaluate your extensions by
- Determining the performance impact on representative integer workloads
- Determining the area and timing impact in synthesis
- Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions.
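As an example of a representative integer workload, the following C sketch shows an 8-bit dot product with 32-bit accumulation, once in scalar form and once using a software model of a packed four-lane sum-of-dot-products operation of the kind found in packed-SIMD extensions. All names (`sdot4_i8`, `pack4_i8`, `dot_i8_scalar`, `dot_i8_packed`) are hypothetical and are not Xpulp intrinsics; the lane layout (element 0 in the least significant byte) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract signed 8-bit lane `lane` (0 = least significant byte). */
static int32_t lane_i8(uint32_t word, int lane) {
    int32_t v = (int32_t)((word >> (8 * lane)) & 0xFFu);
    return (v >= 128) ? v - 256 : v;
}

/* Software model of a packed operation (hypothetical, not a real
 * intrinsic): multiply four signed 8-bit lanes pairwise and add the
 * four products to a 32-bit accumulator. */
static int32_t sdot4_i8(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        acc += lane_i8(a, lane) * lane_i8(b, lane);
    }
    return acc;
}

/* Pack four int8 values into one 32-bit word, element 0 in the LSB. */
static uint32_t pack4_i8(const int8_t *p) {
    return (uint32_t)(uint8_t)p[0]
         | ((uint32_t)(uint8_t)p[1] << 8)
         | ((uint32_t)(uint8_t)p[2] << 16)
         | ((uint32_t)(uint8_t)p[3] << 24);
}

/* Scalar reference: one multiply-accumulate per element. */
int32_t dot_i8_scalar(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Packed variant: four elements per modeled operation.
 * Assumes n is a multiple of 4. */
int32_t dot_i8_packed(const int8_t *a, const int8_t *b, size_t n) {
    assert(n % 4 == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 4) {
        acc = sdot4_i8(pack4_i8(&a[i]), pack4_i8(&b[i]), acc);
    }
    return acc;
}
```

Both functions should return the same result for any input whose length is a multiple of four. In an evaluation, the scalar version can serve as a functional reference while the packed loop indicates where integer SSRs and a hardware loop could remove the remaining load and loop overhead.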
- 20% Literature / architecture review
- 40% RTL implementation
- 20% Bare-metal C programming
- 20% Evaluation
- Strong interest in computer architecture and memory systems
- Experience with digital design in SystemVerilog as taught in VLSI I
- Experience with ASIC implementation flow (synthesis) as taught in VLSI II
- SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent
- Preferred: Knowledge or prior experience with RISC-V or ISA extension design
- ISA extensions in the Snitch Processor for Signal Processing (M) (Previous Master thesis project)
- Snitch IPU accelerator in the MemPool many-core system (GitHub repository)
- RISC-V Bit Manipulation draft specification (GitHub repository)
- OpenHW Group CORE-V CV32E40P RISC-V IP (GitHub repository)