Universal Stream Semantic Registers for Snitch (1S)
- Type: Semester Thesis
- Professor: Prof. Dr. L. Benini
Processors often access data as *memory streams*, sequences of memory requests following predefined address patterns. Recent architectural extensions [1-2] propose handling such streams in hardware. This frees processors from explicitly computing addresses and issuing requests, increasing compute throughput. It also *decouples* data movement from execution, hiding architectural latencies and maximizing bandwidth utilization.
In our group, we developed Stream Semantic Registers (SSRs). These map memory streams directly to general-purpose registers in a RISC-V core, such that simply accessing a register loads or stores data. The stream's addresses are computed by an address generator, which is programmed with the stream's address pattern (loop bounds, strides, ...) beforehand.
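To make the role of the address generator concrete, the following is a minimal software model of an affine stream, written as a sketch rather than a description of the actual hardware. The `stream_cfg_t` struct, the `stream_addr` function, and the choice of absolute per-loop byte strides are assumptions made for illustration; they do not reflect the real SSR configuration registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STREAM_MAX_DIMS 4 /* assumed maximum loop nest depth for this sketch */

/* Hypothetical software model of an affine stream's address pattern. */
typedef struct {
    uintptr_t base;                     /* address of the first element */
    uint32_t  num_dims;                 /* number of nested loops in use */
    uint32_t  bounds[STREAM_MAX_DIMS];  /* iterations per loop, innermost first */
    int32_t   strides[STREAM_MAX_DIMS]; /* byte stride per loop, innermost first */
} stream_cfg_t;

/* Return the address of the n-th element emitted by the stream. */
uintptr_t stream_addr(const stream_cfg_t *cfg, uint64_t n) {
    assert(cfg->num_dims >= 1 && cfg->num_dims <= STREAM_MAX_DIMS);
    intptr_t offset = 0;
    for (uint32_t d = 0; d < cfg->num_dims; ++d) {
        assert(cfg->bounds[d] > 0);
        uint64_t idx = n % cfg->bounds[d]; /* position within loop d */
        n /= cfg->bounds[d];               /* carry into the next outer loop */
        offset += (intptr_t)idx * cfg->strides[d];
    }
    return cfg->base + (uintptr_t)offset;
}
```

With bounds `{4, 3}` and strides `{8, 64}`, this model walks four consecutive 64-bit words in each of three rows spaced 64 bytes apart. In hardware, the equivalent computation runs alongside the core, so the program only reads or writes the mapped register.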
SSRs are used in the Snitch cluster along with the floating-point repetition (FREP) hardware loop; this enables floating-point unit (FPU) utilization near 100% on regular problems. In this context, we recently extended SSRs to also handle indirect streams for sparse workloads, and are actively working on further extensions.
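As an illustration of the access pattern an indirect stream targets, consider a sparse-dense dot product in plain C. This is only a reference kernel under our own naming (`sparse_dense_dot` is hypothetical) and does not use any SSR or FREP interface: `vals[i]` and `idx[i]` follow a regular pattern, while `dense[idx[i]]` is a data-dependent gather, which is the kind of stream that indirection is meant to serve.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference kernel: sum over i of vals[i] * dense[idx[i]].
 * vals and idx hold the nonzeros of a sparse vector and their positions. */
double sparse_dense_dot(const double *vals, const uint32_t *idx,
                        size_t nnz, const double *dense) {
    double acc = 0.0;
    for (size_t i = 0; i < nnz; ++i) {
        /* dense[idx[i]] is the indirect access: its address depends on
         * the index value loaded from memory, not on i alone. */
        acc += vals[i] * dense[idx[i]];
    }
    return acc;
}
```

In a scalar core, each iteration spends instructions on loading the index, computing the address, and loading the operand; handling the gather as a stream would leave only the multiply-accumulate in the loop body.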
However, there is a fundamental limitation to SSRs as currently implemented in Snitch systems: they only support streaming double-precision (64-bit) floating-point data. Adding support for *integer* types and *different element sizes* (8, 16, 32, or 64 bits) would enable accelerating many more scenarios, such as graph processing.
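One question that variable element sizes raise is how a narrow element should appear in a 64-bit register. The sketch below models one possible answer in software: load 1, 2, 4, or 8 bytes and sign- or zero-extend the result. The function `load_elem` and its parameters are hypothetical, and the code assumes a little-endian host (as on RISC-V); the actual hardware behavior is part of what this project would specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Load one stream element of `size` bytes (1, 2, 4, or 8) from `addr`
 * and extend it to 64 bits. Assumes a little-endian host. */
int64_t load_elem(const void *addr, unsigned size, bool is_signed) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    uint64_t raw = 0;
    memcpy(&raw, addr, size); /* fills the low-order bytes; rest stay zero */
    if (is_signed && size < 8) {
        /* Sign-extend using the (x ^ m) - m identity, where m is the
         * sign bit of the narrow element. */
        uint64_t sign = (uint64_t)1 << (8 * size - 1);
        raw = (raw ^ sign) - sign;
    }
    return (int64_t)raw;
}
```

For example, the byte `0xFF` yields `-1` when read as a signed 8-bit element and `255` when read as unsigned.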
In this project, we want to:
- Extend SSRs to support variably-sized types for stream elements.
- Extend the work-in-progress Snitch *integer processing unit* (IPU) to support integer SSRs.
- Write simple programs (e.g. linear algebra, graph algorithm kernels) demonstrating the use of integer and variable-size streams, respectively.
- Evaluate the performance, area, energy, and timing impact of these extensions on the above applications.
The project can be simplified, adapted, or extended to suit your needs and wishes.
- 20% Architecture specification
- 40% RTL implementation
- 40% Verification and Evaluation
- Strong interest in computer architecture and/or memory systems
- Experience with HDLs (preferably SystemVerilog) as taught in VLSI I
- Knowledge of ASIC tool flow or parallel enrollment with VLSI II
- Basic knowledge of embedded/bare-metal programming in C