
Efficient Memory Stream Handling in RISC-V-based Systems (M/1-2S)


Revision as of 20:35, 4 October 2021 by Paulsc (talk | contribs)


Status: Reserved


Processors often access data as memory streams, sequences of memory requests following predefined address patterns. Recent architectural extensions [1-5] propose handling such streams in hardware. This frees processors from explicitly computing addresses and issuing requests, increasing compute throughput. It also decouples data movement from execution, hiding architectural latencies and maximizing bandwidth utilization.

Stream semantic registers (SSRs) [1] are one example of such an extension. They map memory streams directly to general-purpose registers in a core, so that simply reading or writing a register loads or stores data. The stream's addresses are computed by an address generator, which is programmed beforehand with the stream's semantics (address pattern information such as loop bounds and strides).
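As a rough illustration of what such an address generator computes, the following Python sketch enumerates the addresses of an affine stream from loop bounds and strides. It is a software model under assumed conventions, not the SSR hardware or its programming interface: the function name `affine_stream`, its parameters, and the choice of index 0 as the innermost loop are all hypothetical.

```python
from itertools import product


def affine_stream(base, bounds, strides):
    """Yield the byte addresses of an affine stream (illustrative model only).

    bounds[i] is the trip count of loop i and strides[i] its byte stride.
    Index 0 is assumed to be the innermost loop.
    """
    if len(bounds) != len(strides):
        raise ValueError("bounds and strides must have equal length")
    # product() varies its last argument fastest, so pass loops outermost-first.
    for idx in product(*(range(b) for b in reversed(bounds))):
        offset = sum(i * s for i, s in zip(idx, reversed(strides)))
        yield base + offset


# Example: a 2x3 tile of 8-byte words from a row-major matrix with 64-byte rows.
addrs = list(affine_stream(base=0x1000, bounds=[3, 2], strides=[8, 64]))
```

Once the bounds and strides are set, the core no longer spends instructions on address arithmetic; in this model, that corresponds to simply iterating over `addrs`.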

Statistical analysis shows that most streams have simple semantics [2]. This means that their definition could be propagated throughout the memory system to further hide latency, minimize traffic, and improve memory management (prefetching, coalescing, eviction strategies, etc.).

In our group, we extensively use AXI4 [6] as an on-chip memory protocol. AXI4 supports incremental bursts, which transfer multiple words of contiguous data with a single request; strictly contiguous streams could therefore already be communicated efficiently to the memory system. However,

  • None of our cores are able to map memory streams directly to AXI4 bursts.
  • AXI4 incremental bursts cannot represent slightly more complex but widespread stream semantics such as strided accesses, nested loops, or indirection.
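The second limitation can be made concrete with a small Python model that greedily packs a stream's addresses into incremental bursts. It relies on two AXI4 rules: an incremental burst carries at most 256 beats and must not cross a 4 KiB address boundary. Everything else is an assumption for illustration: the helper name `incr_bursts`, the `(start_addr, beats)` tuple format, and the simplification that every address is one full, aligned beat.

```python
AXI4_MAX_BEATS = 256   # AXI4 INCR bursts carry at most 256 beats
AXI4_BOUNDARY = 4096   # a burst must not cross a 4 KiB address boundary


def incr_bursts(addrs, beat_bytes):
    """Greedily group addresses into (start_addr, beats) INCR bursts.

    Simplified model: each address is assumed to be one full, aligned beat.
    """
    bursts = []
    for addr in addrs:
        if bursts:
            start, beats = bursts[-1]
            contiguous = addr == start + beats * beat_bytes
            same_page = addr // AXI4_BOUNDARY == start // AXI4_BOUNDARY
            if contiguous and same_page and beats < AXI4_MAX_BEATS:
                bursts[-1] = (start, beats + 1)  # extend the current burst
                continue
        bursts.append((addr, 1))  # start a new burst
    return bursts


contiguous = [0x1000 + 8 * i for i in range(64)]   # unit-stride 8-byte words
strided = [0x1000 + 16 * i for i in range(64)]     # every second word

contiguous_bursts = incr_bursts(contiguous, beat_bytes=8)
strided_bursts = incr_bursts(strided, beat_bytes=8)
```

In this model the contiguous stream fits into a single 64-beat request, while the strided stream with the same number of elements degenerates into 64 single-beat requests.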


In this project, we want to:

  • Extend the Load-Store Unit (LSU) of the RISC-V Snitch core with stream semantics.
  • Efficiently map the LSU's stream requests to AXI4 bursts.
  • Extend our AXI4 implementation [7] through its AxUSER signals to support affine and indirect streams in bursts.
  • Evaluate the performance, area, energy, and timing impact of these extensions on a minimal end-to-end memory system (extended core, memory, and any adapters needed) running simple applications with the corresponding burst patterns.
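For readers unfamiliar with indirect streams, the following Python sketch models one as a gather: a contiguous index stream is read first, and each index then determines the address of a data element. This is only a behavioral illustration under assumed conventions; the function `indirect_stream`, its parameters, the element sizes, and the dictionary-based memory are hypothetical and say nothing about how the AxUSER extension would encode such a stream.

```python
def indirect_stream(memory, data_base, index_base, count,
                    elem_bytes=8, index_bytes=4):
    """Yield (address, value) pairs for data[index[i]], i in range(count).

    `memory` is a dict mapping byte addresses to stored values
    (a toy model; sizes and layout are assumptions for illustration).
    """
    for i in range(count):
        index = memory[index_base + i * index_bytes]  # contiguous index stream
        addr = data_base + index * elem_bytes         # data-dependent address
        yield addr, memory[addr]


# Toy memory: index array [2, 0, 3] at 0x100, four data words at 0x200.
memory = {0x100: 2, 0x104: 0, 0x108: 3,
          0x200: 10, 0x208: 11, 0x210: 12, 0x218: 13}
gathered = list(indirect_stream(memory, data_base=0x200,
                                index_base=0x100, count=3))
```

The data addresses are unknown until the indices have been read, which is why such a stream cannot be described by a start address and a length alone.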

How the streams are consumed inside the core (push/pop instructions or SSRs) as well as how the LSU streams are programmed is left

The project can be simplified, adapted, or extended to suit your needs and wishes. This can involve targeting a DMA engine as an AXI master instead of a core, targeting more complex memory systems, or extending existing AXI IP blocks to support the new burst types.


  • 20% Literature Research
  • 20% Architecture and protocol specification
  • 30% RTL implementation
  • 30% Verification and Evaluation


  • Strong interest in computer architecture and/or memory systems
  • Experience with HDLs (preferably SystemVerilog) as taught in VLSI I
  • Knowledge of ASIC tool flow or parallel enrollment with VLSI II
  • Basic knowledge of embedded / bare-metal programming in C