Revision as of 11:27, 10 August 2021

Overview

Status: Available

Type: Semester Thesis
Professor: Prof. Dr. L. Benini
Supervisors:
- Paul Scheffler: paulsc@iis.ee.ethz.ch
- Thomas Benz: tbenz@iis.ee.ethz.ch

Introduction

Processors often access data as memory streams, sequences of memory requests following predefined address patterns. Recent architectural extensions [1-2] propose handling such streams in hardware. This frees processors from explicitly computing addresses and issuing requests, increasing compute throughput. It also decouples data movement from execution, hiding architectural latencies and maximizing bandwidth utilization.

In our group, we developed Stream Semantic Registers (SSRs) [1]. These map memory streams directly to general-purpose registers in a RISC-V core, such that simply accessing a register loads or stores data. The stream's addresses are computed by an address generator, which is programmed with the stream's address pattern (loop bounds, strides, ...) beforehand.
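
As a rough software illustration of what such an address generator computes, a nested-loop pattern with per-dimension bounds and strides can be modeled as below. This is a behavioral sketch only, not the Snitch RTL or its configuration interface; the function name `affine_stream_addresses` and its parameters are hypothetical.

```python
from itertools import product


def affine_stream_addresses(base, bounds, strides):
    """Yield the byte addresses of a nested-loop (affine) stream.

    bounds[i] is the iteration count of loop i and strides[i] its byte
    stride; index 0 is the innermost loop. All names are illustrative
    assumptions, not taken from the SSR hardware.
    """
    if len(bounds) != len(strides):
        raise ValueError("bounds and strides must have the same length")
    # product() varies its last argument fastest, so list loops outermost-first.
    ranges = [range(b) for b in reversed(bounds)]
    outer_first_strides = list(reversed(strides))
    for idx in product(*ranges):
        offset = sum(i * s for i, s in zip(idx, outer_first_strides))
        yield base + offset
```

For example, `affine_stream_addresses(0x1000, [3, 2], [8, 32])` walks three consecutive 8-byte elements in each of two rows that lie 32 bytes apart.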

SSRs are used in the Snitch cluster [3] along with the floating-point repetition (FREP) hardware loop; this enables floating-point unit (FPU) utilization near 100% on regular problems. In this context, we recently extended SSRs to also handle indirect streams [4] for sparse workloads, and are actively working on further extensions.
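
Conceptually, an indirect stream derives each data address from an index array rather than from a fixed stride. The sketch below models only that idea; the function name and the default 8-byte element size are assumptions for illustration, and the actual hardware mechanism is described in [4].

```python
def indirect_stream_addresses(base, indices, elem_bytes=8):
    """Yield byte addresses base + index * elem_bytes for each index.

    Hypothetical model of index-driven addressing, as used for example
    to gather the dense operands matching a sparse vector's nonzeros.
    """
    for i in indices:
        yield base + i * elem_bytes
```

For instance, the indices `[0, 3, 7]` with base `0x2000` select the elements at `0x2000`, `0x2018`, and `0x2038`.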

However, there is a fundamental limitation to SSRs as currently implemented in Snitch systems: they only support streaming double-precision (64-bit) floating-point data. Adding support for integer types and different element sizes (8, 16, 32, or 64 bits) would enable accelerating many more scenarios, such as graph processing.
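
To make the effect of the element size concrete, the following sketch reads a contiguous stream of integers of a given width from a little-endian byte memory. It is a software model under stated assumptions: the name `read_stream` is hypothetical, and whether narrow elements are sign- or zero-extended is a design choice the project would have to make, exposed here as a parameter.

```python
def read_stream(memory, base, count, elem_bits, signed=True):
    """Read `count` contiguous integers of `elem_bits` bits from `memory`.

    `memory` is a bytes-like object modeling little-endian memory
    (RISC-V is little-endian). The `signed` flag is an assumption
    standing in for a sign- versus zero-extension policy.
    """
    if elem_bits not in (8, 16, 32, 64):
        raise ValueError("element size must be 8, 16, 32, or 64 bits")
    elem_bytes = elem_bits // 8
    values = []
    for i in range(count):
        addr = base + i * elem_bytes
        chunk = memory[addr:addr + elem_bytes]
        if len(chunk) != elem_bytes:
            raise IndexError("stream reads past the end of memory")
        values.append(int.from_bytes(chunk, "little", signed=signed))
    return values
```

The same eight bytes thus yield eight, four, two, or one stream elements depending on the configured width, which is why the address stride and the data path both depend on the element size.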

Project

In this project, we want to:

- Extend SSRs to support variably-sized types for stream elements.
- Extend the work-in-progress Snitch integer processing unit (IPU) to support integer SSRs.
- Write simple programs (e.g. linear algebra, graph algorithm kernels) demonstrating the use of integer and variable-size streams, respectively.
- Evaluate the performance, area, energy, and timing impact of these extensions on the above applications.

The project can be simplified, adapted, or extended to suit your needs and wishes.

Character

- 20% Architecture specification
- 40% RTL implementation
- 40% Verification and Evaluation

Prerequisites

- Strong interest in computer architecture and/or memory systems
- Experience with HDLs (preferably SystemVerilog) as taught in VLSI I
- Knowledge of ASIC tool flow or parallel enrollment in VLSI II
- Basic knowledge of embedded / bare-metal programming in C

References

[1] https://ieeexplore.ieee.org/document/9068465

[2] https://ieeexplore.ieee.org/document/8980305

[3] https://ieeexplore.ieee.org/document/9216552

[4] https://arxiv.org/abs/2011.08070

Difference between revisions of "Universal Stream Semantic Registers for Snitch (1S)" - iis-projects
