
Augmenting Our IPs with AXI Stream Extensions (M/1-2S)

From iis-projects


Revision as of 10:50, 15 February 2022


Overview

Status: Reserved

Introduction

General-purpose processors often access data as memory streams, that is, sequences of memory requests following a predefined address pattern. Recent architectural extensions [1,2] propose handling such streams in hardware, which brings two main benefits. First, it frees processors from explicitly computing addresses and issuing requests, increasing compute throughput. Second, it decouples data movement from execution, hiding architectural latencies and maximizing bandwidth utilization.

We recently began exploring how to leverage the benefits of memory streams in large Systems on Chip (SoCs) by propagating their semantics (address pattern information such as loop bounds and strides) throughout the memory system. To this end, we are currently extending the AXI4 [3] memory protocol, used in many of our IPs, to support affine (strided) and indirect streams, the most common stream types in real-world applications [2].
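To make the two stream types concrete, the following Python sketch models the address sequences they describe. It is only an illustration under our own assumptions: the function names, the argument layout (innermost loop first), and the byte-addressed element size are hypothetical and do not reflect how the AXI4 stream extensions actually encode this information.

```python
from typing import Iterable, Iterator, Sequence


def affine_stream(base: int, bounds: Sequence[int],
                  strides: Sequence[int]) -> Iterator[int]:
    """Yield the byte addresses of an affine (strided) stream.

    Hypothetical model: bounds[i] is the trip count of loop level i and
    strides[i] its byte stride, with level 0 being the innermost loop.
    """
    if len(bounds) != len(strides):
        raise ValueError("bounds and strides must have the same length")

    # Total number of requests is the product of all loop bounds.
    total = 1
    for bound in bounds:
        total *= bound

    idx = [0] * len(bounds)  # current loop indices, innermost first
    for _ in range(total):
        yield base + sum(i * s for i, s in zip(idx, strides))
        # Advance the loop nest like an odometer, innermost level first.
        for level in range(len(bounds)):
            idx[level] += 1
            if idx[level] < bounds[level]:
                break
            idx[level] = 0


def indirect_stream(base: int, indices: Iterable[int],
                    elem_size: int) -> Iterator[int]:
    """Yield the byte addresses of an indirect stream.

    Hypothetical model: each index selects one element of elem_size bytes
    relative to base, as in a gather over an index array.
    """
    for index in indices:
        yield base + index * elem_size


if __name__ == "__main__":
    # Two rows of four 8-byte elements, rows 64 bytes apart.
    print([hex(a) for a in affine_stream(0x1000, [4, 2], [8, 64])])
    # Gather of 4-byte elements at indices 3, 0, and 7.
    print([hex(a) for a in indirect_stream(0x2000, [3, 0, 7], 4)])
```

In this model, an affine stream is fully described by a base address plus one (bound, stride) pair per loop level, whereas an indirect stream additionally depends on the contents of an index array, which itself has to be read from memory.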

Now, we would like to make use of this extended protocol in our existing AXI4 IPs to improve their performance, as well as fully quantify its benefits. We would also like to demonstrate the extensions in a full demonstrator SoC.

Project

In this project, you will extend some of our existing core, interconnect, and memory IPs to properly handle our AXI4 stream extensions. We will first focus on:

  • Our existing AXI4 interconnect IP suite [4] (crossbars, buffers, converters, serializers, ...)
  • Our universal Direct Memory Access (DMA) engine
  • Our banked AXI4 on-chip memories

Depending on the remaining time and your personal interests, further IPs can be extended and investigated, for example:

  • Our vector processor Ara [5]
  • Our last-level and read-only caches
  • Our AXI4 off-chip serial link

A simple demonstrator system building on your extended IPs could also be built.

Character

  • 20% Architecture & spec review
  • 40% RTL implementation
  • 20% Verification
  • 20% Evaluation

Prerequisites

  • Strong interest in computer architecture and memory systems
  • Experience with digital design in SystemVerilog as taught in VLSI I
  • Experience with the ASIC implementation flow (synthesis) or parallel enrolment in VLSI II
  • Preferred: SoCs for Data Analytics and ML and/or Computer Architecture lectures
  • Preferred: Knowledge of or experience with AXI and RISC-V

References

[1] https://ieeexplore.ieee.org/document/9068465

[2] https://ieeexplore.ieee.org/document/8980305

[3] https://developer.arm.com/documentation/ihi0022/hc

[4] https://github.com/pulp-platform/axi

[5] https://ieeexplore.ieee.org/document/8918510