Enabling Efficient Systolic Execution on MemPool (M)

Overview

Status: Completed

Character

  • 15% Literature/architecture review
  • 35% RTL implementation
  • 30% Evaluation
  • 20% Bare-metal C programming

Prerequisites

  • Experience with RTL design and evaluation
  • Experience with C

Introduction

Striving for high image quality, even on mobile devices, has led to an explosion in the pixel count of smartphone cameras over the last decade. These image sensors, boasting tens of millions of pixels, create a massive amount of data that must be processed as quickly as possible within a tight power envelope. Computational photography, computer vision, augmented reality, and machine learning are only a few of the possible applications.

At ETH, we are developing our own ISP called MemPool. It boasts 256 area-optimized 32-bit Snitch cores. Snitch, also developed at ETH, implements the RISC-V instruction set architecture (ISA), an open ISA targeting modularity and scalability. Despite its size, MemPool gives all 256 cores low-latency access to the shared L1 memory, with a maximum latency of only five cycles when no contention occurs. This enables efficient communication among all cores, making MemPool suitable for various workload domains and easy to program.

In its latest developments, MemPool supports a systolic mode, where communication among the processing elements, i.e., the Snitch cores, is handled via systolic queues. A core can allocate a systolic queue data structure in the shared memory and issue push and pop operations to move data across different queues. This communication mechanism is made efficient by MemPool's shared L1 memory and by specialized hardware extensions that accelerate push and pop operations on the memory-mapped queues.

Project Description

The goal of this thesis is to further investigate MemPool's systolic configuration and identify sources of energy and performance inefficiency in systolic execution. Approaches to fix these inefficiencies will then be investigated and implemented.

References

[1] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, “MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect,” in 2021 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2021, pp. 701–706.