Enabling Efficient Systolic Execution on MemPool (M)

From iis-projects

  • Student: Vaibhav Krishna
  • Semester: Fall Semester 2022
  • Type: Master...

Latest revision as of 10:39, 2 November 2023


Overview

Status: Completed

Character

  • 15% Literature/architecture review
  • 35% RTL implementation
  • 30% Evaluation
  • 20% Bare-metal C programming

Prerequisites

  • Experience with RTL design and evaluation
  • Experience with C

Introduction

The pursuit of high image quality, even on mobile devices, has led to an explosion in the pixel count of smartphone cameras over the last decade. These image sensors, boasting tens of millions of pixels, generate massive amounts of data that must be processed as quickly as possible within a tight power envelope. Computational photography, computer vision, augmented reality, and machine learning are only a few of the possible applications.

At ETH, we are developing our own image signal processor (ISP) called MemPool [1]. It boasts 256 area-optimized 32-bit Snitch cores. Snitch, also developed at ETH, implements the RISC-V instruction set architecture (ISA), an open ISA designed for modularity and scalability. Despite its size, MemPool gives all 256 cores low-latency access to the shared L1 memory, with a maximum latency of only five cycles in the absence of contention. This enables efficient communication among all cores, making MemPool suitable for a wide range of workload domains and easy to program.

In its latest developments, MemPool supports a systolic mode, in which the Snitch cores communicate via systolic queues. A core can allocate a systolic queue data structure in the shared memory and issue push and pop operations to move data across different queues. This communication mechanism is made efficient by MemPool's shared L1 memory and by specialized hardware extensions that accelerate push and pop operations on the memory-mapped queues.
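To make the queue abstraction more concrete, the following is a minimal, software-only sketch in C of a queue living in shared memory, with one core pushing and another popping. It assumes a single producer and a single consumer per queue and a power-of-two capacity. All names (`queue_t`, `queue_push`, `queue_pop`, `pe_step`, `systolic_demo`, `QUEUE_CAPACITY`) are hypothetical and are not MemPool's actual runtime API; the sketch also does not model the hardware extensions for push and pop mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two so indices can wrap with a mask. */
#define QUEUE_CAPACITY 8u

/* Hypothetical single-producer/single-consumer queue placed in shared memory.
 * Only the producer writes `tail`; only the consumer writes `head`. Both
 * counters grow monotonically and are masked when indexing `data`. */
typedef struct {
    int32_t data[QUEUE_CAPACITY];
    _Atomic uint32_t head; /* index of the next element to pop */
    _Atomic uint32_t tail; /* index of the next free slot to push into */
} queue_t;

static void queue_init(queue_t *q) {
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* Non-blocking push: returns false if the queue is full. */
static bool queue_try_push(queue_t *q, int32_t value) {
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_CAPACITY) {
        return false; /* full */
    }
    q->data[tail & (QUEUE_CAPACITY - 1u)] = value;
    /* Release: the element is written before the new tail becomes visible. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}

/* Non-blocking pop: returns false if the queue is empty. */
static bool queue_try_pop(queue_t *q, int32_t *value) {
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) {
        return false; /* empty */
    }
    *value = q->data[head & (QUEUE_CAPACITY - 1u)];
    /* Release: the slot is read before the producer may reuse it. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}

/* Blocking push/pop: the calling core busy-waits until the queue has room
 * (push) or data (pop). */
static void queue_push(queue_t *q, int32_t value) {
    while (!queue_try_push(q, value)) {
        /* spin */
    }
}

static int32_t queue_pop(queue_t *q) {
    int32_t value = 0;
    while (!queue_try_pop(q, &value)) {
        /* spin */
    }
    return value;
}

/* Hypothetical processing element: pop one input, scale it by a local
 * weight, and push the result to the next queue in the chain. */
static void pe_step(queue_t *in, queue_t *out, int32_t weight) {
    queue_push(out, queue_pop(in) * weight);
}

/* Two-stage chain (x -> 2x -> 6x) executed sequentially on one thread. On a
 * many-core system each stage would instead loop on its own core. */
static void systolic_demo(const int32_t *in, int32_t *out, uint32_t n) {
    queue_t q_in, q_mid, q_out;
    queue_init(&q_in);
    queue_init(&q_mid);
    queue_init(&q_out);
    assert(n <= QUEUE_CAPACITY); /* otherwise the sequential fill would block */
    for (uint32_t i = 0; i < n; i++) {
        queue_push(&q_in, in[i]);
    }
    for (uint32_t i = 0; i < n; i++) {
        pe_step(&q_in, &q_mid, 2);
        pe_step(&q_mid, &q_out, 3);
    }
    for (uint32_t i = 0; i < n; i++) {
        out[i] = queue_pop(&q_out);
    }
}
```

With inputs {1, 2, 3}, `systolic_demo` yields {6, 12, 18}. On a system like MemPool, each `pe_step` loop would presumably run on a different Snitch core with the queues allocated in the shared L1 memory; the demo runs both stages on one thread only to show the data flow. The index loads, bounds checks, and spin loops in `queue_push` and `queue_pop` illustrate the per-operation software overhead that dedicated push/pop hardware support is meant to accelerate.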

Project Description

The goal of this thesis is to further investigate MemPool's systolic configuration and to identify sources of energy and performance inefficiency in systolic execution. Approaches to address these inefficiencies will then be investigated and implemented.

References

[1] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, “MemPool: A shared-L1 memory many-core cluster with a low-latency interconnect,” in 2021 Design, Automation, and Test in Europe Conference and Exhibition (DATE), 2021, pp. 701–706.