Accelerating Stencil Workloads on Snitch using ISSRs (1-2S/B)

Latest revision as of 13:59, 16 August 2022

Some examples of (regular) stencil regions

Overview

Status: Available

Supervisors:

  • Paul Scheffler: paulsc@iis.ee.ethz.ch
  • Luca Colagrande: colluca@iis.ee.ethz.ch

Introduction

Stencil codes are algorithms that iteratively process data on n-dimensional grids by accessing arrays in fixed, possibly irregular patterns relative to each grid point [1]. They are widespread in high-performance computing (HPC) and underlie various problems in physical simulation, economics, and image processing, among other domains.
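As an illustration of the access pattern described above, the following is a minimal sketch of one sweep of a regular 5-point averaging stencil on a 2D grid in plain C. The function name `stencil5_sweep`, the row-major layout, and the uniform weight of 0.2 are assumptions chosen for this example; they are not taken from the kernels evaluated in the project.

```c
#include <stddef.h>

/* One sweep of a 5-point averaging stencil on an ny-by-nx row-major grid.
 * Hypothetical example kernel: only interior points are updated, and the
 * boundary points of `out` are left untouched. */
void stencil5_sweep(const double *in, double *out, size_t ny, size_t nx)
{
    for (size_t y = 1; y + 1 < ny; ++y) {
        for (size_t x = 1; x + 1 < nx; ++x) {
            size_t c = y * nx + x; /* flat index of the center point */
            /* Fixed access pattern relative to the center:
             * left, right, up, and down neighbors plus the center itself. */
            out[c] = 0.2 * (in[c] + in[c - 1] + in[c + 1]
                            + in[c - nx] + in[c + nx]);
        }
    }
}
```

An iterative solver would call such a sweep repeatedly, swapping the `in` and `out` buffers between iterations.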

We recently evaluated the performance of several stencil kernels on our Snitch cluster [2], which is designed for energy-efficient HPC. To this end, Snitch includes instruction set extensions [3, 4] that enable floating-point unit (FPU) utilizations approaching 100%.

We found that a recent extension to Snitch, indirection stream semantic registers (ISSRs) [4], is highly effective at accelerating stencil codes. ISSRs load a predefined sequence of elements from a high-bandwidth scratchpad memory directly into a processor register as an instruction uses it, enabling very high FPU utilizations for arbitrary stencil shapes.
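To make the idea of a predefined element sequence concrete, the sketch below emulates it in plain C: an index array lists the offset of each stencil tap relative to the center, so the same loop handles any stencil shape. This is a software illustration only and does not use the actual ISSR programming interface; the function `stencil_point` and its parameters are hypothetical. Here the index lookup and the load are explicit instructions, whereas the text above describes ISSRs delivering the elements directly into a register.

```c
#include <stddef.h>

/* Evaluate an arbitrary-shape stencil at one grid point (software emulation).
 * `offs[k]` is the signed flat-array offset of the k-th tap relative to the
 * center index `c`, and `coef[k]` is the weight of that tap. All names are
 * hypothetical and chosen for this example. */
double stencil_point(const double *in, size_t c,
                     const ptrdiff_t *offs, const double *coef, size_t taps)
{
    double acc = 0.0;
    for (size_t k = 0; k < taps; ++k) {
        /* Indirect access: the index array decides which element is read. */
        acc += coef[k] * in[(ptrdiff_t)c + offs[k]];
    }
    return acc;
}
```

For a 3-by-3 row-major grid, the offsets `{0, -1, 1, -3, 3}` describe the regular 5-point shape, while `{-4, 4}` would select only the two diagonal corners; the loop body is identical in both cases.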

Project

In this project, you will extend our evaluation of ISSR-accelerated stencils to a larger number of stencils from various sources. These may include:

  • Benchmark suites like PolyBench [5] or Rodinia [6].
  • Stencil benchmark collections like that of MeteoSwiss [7].
  • Example stencils for stencil code generators like StencilFlow [8] or AN5D [9].

The goal is to port a representative subset of stencil kernels to Snitch, accelerate them using ISSRs, and evaluate the performance benefits. Motivated students may also work towards a stencil code generator that produces Snitch code from high-level stencil descriptions, similar to the generators mentioned above.

Character

  • 25% Literature / Architecture review
  • 50% Bare-metal C and assembly programming
  • 25% Performance evaluation

Prerequisites

  • Knowledge of bare-metal C and assembly programming
  • Strong interest in computer architecture
  • Preferred: Knowledge of or prior experience with RISC-V assembly and programming with ISA extensions
  • Preferred: Prior experience with high-performance and/or numerical computing

References

[1] https://en.wikipedia.org/wiki/Iterative_Stencil_Loops

[2] https://ieeexplore.ieee.org/document/9216552, https://github.com/pulp-platform/snitch

[3] https://ieeexplore.ieee.org/document/9068465

[4] https://ieeexplore.ieee.org/document/9474230

[5] https://github.com/MatthiasJReisinger/PolyBenchC-4.2.1

[6] https://www.cs.virginia.edu/rodinia/doku.php

[7] https://github.com/MeteoSwiss-APN/stencil_benchmarks

[8] https://www.computer.org/csdl/proceedings-article/cgo/2021/09370315/1rSR4s1zlUA

[9] https://dl.acm.org/doi/abs/10.1145/3368826.3377904