Accelerating Stencil Workloads on Snitch using ISSRs (1-2S/B)

From iis-projects

[[Category:High Performance SoCs]]
[[Category:Computer Architecture]]
[[Category:2023]]
[[Category:Semester Thesis]]
[[Category:Master Thesis]]

Latest revision as of 21:29, 19 January 2023

Some examples of (regular) stencil regions

Overview

Status: Available

Introduction

Stencil codes are algorithms that iteratively process data on n-dimensional grids by accessing arrays in fixed, possibly irregular patterns relative to each grid point [1]. They are widespread in high-performance computing (HPC) and underlie various problems in physical simulation, economics, and image processing, among other domains.
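To make the definition concrete, the following is a minimal sketch of one iteration of a 5-point stencil in plain C. The function name `jacobi5_step`, the uniform weight of 0.2, and the choice to copy boundary points unchanged are illustrative assumptions and are not taken from the kernels evaluated in this project.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: one Jacobi sweep of a 5-point stencil over an
 * ny-by-nx row-major grid. Each interior point becomes the average of
 * itself and its four direct neighbors; boundary points are copied. */
void jacobi5_step(const double *in, double *out, size_t ny, size_t nx) {
    /* A Jacobi sweep reads only old values, so the buffers must differ. */
    assert(in != NULL && out != NULL && in != out);
    for (size_t y = 0; y < ny; ++y) {
        for (size_t x = 0; x < nx; ++x) {
            size_t i = y * nx + x;
            if (y == 0 || y == ny - 1 || x == 0 || x == nx - 1) {
                out[i] = in[i];              /* boundary: unchanged */
            } else {
                /* Fixed access pattern relative to grid point i. */
                out[i] = 0.2 * (in[i] + in[i - 1] + in[i + 1]
                                + in[i - nx] + in[i + nx]);
            }
        }
    }
}
```

An iterative solver would call this function repeatedly, swapping the `in` and `out` buffers after each sweep.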

We recently evaluated the performance of a few stencil kernels on our Snitch cluster [2], which is designed for energy-efficient HPC. For this purpose, it includes a few instruction set extensions [3, 4] which enable floating-point unit (FPU) utilization approaching 100%.

We found that indirection stream semantic registers (ISSRs) [4], a recent extension to Snitch, are highly effective in accelerating stencil codes. ISSRs load a predefined sequence of elements from a high-bandwidth scratchpad memory directly into a processor register as an instruction uses it, enabling very high FPU utilization for arbitrary stencil shapes.
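The idea of streaming a predefined element sequence into a register can be modeled in software. The sketch below is a plain-C model only: it does not use the actual Snitch ISSR programming interface, and the names `indir_stream_t`, `stream_pop`, and `stencil_point` are hypothetical. It assumes the sequence is described by an array of signed offsets relative to the current grid point; on the hardware, the read that `stream_pop` performs explicitly would instead happen implicitly when the instruction reads the stream register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software model of an indirect stream: successive reads
 * yield center[off[0]], center[off[1]], ..., center[off[n-1]]. */
typedef struct {
    const double *center;   /* pointer to the current grid point */
    const ptrdiff_t *off;   /* predefined offsets relative to center */
    size_t n;               /* number of elements in the sequence */
    size_t pos;             /* next element to deliver */
} indir_stream_t;

/* Models one read of the stream register. */
static double stream_pop(indir_stream_t *s) {
    assert(s->pos < s->n);
    return s->center[s->off[s->pos++]];
}

/* Weighted sum over an arbitrary stencil shape: the loop body contains
 * only the multiply-accumulate; operands come from the stream. */
double stencil_point(const double *center, const ptrdiff_t *off,
                     const double *coeff, size_t n) {
    indir_stream_t s = { center, off, n, 0 };
    double acc = 0.0;
    for (size_t k = 0; k < n; ++k) {
        acc += coeff[k] * stream_pop(&s);
    }
    return acc;
}
```

Because the stencil shape lives entirely in the offset array, the same loop handles regular and irregular shapes without changes to the compute code.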

Project

In this project, you will extend our evaluation on accelerating stencils with ISSRs to a larger number of stencils from various sources. These may include:

  • Benchmark suites like PolyBench [5] or Rodinia [6].
  • Stencil benchmark collections like that of MeteoSwiss [7].
  • Example stencils for stencil code generators like StencilFlow [8] or AN5D [9].

The goal is to port a representative subset of stencil kernels to Snitch, accelerate them using ISSRs, and evaluate the performance benefits. Motivated students may also work towards a stencil code generator that emits Snitch code from high-level stencil descriptions, similar to the generators mentioned above.
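As a rough illustration of one step such a generator might perform, the sketch below lowers a high-level 2D stencil description (a list of taps given as row and column displacements with coefficients) into flat element offsets for a row-major grid. The type `stencil_tap_t` and the function `flatten_taps` are hypothetical names; how the resulting offsets would be passed to the ISSR hardware is not covered here and would depend on the Snitch runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical high-level description of one stencil tap. */
typedef struct {
    int dy;        /* row displacement relative to the grid point */
    int dx;        /* column displacement relative to the grid point */
    double coeff;  /* weight applied to the element at (dy, dx) */
} stencil_tap_t;

/* Lowers n taps to flat offsets for a row-major grid whose rows are
 * row_stride elements apart, and copies the coefficients alongside. */
void flatten_taps(const stencil_tap_t *taps, size_t n, size_t row_stride,
                  ptrdiff_t *offsets, double *coeffs) {
    assert(taps != NULL && offsets != NULL && coeffs != NULL);
    for (size_t k = 0; k < n; ++k) {
        offsets[k] = (ptrdiff_t)taps[k].dy * (ptrdiff_t)row_stride
                     + (ptrdiff_t)taps[k].dx;
        coeffs[k] = taps[k].coeff;
    }
}
```

For a 5-point stencil on a grid with 10 columns, the taps (-1,0), (0,-1), (0,0), (0,1), (1,0) lower to the offsets -10, -1, 0, 1, 10.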

Character

  • 25% Literature / Architecture review
  • 50% Bare-metal C and Assembly programming
  • 25% Performance evaluation

Prerequisites

  • Knowledge of bare-metal C and assembly programming
  • Strong interest in computer architecture
  • Preferred: Knowledge of or prior experience with RISC-V Assembly and programming ISA extensions
  • Preferred: Prior experience with high-performance and/or numerical computing

References

[1] https://en.wikipedia.org/wiki/Iterative_Stencil_Loops

[2] https://ieeexplore.ieee.org/document/9216552, https://github.com/pulp-platform/snitch

[3] https://ieeexplore.ieee.org/document/9068465

[4] https://ieeexplore.ieee.org/document/9474230

[5] https://github.com/MatthiasJReisinger/PolyBenchC-4.2.1

[6] https://www.cs.virginia.edu/rodinia/doku.php

[7] https://github.com/MeteoSwiss-APN/stencil_benchmarks

[8] https://www.computer.org/csdl/proceedings-article/cgo/2021/09370315/1rSR4s1zlUA

[9] https://dl.acm.org/doi/abs/10.1145/3368826.3377904