Latest revision as of 09:49, 17 August 2022

Overview

Status: Completed

Type: Semester Thesis
Professor: Prof. Dr. L. Benini
Supervisors:

Introduction

The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.

Currently, Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together enable almost continuous FPU utilization in many data-oblivious problems.

Over time, we have written many simple demonstrator programs for Snitch systems to measure their performance. Most of these involved computational kernels: small, self-contained functions such as linear algebra operations, convolutions, or FFTs, which are frequently called in larger compute-intensive applications like machine learning layers or mathematical problem solvers.

Since a lot of compute time is spent in these kernels, optimizing them for the target hardware is a highly effective way to accelerate computation. Thus, most existing Snitch kernels are hand-tuned, partially or completely written in assembly, and use Snitch's extensions for maximum performance and efficiency.
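To make the notion of a kernel concrete, the following is a minimal portable C sketch of a dot product. It is not taken from any existing Snitch code, and the name `dot_f64` is hypothetical. It only serves as a plain baseline: a hand-tuned Snitch version would aim to let SSRs supply the two operand streams and FREP repeat the multiply-accumulate, which is not shown here because it depends on the extension versions in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline kernel: returns the sum of x[i] * y[i] for i < n.
 * Plain C only; no Snitch extensions are used. */
double dot_f64(const double *x, const double *y, size_t n) {
    /* Empty inputs are allowed; otherwise both pointers must be valid. */
    assert(n == 0 || (x != NULL && y != NULL));

    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Two loads and one multiply-accumulate per iteration. In an
         * optimized version, the loads and the loop bookkeeping are the
         * overhead one would try to move out of the instruction stream. */
        acc += x[i] * y[i];
    }
    return acc;
}
```

A portable baseline like this can also double as a correctness reference for the optimized variant of the same kernel.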

Unfortunately, we do not have many compute kernels for Snitch yet, and much of the existing code was written for old versions of Snitch and is no longer maintained; it uses various code conventions, targets outdated versions of our extensions, and/or no longer performs optimally on our hardware. It is also scattered across different projects and repositories.

Project

In this project, you will create a unified library of high-performance computational kernels tailored to Snitch and its extensions for use in compute-intensive applications. To this end, you will:

Review and familiarize yourself with existing efforts on:
- Snitch compute kernels and runtime
- Compute libraries targeting PULP (PULP-NN [3], PULP DSP [4])
Define the structure and requirements for a compute kernel library
Write new compute kernels, which may include any of:
- Linear algebra (matrix/vector/scalar sums, products, transpositions, inversions, ...)
- Machine learning (pooling, batch normalization, backpropagation, ...)
- Filter functions (convolution, FFT, ...)
- Complex numbers (addition, multiplication, magnitude and argument, ...)
Verify your new kernels using results generated by common compute libraries
Evaluate the performance of your kernels in RTL simulations of a Snitch system
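The verification step above could be sketched as follows, assuming golden results are exported from a reference library ahead of time and compiled into the test as a constant array. All names here (`axpy_f32`, `count_mismatches_f32`, `verify_axpy_f32`) are hypothetical, the golden values are hand-computed for this small example rather than generated by a library, and the tolerances are placeholders that would need to be chosen per kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel under test: y[i] = a * x[i] + y[i]. */
void axpy_f32(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Local absolute value, so the sketch does not depend on a math library. */
static float abs_f32(float v) { return v < 0.0f ? -v : v; }

/* Counts elements where |actual - golden| exceeds
 * abs_tol + rel_tol * |golden|. NaN results are counted as mismatches. */
size_t count_mismatches_f32(const float *actual, const float *golden,
                            size_t n, float rel_tol, float abs_tol) {
    assert(n == 0 || (actual != NULL && golden != NULL));
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        float err = abs_f32(actual[i] - golden[i]);
        float bound = abs_tol + rel_tol * abs_f32(golden[i]);
        if (!(err <= bound)) {
            ++bad;
        }
    }
    return bad;
}

/* Runs the kernel on a fixed input and returns the number of mismatches,
 * so 0 means the kernel matched the golden data. */
int verify_axpy_f32(void) {
    const float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    /* Illustrative golden data: 2 * x + y, computed by hand. */
    const float golden[4] = {12.0f, 24.0f, 36.0f, 48.0f};

    axpy_f32(2.0f, x, y, 4);
    return (int)count_mismatches_f32(y, golden, 4, 1e-6f, 1e-6f);
}
```

Returning the mismatch count lets a bare-metal test use it directly as its exit code. A combined relative and absolute bound is one common choice here, since optimized kernels may reorder floating-point operations and so differ from the reference in the last bits.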

Depending on your preferences and prior experience, you may choose which class(es) of kernels you want to tackle or focus on. The proposal can also be split into multiple individual projects if necessary.

Character

20% Literature / architecture review
40% RTL implementation
20% Bare-metal C programming
20% Evaluation

Prerequisites

Strong interest in computer architecture and memory systems
Experience with digital design in SystemVerilog as taught in VLSI I
Experience with ASIC implementation flow (synthesis) as taught in VLSI II
SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent
Preferred: Knowledge of or prior experience with RISC-V or ISA extension design

References

[1] https://ieeexplore.ieee.org/document/9216552

[2] https://ieeexplore.ieee.org/document/9068465

[3] https://github.com/pulp-platform/pulp-nn

[4] https://github.com/pulp-platform/pulp-dsp

Difference between revisions of "A Unified Compute Kernel Library for Snitch (1-2S)" - iis-projects
