
Low Precision Ara for ML

From iis-projects



Introduction

Vector processing is becoming a widespread option for highly parallel data workloads, thanks to the intrinsic computational capabilities and flexibility it inherits from the Cray-1 processor. For example, Fugaku, the most performant supercomputer in the world, is a vector processor!

A vector core can sustain high computational throughput using deep pipelines and multiple parallel units. Unlike standard SIMD architectures, it can also adjust the vector length at runtime, without requiring new ISA instructions for each specific vector length.

What a time for a project on a vector processor! RISC-V has almost finished ratifying its open-source vector ISA, RVV (a process that lasted many years!), and many companies and universities are producing their first RVV-compatible cores.

Ara currently supports floating-point arithmetic from 64 bits down to 16 bits. Nevertheless, common machine learning tasks can be run using low-precision floating-point data formats with acceptable accuracy and improved performance and energy efficiency.

Project: Low-Precision Ara for ML

This project aims to extend Ara with support for low-precision floating-point arithmetic.

Throughout the project, you will implement support for three new low-precision floating-point formats: 16-bit alt, 8-bit, and 8-bit alt. Each new format requires RTL modifications (new datapath and control logic), verification, and benchmarking. The vector instructions should sustain high throughput under realistic conditions, and the RTL modifications should not degrade the target frequency or efficiency.

Therefore, during the project, you will not only modify the hardware but also synthesize and physically implement it with state-of-the-art software tools to check that the timing requirements and the PPA figures of merit are satisfying. Moreover, you will analyze and implement suitable vector benchmarks to track the performance of the added modifications.

This is an ambitious (but super rewarding) project that requires strong HW and SW skills. You will need to study and understand Ara, and to add and test at least one complete set of new instructions.

If the project develops quickly, it can easily be stretched by extending the benchmark pool with new ML kernels. Conversely, if the project proceeds more slowly than expected, only one or two of the new formats will be implemented.

Requirements
  • Strong interest in computer architecture, both on the HW and SW sides
  • Experience with SystemVerilog HDL, such as taught in VLSI I
  • Knowledge of ASIC tool flow (Synthesis + PnR), or parallel enrollment with VLSI II
  • C programming language
  • Bonus: being familiar with vector processors, RISC-V RVV

Composition: 20% Study + Architecture specification, 40% RTL implementation, 40% Verification and Benchmarking

What will you learn

During the project, you will develop several skills.

  • Understand how a Vector architecture works.
  • Know Ara, our workhorse for vector processing.
  • Learn how RTL modifications reflect on the post-layout PPA metrics.
  • Learn how to deal with a complex design and environment.
Project Supervisors

References

[1] Ara: https://arxiv.org/pdf/1906.00478.pdf

[2] Ara source code: https://github.com/pulp-platform/ara

[3] Cray-Processor: http://www.edwardbosworth.com/My5155_Slides/Chapter13/Cray_Supercomputers.htm

[4] RVV: https://github.com/riscv/riscv-v-spec/releases/tag/v1.0