New RVV 1.0 Vector Instructions for Ara
Vector processing is becoming a widespread choice for highly parallel data workloads, thanks to the computational capabilities and flexibility it inherits from the Cray-1 processor. For example, Fugaku, the most performant supercomputer in the world, is a vector processor!
A vector core can sustain high computational throughput using deep pipelines and multiple parallel units and, unlike standard SIMD architectures, can adjust the vector length at runtime without needing new ISA instructions for each specific vector length.
What a time for a project on a vector processor! RISC-V has almost finished ratifying its open-source vector ISA, RVV (a process that lasted many years!), and many companies and universities are producing their first RVV-compatible cores.
ETH is at the forefront of this race with its agile in-order vector processor Ara, freshly updated from the early draft specification RVV 0.5. Still, some critical instructions are missing before Ara can claim FULL compliance with RVV 1.0 and run every vector program produced by an RVV compiler.
Project: Add New RVV 1.0 Features to Ara
This project aims to extend Ara with the instructions that are still missing, bringing it to full compliance with RVV 1.0.
Some important missing features are:
- Vector-to-scalar moves.
- Floating-point reductions.
- Vector gather/compress.
- Specific mask manipulations.
- Segment memory operations.
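To make two of these concrete, the scalar C sketch below shows what `vrgather.vv` (arbitrary permutation) and `vcompress.vm` (mask-driven packing) compute. The function names and the out-of-range check are simplified for illustration; in the RVV 1.0 spec, gather indices are checked against VLMAX (not `vl`) and out-of-range results are zeroed:

```c
#include <stddef.h>

/* Scalar sketch of vrgather.vv: dst[i] = src[idx[i]].
 * Out-of-range indices yield 0 (simplified: checked against vl here,
 * whereas the RVV spec checks against VLMAX). */
void vrgather(const int *src, const size_t *idx, int *dst, size_t vl) {
    for (size_t i = 0; i < vl; i++)
        dst[i] = (idx[i] < vl) ? src[idx[i]] : 0;
}

/* Scalar sketch of vcompress.vm: pack the elements whose mask bit is set
 * into the low elements of dst; returns how many elements were kept. */
size_t vcompress(const int *src, const unsigned char *mask, int *dst,
                 size_t vl) {
    size_t k = 0;
    for (size_t i = 0; i < vl; i++)
        if (mask[i]) dst[k++] = src[i];
    return k;
}
```

Both operations move each result element to a position that depends on runtime data, which is why they need dedicated datapath and control logic rather than the element-aligned lanes that suffice for plain arithmetic.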
Throughout the project, you will implement a subset of these missing instructions in Ara. Each new feature requires RTL modifications to design new datapath and control logic, extensive verification, and benchmarking. The vector instructions should sustain high throughput in realistic conditions, and the RTL modifications should not degrade the target frequency.
Therefore, during the project, you will not only modify the hardware but also synthesize and physically implement it with state-of-the-art EDA tools to check that the timing requirements are met and the PPA figures of merit are satisfactory. Moreover, you will analyze and implement suitable vector benchmarks to track the performance impact of your modifications.
This is an ambitious (but super rewarding) project that requires strong HW and SW skills. You will need to study and understand Ara, then implement and test at least one complete set of new instructions.
If the project progresses quickly enough, it can easily be extended with another set of vector instructions. Conversely, if it proceeds more slowly than expected, a subset of the instructions can be implemented and verified instead.
If more students are interested, it is possible to work in tandem and add more missing features.
Prerequisites:
- Strong interest in computer architecture, both on the HW and SW sides
- Experience with the SystemVerilog HDL, as taught in VLSI I
- Knowledge of the ASIC tool flow (synthesis + PnR), or parallel enrollment in VLSI II
- Proficiency in the C programming language
- Bonus: familiarity with vector processors and RISC-V RVV
Composition: 30% Study + Architecture specification, 30% RTL implementation, 40% Verification and Benchmarking
What will you learn
During the project, you will develop several skills.
- Understand how a vector architecture works.
- Get to know Ara, our workhorse for vector processing.
- Learn how RTL modifications reflect on the post-layout PPA metrics.
- Learn how to deal with a complex design and environment.
- Ara: https://arxiv.org/pdf/1906.00478.pdf
- Ara source code: https://github.com/pulp-platform/ara
- Cray supercomputers: http://www.edwardbosworth.com/My5155_Slides/Chapter13/Cray_Supercomputers.htm