Virtual Memory Ara


Introduction

Vector processing, a paradigm that dates back to the Cray-1, is becoming a widespread option for highly data-parallel workloads thanks to its computational capabilities and flexibility. For example, Fugaku, which led the TOP500 supercomputer ranking from June 2020 to November 2021, is built on processors that implement Arm's Scalable Vector Extension (SVE)!

A vector core can sustain high computational throughput using deep pipelines and multiple parallel units and, unlike standard fixed-width SIMD architectures, can adjust the vector length at runtime without requiring new ISA instructions for each vector length.
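The runtime-adjustable vector length is usually exploited through strip-mining. The following is a minimal scalar model in C, not real RVV code: `set_vl` is a hypothetical stand-in for the `vsetvli` instruction and implements only its simplest legal behavior (the minimum of the remaining elements and the hardware maximum), and `vlmax` is an illustrative parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RVV's vsetvli: how many elements the hardware
 * processes in this iteration, given how many are still left. */
size_t set_vl(size_t remaining, size_t vlmax)
{
    assert(vlmax > 0);
    return remaining < vlmax ? remaining : vlmax;
}

/* y[i] += a * x[i] for n elements, strip-mined.
 * Returns the number of strip-mining iterations that were needed. */
size_t axpy_stripmined(size_t n, float a, const float *x, float *y,
                       size_t vlmax)
{
    size_t iterations = 0;
    for (size_t done = 0; done < n; ) {
        size_t vl = set_vl(n - done, vlmax);
        /* On a vector core, this inner loop would be a few vector
         * instructions operating on vl elements at once. */
        for (size_t i = 0; i < vl; i++)
            y[done + i] += a * x[done + i];
        done += vl;
        iterations++;
    }
    return iterations;
}
```

The same loop works unchanged for any `vlmax`, which is the point: the binary does not need to know the hardware vector length in advance.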

What a time for a project on a vector processor! RISC-V has almost finished ratifying its open vector ISA extension, RVV (a process that lasted many years!), and many companies and universities are producing their first RVV-compatible cores.

ETH is at the forefront of this race with its agile vector processor Ara, freshly updated to the latest specification, RVV 1.0. Ara behaves like a vector accelerator coupled with CVA6, one of the most mature open-source RV64GC cores, now maintained by the OpenHW Group. Still, the overall Ara infrastructure runs only in bare-metal mode and is not designed to support an operating system. This is a shame, since the scalar RV64GC core CVA6 does support one!

Running an OS is not straightforward and hides many pitfalls, but it allows for easy porting of many external programs and drastically increases the system's usability.

Project: Vectors on Linux

The first goal of the project is to make Ara support Virtual Memory. This is a key step toward running an OS on the system.

Currently, both CVA6 and Ara have their own private load-store units (LSUs), but only CVA6 has a Memory Management Unit (MMU). This unit contains various modules (Translation Lookaside Buffer (TLB), Page Table Walker (PTW), Miss Status Holding Registers (MSHRs), etc.) and performs virtual-to-physical address translation. As this suggests, it is a very complex block to design and handle.
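To make the translation step concrete, here is a minimal sketch in C of how a virtual address is decomposed, assuming the Sv39 scheme from the RISC-V privileged specification (three 9-bit page-table indices plus a 12-bit page offset). The type and function names are hypothetical and do not correspond to signals in the CVA6 RTL.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u  /* 4 KiB pages */
#define VPN_BITS    9u  /* 512 entries per page-table level */

typedef struct {
    uint16_t vpn[3];    /* vpn[2] indexes the root page table */
    uint16_t offset;    /* byte offset inside the 4 KiB page */
} sv39_va_t;

/* Split a virtual address into the fields a page table walker uses. */
sv39_va_t sv39_split(uint64_t va)
{
    sv39_va_t f;
    f.offset = (uint16_t)(va & ((1u << PAGE_SHIFT) - 1u));
    for (unsigned level = 0; level < 3; level++) {
        unsigned shift = PAGE_SHIFT + VPN_BITS * level;
        f.vpn[level] = (uint16_t)((va >> shift) & ((1u << VPN_BITS) - 1u));
    }
    return f;
}

/* Physical address for a 4 KiB leaf mapping: the physical page number
 * found by the walk (or cached in the TLB) joined with the page offset. */
uint64_t sv39_phys_addr(uint64_t ppn, uint64_t va)
{
    assert(ppn < (1ull << 44));  /* Sv39 physical page numbers are 44 bits */
    return (ppn << PAGE_SHIFT) | (va & ((1ull << PAGE_SHIFT) - 1ull));
}
```

A TLB hit skips the three-level walk entirely and supplies the physical page number directly, which is why TLB behavior matters so much for memory-heavy vector code.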

One idea for the project is to share this unit with Ara so that both processors can use the same already-existing MMU. This would limit the difficulty of the task and should not harm performance too much for common-case memory operations.
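Sharing one MMU means that two requesters compete for a single translation port. As a behavioral sketch only, the C model below shows one possible policy, a two-way round-robin arbiter. The policy, the type names, and the function names are all assumptions for illustration; the actual arbitration scheme is a design decision of the project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_NONE = -1, REQ_CVA6 = 0, REQ_ARA = 1 } requester_t;

typedef struct {
    requester_t last_grant;  /* requester that was granted most recently */
} mmu_arbiter_t;

void arbiter_init(mmu_arbiter_t *arb)
{
    assert(arb != NULL);
    arb->last_grant = REQ_ARA;  /* so that CVA6 wins the first conflict */
}

/* One arbitration decision per call (think: one clock cycle). */
requester_t arbiter_grant(mmu_arbiter_t *arb, bool cva6_req, bool ara_req)
{
    assert(arb != NULL);
    requester_t grant;
    if (cva6_req && ara_req)
        /* Conflict: grant whoever did not win last time. */
        grant = (arb->last_grant == REQ_CVA6) ? REQ_ARA : REQ_CVA6;
    else if (cva6_req)
        grant = REQ_CVA6;
    else if (ara_req)
        grant = REQ_ARA;
    else
        return REQ_NONE;        /* idle cycle: history is left unchanged */
    arb->last_grant = grant;
    return grant;
}
```

Round-robin avoids starving either side, but a fixed priority for the scalar core could be preferable if scalar latency is more critical; evaluating such trade-offs is part of the task.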

In parallel, supporting an operating system requires a proper emulation platform. A second key step in the project will be porting the entire Ara system to an FPGA and running Linux on it, at first without using the vector co-processor.

Once this task is done, we will finally be able to run Linux with the vector core enabled, and then benchmark and optimize the system.

Exciting extended goals of the project include setting up a strategy for the OS scheduler so that vector processes (or threads) can co-exist without excessive performance degradation, and optimizing the context switch time and the MMU/TLB accesses to boost the overall performance in real-world scenarios.
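One reason the context switch time deserves attention is the size of the vector state. RVV defines 32 vector registers of VLEN bits each, so a rough lower bound on the data to save and restore can be sketched in C as below. The function name is hypothetical, the few vector CSRs are ignored, and the VLEN values in the usage note are illustrative rather than Ara's configuration.

```c
#include <assert.h>
#include <stddef.h>

#define RVV_NUM_VREGS 32u  /* architectural vector registers v0..v31 */

/* Bytes of vector register file state for a given VLEN (in bits). */
size_t vrf_state_bytes(size_t vlen_bits)
{
    assert(vlen_bits >= 8 && vlen_bits % 8 == 0);
    return RVV_NUM_VREGS * (vlen_bits / 8);
}
```

For example, `vrf_state_bytes(128)` gives 512 bytes, while `vrf_state_bytes(4096)` gives 16 KiB, far more than the scalar integer and floating-point registers combined.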

Another bonus goal can be to start studying how different kinds of vector memory operations behave when virtual memory is brought into the equation.
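As a starting point for such a study, the sketch below counts how many page translations a single vector memory operation would need depending on its stride. It is a simplified C model under stated assumptions: 4 KiB pages, non-negative strides, and naturally aligned elements (so no element straddles a page boundary). The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Number of page changes seen while accessing vl elements that start at
 * base and are stride_bytes apart. Addresses never decrease, so every
 * change of page number is a page not visited before by this operation. */
size_t pages_touched(uint64_t base, uint64_t stride_bytes,
                     size_t elem_bytes, size_t vl)
{
    assert(elem_bytes != 0 && PAGE_SIZE % elem_bytes == 0);
    assert(base % elem_bytes == 0 && stride_bytes % elem_bytes == 0);

    size_t pages = 0;
    uint64_t last_page = UINT64_MAX;  /* sentinel: no page seen yet */
    for (size_t i = 0; i < vl; i++) {
        uint64_t page = (base + i * stride_bytes) / PAGE_SIZE;
        if (page != last_page) {
            pages++;
            last_page = page;
        }
    }
    return pages;
}
```

With 8-byte elements, a unit-stride access of 512 elements starting on a page boundary stays within one page, whereas a 16-element access with a 4096-byte stride touches 16 different pages and therefore needs 16 translations. Indexed (gather/scatter) accesses can be worse still, since consecutive elements may land on arbitrary pages.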

Tasks
  • Familiarize yourself with Ara and with how CVA6’s MMU works.
  • Modify the RTL to share the MMU between Ara and CVA6, taking care of the needed synchronization/arbitration between the two.
  • Implement the system with a pre-existing back-end flow to check whether the maximum frequency degrades with respect to the original system.
  • Port the system to FPGA.
  • Verify the system (if possible, by using OpenHW Group facilities).
  • Verify the implementation.
  • Benchmark the implementation.
  • Write a report and prepare a presentation.
  • Possible BONUS goals.
Requirements
  • Strong interest and basic knowledge in computer architecture and operating systems, both on the HW and SW sides.
  • Experience with SystemVerilog HDL, such as taught in VLSI I.
  • Basic knowledge of FPGA tools.
  • Basic knowledge of Operating Systems.
  • C programming language.
  • Bonus: Knowledge of ASIC tool flow (Synthesis + PnR), or parallel enrollment with VLSI II.
  • Bonus: being familiar with vector processors, RISC-V RVV.

Composition: 20% Study, 30% RTL implementation, 10% verification strategy, 20% verification, 20% evaluation

What will you learn

During the project, you will develop several skills.

  • Understand how a Vector architecture works.
  • Work with an FPGA flow on a complex project.
  • Work with an OS on open-source hardware.
  • Learn about how the OS interacts with the low-level hardware.
  • Learn how to deal with a complex design and environment.
Project Supervisors

References

[1] Ara: https://arxiv.org/pdf/1906.00478.pdf

[2] Ara source code: https://github.com/pulp-platform/ara

[3] Cray-Processor: http://www.edwardbosworth.com/My5155_Slides/Chapter13/Cray_Supercomputers.htm

[4] RVV: https://github.com/riscv/riscv-v-spec/releases/tag/v1.0