
Difference between revisions of "Virtual Memory Ara"

From iis-projects


Revision as of 09:41, 22 August 2023


Introduction

Vector processing is becoming a widespread option for highly data-parallel workloads, thanks to the computational capabilities and flexibility it inherits from the Cray-1 processor. For example, “FUGAKU”, one of the most performant supercomputers in the world, relies on vector processing!

A vector core can sustain high computational throughput using deep pipelines and multiple parallel units and, unlike standard SIMD architectures, can adjust the vector length at runtime without the need for new ISA instructions for different specific vector lengths.
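The runtime-adjustable vector length can be illustrated with a strip-mined loop. The following is a minimal software sketch in C, not Ara or RVV code: `VLMAX`, `set_vl`, and `axpy_stripmined` are hypothetical names introduced here, and `set_vl` only models the idea that the hardware grants at most `VLMAX` elements per vector instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware limit: the maximum number of elements that one
 * vector instruction can process. The value 8 is an arbitrary assumption. */
#define VLMAX 8

/* Model of the "set vector length" step: grant min(remaining, VLMAX). */
static size_t set_vl(size_t remaining) {
    return remaining < VLMAX ? remaining : VLMAX;
}

/* Computes y[i] += a * x[i] for n elements, in chunks of vl elements.
 * Returns the number of chunks, i.e. modeled vector iterations. */
size_t axpy_stripmined(size_t n, float a, const float *x, float *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    size_t iterations = 0;
    while (n > 0) {
        size_t vl = set_vl(n);          /* vector length chosen at runtime */
        for (size_t i = 0; i < vl; i++) {
            y[i] += a * x[i];           /* stands in for one vector operation */
        }
        x += vl;
        y += vl;
        n -= vl;
        iterations++;
    }
    return iterations;
}
```

The same loop handles any `n` and any `VLMAX` without a separate code path per vector length, which is the property the paragraph above describes.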

What a time for a project on a vector processor! RISC-V has ratified its open-source vector ISA RVV (a process that lasted many years!), and many companies and universities are producing their first RVV-compatible cores.

ETH is at the forefront of this race with its agile vector processor Ara, fresh from an update to the latest specification, RVV 1.0. Ara behaves like a vector accelerator coupled with CVA6, one of the most mature open-source RV64GC cores, now maintained by the OpenHW Group. Still, the overall Ara infrastructure runs only in bare-metal mode and is not designed to support an Operating System. This is a shame, since the scalar RV64GC core CVA6 does support it!

Running an OS is not straightforward and hides many pitfalls, but it allows for easy porting of many external programs and drastically increases the system's usability.

Project: Add Virtual Memory support to Ara

The first goal of the project is to make Ara support Virtual Memory. This is a key step toward running an OS on the system.

Currently, both CVA6 and Ara have their private load-store units (LSU), but only CVA6 has a Memory Management Unit (MMU). This unit contains various modules (Translation Lookaside Buffer (TLB), Page Table Walker (PTW), Miss Status Holding Registers (MSHR), etc.) and allows for virtual-to-physical address translation. As this list suggests, it is a very complex block to design and handle.
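To make the roles of the TLB and the PTW concrete, here is a toy software model in C. It is a sketch under loud assumptions, not CVA6's MMU: it uses 4 KiB pages, a flat single-level page table, and a 4-entry direct-mapped TLB, whereas a real RISC-V MMU walks a multi-level table and checks permissions. All names (`mmu_t`, `mmu_translate`, and the constants) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BITS   12u   /* 4 KiB pages (assumption) */
#define TLB_ENTRIES 4u    /* tiny direct-mapped TLB (assumption) */
#define NUM_PAGES   16u   /* size of the toy virtual address space */

typedef struct {
    bool     valid;
    uint64_t vpn;         /* virtual page number */
    uint64_t ppn;         /* physical page number */
} tlb_entry_t;

typedef struct {
    tlb_entry_t tlb[TLB_ENTRIES];
    uint64_t    page_table[NUM_PAGES]; /* flat table: index = VPN, value = PPN */
    unsigned    walks;                 /* number of page-table walks performed */
} mmu_t;

/* Translates a virtual address. On a TLB miss, the "walker" reads the
 * page table and refills the TLB entry before the address is formed. */
uint64_t mmu_translate(mmu_t *mmu, uint64_t vaddr) {
    uint64_t vpn    = vaddr >> PAGE_BITS;
    uint64_t offset = vaddr & ((1u << PAGE_BITS) - 1u);
    assert(vpn < NUM_PAGES);

    tlb_entry_t *e = &mmu->tlb[vpn % TLB_ENTRIES];
    if (!(e->valid && e->vpn == vpn)) {
        /* TLB miss: walk the (single-level) table and refill the entry. */
        mmu->walks++;
        e->valid = true;
        e->vpn   = vpn;
        e->ppn   = mmu->page_table[vpn];
    }
    return (e->ppn << PAGE_BITS) | offset;
}
```

Repeated accesses to the same page hit in the TLB and avoid the walk, which is why translation is cheap in the common case and expensive on a miss.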

One idea for the project is to share this unit with Ara so that both processors can use the same already-existing MMU. This would limit the difficulty of the task and should not harm performance too much for common-case memory operations.
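Sharing one MMU means that two requesters can ask for a translation in the same cycle, so some arbitration is needed. The sketch below models a two-input round-robin arbiter in C. Round-robin is only an illustrative choice made here, not the policy prescribed by the project, and the names `requester_t`, `arbiter_t`, and `arbiter_grant` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REQ_NONE = -1, REQ_CVA6 = 0, REQ_ARA = 1 } requester_t;

typedef struct {
    requester_t last_grant;  /* who was served most recently */
} arbiter_t;

/* Decides which requester may use the shared MMU in this "cycle".
 * If both request, the one that was NOT served last wins. */
requester_t arbiter_grant(arbiter_t *arb, bool cva6_req, bool ara_req) {
    requester_t grant = REQ_NONE;
    if (cva6_req && ara_req) {
        grant = (arb->last_grant == REQ_CVA6) ? REQ_ARA : REQ_CVA6;
    } else if (cva6_req) {
        grant = REQ_CVA6;
    } else if (ara_req) {
        grant = REQ_ARA;
    }
    if (grant != REQ_NONE) {
        arb->last_grant = grant;  /* remember for the next conflict */
    }
    return grant;
}
```

A fixed-priority scheme that always favors the scalar core would be simpler, but it could starve vector memory operations; which trade-off is right for Ara is exactly what the project would need to evaluate.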

In parallel, supporting an operating system requires proper simulation capabilities. A second key step in the project will be porting the entire Ara system to FPGA and running it with a Linux OS, at first without using the vector co-processor.

Once this task is done, we will finally be able to run our system with both Linux and the vector core enabled, and then benchmark and optimize it.

Exciting extended goals of the project include setting up a scheduling strategy for the OS scheduler so that the vector processes (or threads) can co-exist without excessive performance degradation, and optimizing the context switch time and the MMU/TLB accesses to boost the overall performance in real-case scenarios.

Another bonus goal can be to start studying how different kinds of vector memory operations behave when virtual memory is brought into the equation.
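One reason the kind of vector memory operation matters is the number of pages, and therefore translations, that a single instruction can touch. The C sketch below counts the distinct pages accessed by a strided vector load. It assumes 4 KiB pages, a non-negative byte stride, and elements that do not straddle a page boundary; `pages_touched` and `PAGE_SHIFT` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u  /* 4 KiB pages (assumption) */

/* Counts the distinct pages touched by vl element accesses starting at
 * base and separated by stride_bytes. Because addresses never decrease,
 * counting page-number changes is the same as counting distinct pages. */
size_t pages_touched(uint64_t base, size_t vl, uint64_t stride_bytes) {
    size_t   pages    = 0;
    uint64_t last_vpn = 0;
    for (size_t i = 0; i < vl; i++) {
        uint64_t vpn = (base + i * stride_bytes) >> PAGE_SHIFT;
        if (i == 0 || vpn != last_vpn) {
            pages++;
            last_vpn = vpn;
        }
    }
    return pages;
}
```

For example, 64 unit-stride 8-byte elements starting at a page boundary stay within one page, while the same 64 elements with a 4096-byte stride touch 64 different pages, each needing its own translation. Indexed (gather/scatter) accesses are not modeled here.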

Tasks
  • Familiarize yourself with Ara and with how CVA6’s MMU works.
  • Modify the RTL to share the MMU between Ara and CVA6, taking care of the needed synchronization/arbitration between the two.
  • Implement the system with a pre-existing back-end flow to see if there is frequency degradation with respect to the original system.
  • Port the system to FPGA.
  • Verify the system (if possible, by using OpenHW Group facilities).
  • Verify the implementation.
  • Benchmark the implementation.
  • Write a report and prepare a presentation.
  • Possible BONUS goals.
Requirements
  • Strong interest and basic knowledge in computer architecture and operating systems, both on the HW and SW sides.
  • Experience with SystemVerilog HDL, such as taught in VLSI I.
  • Basic knowledge of FPGA tools.
  • Basic knowledge of Operating Systems.
  • C programming language.
  • Bonus: Knowledge of ASIC tool flow (Synthesis + PnR), or parallel enrollment with VLSI II.
  • Bonus: being familiar with vector processors, RISC-V RVV.

Composition: 20% Study, 30% RTL implementation, 10% verification strategy, 20% verification, 20% evaluation

What you will learn

During the project, you will develop several skills.

  • Understand how a Vector architecture works.
  • Work with an FPGA flow on a complex project.
  • Work with an OS on open-source hardware.
  • Learn about how the OS interacts with the low-level hardware.
  • Learn how to deal with a complex design and environment.
Project Supervisors
  • Matteo Perotti: mperotti@iis.ee.ethz.ch
  • Nils Wistoff: nwistoff@iis.ee.ethz.ch

References

[1] Ara: https://arxiv.org/pdf/1906.00478.pdf

[2] Ara source code: https://github.com/pulp-platform/ara

[3] Cray-Processor: http://www.edwardbosworth.com/My5155_Slides/Chapter13/Cray_Supercomputers.htm

[4] RVV: https://github.com/riscv/riscv-v-spec/releases/tag/v1.0