Physical Implementation of Ara, PULP's Vector Machine (1-2S)
Status: In Progress
- Student: Jiantao Liu
- Type: Semester Thesis
- Professor: Prof. Dr. L. Benini
In instruction-based programmable architectures, the key challenge is mitigating the Von Neumann Bottleneck (VNB), that is, the memory traffic required to fetch instructions. In particular, multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications. Each core tends to execute the same instruction many times, which wastes both area and energy.
The quest for extreme energy efficiency in data-parallel execution has revived interest in vector architectures. Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit their execution model (e.g., machine learning and digital signal processing). Vector machines tackle the VNB through vector instructions, each of which encodes a series of micro-operations within a single instruction. This renewed interest is reflected in the introduction of vector extensions to popular Instruction Set Architectures, such as Arm with SVE and RISC-V with the V Extension.
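To make the instruction-fetch argument concrete, the following is a minimal sketch of a toy model, not a description of Ara itself. It compares the number of instructions fetched by a scalar loop with those fetched by a strip-mined vector loop over the same data. The function names, the figure of five instructions per loop iteration, and the maximum vector length of 64 elements are all illustrative assumptions.

```python
import math

# Toy model (all numbers are illustrative assumptions, not Ara measurements).

def scalar_fetches(n, instrs_per_element=5):
    """Instructions fetched by a scalar loop: one loop body per element."""
    return n * instrs_per_element

def vector_fetches(n, vlmax, instrs_per_strip=5):
    """Instructions fetched by a strip-mined vector loop:
    one loop body per strip of up to `vlmax` elements."""
    return math.ceil(n / vlmax) * instrs_per_strip

n = 1024
print(scalar_fetches(n))            # 5120
print(vector_fetches(n, vlmax=64))  # 80
```

Under these assumptions the vector loop fetches 64 times fewer instructions, because each vector instruction stands for up to `vlmax` element-wise micro-operations.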
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension. The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP. Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core. The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.
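The 2B/DP-FLOP ratio can be turned into a rough bandwidth budget. The sketch below assumes that each lane completes one double-precision fused multiply-add per cycle, counted as 2 FLOP. This assumption is not stated in the text above, and the lane counts and helper names are hypothetical.

```python
# Rough bandwidth budget from a bytes-per-FLOP ratio.
# Assumption (not from the text): one DP FMA per lane per cycle = 2 DP-FLOP.

def peak_dp_flop_per_cycle(lanes, flop_per_lane=2):
    """Peak double-precision FLOP per cycle for a given number of lanes."""
    return lanes * flop_per_lane

def required_bytes_per_cycle(lanes, ratio_b_per_flop=2.0, flop_per_lane=2):
    """Memory bandwidth (bytes/cycle) needed to sustain the given ratio at peak."""
    return ratio_b_per_flop * peak_dp_flop_per_cycle(lanes, flop_per_lane)

for lanes in (2, 4, 8, 16):  # hypothetical lane counts
    print(lanes, peak_dp_flop_per_cycle(lanes), required_bytes_per_cycle(lanes))
```

For example, under this assumption a 4-lane configuration peaks at 8 DP-FLOP per cycle and needs 16 bytes per cycle, which at 1 GHz corresponds to 16 GB/s.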
Ara is functional, and a prototype reaches an operating frequency of 1 GHz in a modern technology. However, Ara has never been taped out. The objective of this project is to do so, first by building a simpler version of Ara, then by placing and routing the design.
The project consists of the following parts:
1. Familiarize yourself with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)
2. Synthesize Ara (~2 person weeks)
- Determine the timing constraints of the design, and which functionalities should be supported.
- Determine the area constraints of the design.
- Simulate the synthesized design.
3. Make Ara a standalone design. (~3 person weeks)
- Integrate a boot ROM and an FLL.
- Integrate a JTAG port and the initialization mechanism.
4. Place and route Ara (~2 person weeks)
5. Run sanity tests on the placed-and-routed design. (~2 person weeks)
- Run DRC tests.
- Simulate the placed-and-routed design.
- Reason about the testing mechanism.
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student's and the assistants' schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.
The student is required to write a report at the end of each week and send it to their advisors by email. The weekly report should briefly summarize the work, progress, and findings of the week, plan the actions for the next week, and discuss open questions and points. It is also an important means for the student to develop a goal-oriented attitude to work.
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps toward making your code easy to understand. If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming convention, all members will immediately feel at home with each other's code. The naming conventions we follow in the PULP project are available here.
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.
English is the de facto common language of engineering. Therefore, the final report should preferably be written in English.
Any word processing software may be used for writing the reports. Nevertheless, the IIS staff strongly encourages the use of LaTeX together with Inkscape, Tgif, or any other vector drawing software (for block diagrams). If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded here.
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.
To work on this project, you will need:
- to have worked with at least one hardware description language (SystemVerilog, Verilog, or VHDL). Having taken, or currently taking, the VLSI 1 course is recommended.
- to have worked with back-end tools. Having taken, or currently taking, the VLSI 2 course is recommended.
- Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. link