Spatz grows wings: Physical Implementation of a Vector-Powered Manycore System (2S)
- 1 Overview
- 2 Introduction
- 3 Project description
- 4 Project Realization
- 5 Required skills
- Type: Semester Thesis
- Professor: Prof. Dr. L. Benini
Striving for high image quality, even on mobile devices, has led to an increase in pixel count in smartphone cameras over the last decade. These image sensors, boasting tens of millions of pixels, create a massive amount of data that must be processed as quickly as possible within a tight power envelope. While this processing is highly parallelizable, it requires specialized image signal processors (ISPs) that can exploit this high degree of parallelism to meet the timing and power constraints. One modern example of such an ISP is Google’s Pixel Visual Core, which contains eight image processing units, each consisting of 256 specialized processing elements, to achieve a combined performance of 3.28 TOPS.
At ETH, we developed our own many-core system called MemPool. It boasts 256 lightweight 32-bit Snitch cores. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads.
In the quest to improve MemPool's peak performance, last semester, we proposed Spatz, an embedded vector processing unit based on the integer subset of the RISC-V Vector Extension. Spatz substantially improved MemPool's performance and energy efficiency: for an area increase of 26%, we increased MemPool's performance by 70% and its energy efficiency by 116%. This shows the relevance of small embedded vector processing units as the processing elements (PEs) of large-scale clusters with tightly coupled L1 memory.
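To put these three figures in relation to each other, the following minimal sketch derives two further ratios from them. It assumes that energy efficiency is defined as performance per unit of power and that all three numbers refer to the same workload and operating point; the function name `derived_ratios` is purely illustrative and not part of any MemPool or Spatz tooling.

```python
def derived_ratios(area_increase, perf_increase, eff_increase):
    """Return (performance-per-area ratio, power ratio) relative to the baseline.

    Assumption: energy efficiency = performance / power, so
    power ratio = performance ratio / efficiency ratio.
    """
    area = 1.0 + area_increase
    perf = 1.0 + perf_increase
    eff = 1.0 + eff_increase
    perf_per_area = perf / area
    power = perf / eff
    return perf_per_area, power


# Figures quoted above: +26% area, +70% performance, +116% energy efficiency.
perf_per_area, power = derived_ratios(0.26, 0.70, 1.16)
print(f"performance per area: {perf_per_area:.2f}x")  # 1.35x
print(f"power:                {power:.2f}x")          # 0.79x
```

Under these assumptions, the Spatz-powered design delivers roughly 35% more performance per unit area while drawing roughly 21% less power than the baseline.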
The goal of this project is to explore how well Spatz performs on MemPool's small-scale configuration, MinPool. We want to tape out a Spatz-powered MinPool configuration that allows us to characterize the power, performance, and efficiency of the standalone Spatz unit and of the MinPool cluster.
Ultimately, the design resulting from this project shall lead to a scientific publication, be used to generate results for scientific publications, and/or be a basis for the work of other people. Hence, the design must be properly tested, the generated source code appropriately commented, and the project well documented.
The whole project can be subdivided into four main phases, which are described in detail in the following:
Familiarizing with the design
Two of the advisors for this thesis have been working on MemPool for a few years. As such, the student(s) will need some time at the start of the project to become familiar with the architecture and the code base.
The student(s) should also familiarize themselves with Spatz, the work of Domenic Wüthrich, which led to a conference publication that serves as the main reference for the design.
Implementing MemPool's tile
MemPool's tile is the basis of its architecture, containing a few Snitch cores and a number of L1 memory banks accessible within one cycle. This module is the building block of the MemPool cluster. It is, however, a "normal" module, which can be placed and routed as a rather dense macro.
We need to determine how fast the Spatz-powered MemPool tiles can be clocked at the target technology node, also considering that they should be rather dense. The student can experiment with parameters such as the number of Snitch cores in each tile, the size of the instruction cache, and the size of the L1 memory per tile. This should also give the student a feel for the next step.
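The exploration over these tile parameters can be organized as a simple sweep. The sketch below only enumerates candidate configurations; the parameter names and value ranges are hypothetical placeholders chosen for illustration, not values prescribed by the project, and each configuration would still have to be passed to the actual implementation flow to obtain timing and area numbers.

```python
from itertools import product

# Hypothetical candidate values; the real ranges depend on the target
# technology node and on what the advisors consider reasonable.
CORES_PER_TILE = [2, 4, 8]
ICACHE_KIB = [1, 2, 4]
L1_KIB_PER_TILE = [8, 16, 32]


def tile_configs():
    """Yield every combination of the tile parameters as a dictionary."""
    for cores, icache, l1 in product(CORES_PER_TILE, ICACHE_KIB, L1_KIB_PER_TILE):
        yield {
            "cores_per_tile": cores,
            "icache_kib": icache,
            "l1_kib_per_tile": l1,
        }


configs = list(tile_configs())
print(f"{len(configs)} tile configurations to evaluate")
for cfg in configs[:3]:
    print(cfg)
```

Keeping the sweep explicit like this makes it easy to record the achieved clock frequency and density next to each configuration and to compare the results when choosing the final parameters.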
The student(s) might also try out a datapath-driven flow to better implement Spatz's latch-based VRF, which is a considerable part of the design.
By implementing several MinPool tiles and interconnecting them cleverly, we can build the many-core cluster. We do not know how many such tiles we can instantiate before degrading timing or latency too much.
After this design exploration phase, the student and the advisors will decide on a final set of parameters for MinPool, including how large it should be.
Making MinPool standalone
During this phase, some RTL is needed to make MinPool a standalone design. The student will need to integrate an FLL, a boot ROM, and a JTAG interface to access and initialize the system. While there are many IPs and a lot of know-how at IIS for that, this is also highly dependent on the final design parameters for MinPool.
At this phase, the student should also check whether their placed-and-routed design passes the sanity checks.
Run sanity tests on the placed-and-routed design
Run DRC checks, simulate the placed-and-routed design, and reason about the testing strategy.
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues as well.
The student is required to write a weekly report at the end of each week and to send it to their advisors by email. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points. The weekly report is also an important means for the student to develop a goal-oriented attitude toward the work.
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in making your code easy to understand. If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming convention, all members will immediately feel at home with each other's code. The naming conventions we follow in the PULP project are available here.
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.
The common language of engineering is de facto English. Therefore, the final report should preferably be written in English.
Any word processing software may be used for writing the reports. Nevertheless, the IIS staff strongly encourages the use of LaTeX together with Inkscape, Tgif, or any other vector drawing software (for block diagrams). If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded here.
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.
To work on this project, you will need:
- to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.
- to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.