Fast Simulation of Manycore Systems (1S)
- 1 Overview
- 2 Introduction
- 3 Project Description
- 4 Milestones
- 5 Project Realization
- 6 Deliverables
- 7 References
Overview
- Type: Semester Thesis
- Professor: Prof. Dr. L. Benini
- Work breakdown: 20% software, 60% RTL design, 20% evaluation
- Prerequisites: VLSI I, experience with C
Introduction
In the quest for high-performance computing systems, few architectural models match the flexibility of manycore systems. These systems integrate many small cores (hundreds or even thousands) that work independently to execute highly parallelizable algorithms.
At ETH, we are developing our own manycore system called MemPool [1], [2]. It features 256 lightweight 32-bit Snitch cores [3], which implement RISC-V, a modular and open instruction set architecture (ISA) [4]. Despite its size, MemPool gives all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. As a result, all cores can communicate efficiently, which makes MemPool suitable for a wide range of workloads and easy to program.
To benchmark MemPool, we rely heavily on cycle-accurate RTL simulation. However, simulating such a large system is slow, even with the latest commercial simulators, which limits the complexity of the architectures and benchmarks that can be explored. The problem only becomes worse as MemPool is scaled further to thousands of cores.
Project Description
The goal of this thesis is to improve the simulation performance of large manycore systems, using MemPool as a test vehicle. Specifically, it aims to optimize the performance of Verilator [5], an open-source, cycle-accurate RTL simulator. Verilator promises large speedups over commercial simulators by translating the HDL description into a C++ model, which is then compiled for the host platform. However, this translation is tricky, and obtaining an efficient model requires HDL code that is written in a way Verilator can translate efficiently. This thesis will profile MemPool’s Verilator model to identify critical modules and improve their performance by exploring favorable code transformations.
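To make the HDL-to-C++ translation more concrete, the following Python sketch hand-models what a cycle-based simulator conceptually does with a small 8-bit counter: ports become plain variables, and an `eval()` function recomputes the state whenever the testbench changes an input. It is an illustration only, not actual Verilator output (the generated C++ model is structured differently), and the class and function names are hypothetical. The point is that every piece of logic in the design turns into host code that runs on each evaluation, so the way the HDL is written directly affects simulation speed.

```python
class CounterModel:
    """Hand-written illustration (NOT Verilator output) of a software model of

        always_ff @(posedge clk) if (rst) q <= 0; else if (en) q <= q + 1;

    Ports are plain attributes; eval() must be called after every input change.
    """

    WIDTH = 8  # width of q in bits

    def __init__(self):
        self.clk = 0
        self.rst = 0
        self.en = 0
        self.q = 0
        self._prev_clk = 0  # needed to detect rising clock edges

    def eval(self):
        # Sequential logic only updates on a rising edge of clk.
        if self.clk and not self._prev_clk:
            if self.rst:
                self.q = 0
            elif self.en:
                self.q = (self.q + 1) % (1 << self.WIDTH)  # 8-bit wrap-around
        self._prev_clk = self.clk


def run(cycles):
    """Hypothetical testbench: one reset cycle, then `cycles` enabled clock cycles."""
    model = CounterModel()
    model.rst = 1
    for clk in (0, 1):
        model.clk = clk
        model.eval()
    model.rst, model.en = 0, 1
    for _ in range(cycles):
        for clk in (0, 1):  # falling edge, then rising edge
            model.clk = clk
            model.eval()
    return model.q
```

For example, `run(10)` returns 10, and `run(300)` returns 44 because the 8-bit counter wraps around.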
The project has several aspects to explore. First, Verilator offers a wide range of flags that can be used to tune the model. A good first step to familiarize yourself with the project is to explore how the performance-related flags affect simulation speed.
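Such a flag exploration can be organized as a simple sweep. The sketch below assumes a full-factorial sweep over a small, illustrative set of options: `-O3`, `--x-initial fast`, and `--noassert` are Verilator options and `-CFLAGS` passes flags through to the C++ compiler, but the selection itself, the source file names, and the helper functions are hypothetical, so check the manual of the installed Verilator version before relying on them. The code only builds the command lines; it does not run anything.

```python
from itertools import combinations

# Illustrative selection of performance-related options; which ones matter
# for MemPool is exactly what the sweep is meant to find out.
CANDIDATE_FLAGS = [
    "-O3",                    # enable Verilator's slower, more aggressive optimizations
    "--x-initial fast",       # initialize otherwise uninitialized variables for speed
    "--noassert",             # disable assertion checking
    "-CFLAGS -march=native",  # let the C++ compiler target the host CPU
]

BASE_CMD = ["verilator", "--cc"]


def flag_sets(flags):
    """Yield every subset of `flags`: a full-factorial sweep with 2**len(flags) points."""
    for k in range(len(flags) + 1):
        yield from combinations(flags, k)


def build_commands(sources, flags=CANDIDATE_FLAGS):
    """Return one Verilator argument list per flag subset.

    `sources` is a list of (hypothetical) input files, e.g. the MemPool
    top level and its C++ testbench.
    """
    commands = []
    for subset in flag_sets(flags):
        args = list(BASE_CMD)
        for flag in subset:
            # "--noassert" -> 1 argument, "-CFLAGS -march=native" -> 2 arguments
            args.extend(flag.split())
        args.extend(sources)
        commands.append(args)
    return commands
```

Each command line could then be executed (e.g., with `subprocess.run`) in its own build directory (Verilator's `--Mdir` option), and the resulting binaries timed on the same benchmark. Since the number of builds grows as 2^n, varying one flag at a time is a cheaper alternative once more than a handful of flags are considered.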
In a second phase, you will profile the MemPool model to identify the IPs and code blocks responsible for most of the simulation runtime. MemPool serves as a test vehicle in this thesis, but the goal of this phase is also to optimize IPs that are used in virtually every hardware design in our group; optimizing them can benefit all PULP-related designs. Verilator provides helpful hints about code constructs it has trouble optimizing, and these hints are a good starting point. The goal of this phase is not only to obtain fast code, but also to formulate guidelines on how to write efficient code.
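One well-known example of such a hint is Verilator's UNOPTFLAT warning, which flags combinational logic that looks circular at signal granularity, for instance a ripple chain in which bits of one vector are computed from other bits of the same vector. The Python sketch below is a conceptual model, not Verilator's actual scheduler: when the whole vector is treated as a single signal, the assignment has to be re-evaluated until it stops changing (compare Verilator's `--converge-limit` option), whereas per-bit signals can be evaluated once in dependency order. The 8-bit chain and the function names are hypothetical.

```python
N = 8  # hypothetical width of the ripple chain


def eval_as_one_signal(a, cin, max_iters=100):
    """Conceptual model of c[N:1] = a & c[N-1:0] with c[0] = cin, where the
    whole vector c is ONE signal. Its bits cannot be ordered, so the
    assignment is re-evaluated until c stops changing. Returns (c, evaluations)."""
    c = [0] * (N + 1)
    for evaluations in range(1, max_iters + 1):
        new_c = [cin] + [a[i] & c[i] for i in range(N)]
        if new_c == c:  # converged: nothing changed in this pass
            return c, evaluations
        c = new_c
    raise RuntimeError("combinational logic did not converge")


def eval_per_bit(a, cin):
    """Same logic with every bit as its own signal: evaluating c[0], c[1], ...
    in dependency order needs exactly one pass."""
    c = [cin] + [0] * N
    for i in range(N):
        c[i + 1] = a[i] & c[i]
    return c, 1
```

For an all-ones input with `cin = 1`, the single-signal version needs N + 2 = 10 evaluations to reach the result that the ordered version computes in one pass. This is why restructuring such code, for example by splitting the vector into independent signals (the Verilator manual describes a `split_var` metacomment for this purpose), is a typical optimization.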
Verilator also offers multi-threaded simulation to improve performance even further. However, MemPool does not yet take advantage of this feature. Parallelizing the Verilator model and exploring the associated trade-offs is another interesting aspect of this project.
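A first-order intuition for these trade-offs can be obtained from a simple cost model. The sketch below assumes, hypothetically, that the design can be split into independent partitions with known per-cycle evaluation costs, that threads synchronize once per simulated cycle at a fixed cost, and that partitions are assigned greedily (longest processing time first). Verilator's own partitioning also has to respect dependencies between parts of the design, so this is only an illustration of why load imbalance and synchronization overhead limit the achievable speedup.

```python
import heapq


def partition(costs, threads):
    """Greedily assign work items (hypothetical per-partition costs per simulated
    cycle) to threads, largest first, always to the least-loaded thread.
    Returns the resulting load of each thread."""
    heap = [(0.0, t) for t in range(threads)]  # (current load, thread id)
    loads = [0.0] * threads
    for cost in sorted(costs, reverse=True):
        load, t = heapq.heappop(heap)
        loads[t] = load + cost
        heapq.heappush(heap, (loads[t], t))
    return loads


def estimated_speedup(costs, threads, sync_cost):
    """Simplified model: a simulated cycle takes as long as the most loaded
    thread, plus a fixed synchronization (barrier) cost when threads > 1."""
    serial_time = sum(costs)
    parallel_time = max(partition(costs, threads))
    if threads > 1:
        parallel_time += sync_cost
    return serial_time / parallel_time
```

For example, four equal partitions of cost 4 on four threads give a speedup of 4.0 without synchronization cost, but only 3.2 if each synchronization costs a quarter of one partition's work; an unbalanced split such as costs `[10, 1, 1]` on two threads reaches only 1.2.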
Depending on the interests of the student and the remaining time, it is also possible to look into accelerating RTL simulation with Banshee, an instruction-accurate simulator for MemPool based on binary translation. Banshee could be used to warm up the RTL simulation: it would quickly compute the architectural state at the beginning of a region of interest, which could then be used to fast-forward the RTL simulation to that point.
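The fast-forwarding idea can be illustrated on a toy machine. In the Python sketch below, a fast functional mode executes the first instructions without any timing, the resulting architectural state (program counter and registers) is saved as a checkpoint, and a slower "detailed" mode resumes from it and counts cycles only for the region of interest. The ISA, program, and per-instruction cycle costs are hypothetical, and nothing here reflects Banshee's actual interface; for MemPool, the checkpoint would also have to cover all 256 cores and the shared L1 memory and be loaded into the RTL model.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Architectural state of a hypothetical single-core toy machine."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)


# Hypothetical toy program computing r1 = sum(range(50)).
PROGRAM = [
    ("li", 2, 50),      # 0: r2 = 50 (loop bound)
    ("li", 0, 0),       # 1: r0 = i = 0
    ("li", 1, 0),       # 2: r1 = acc = 0
    ("add", 1, 1, 0),   # 3: acc += i
    ("addi", 0, 0, 1),  # 4: i += 1
    ("blt", 0, 2, 3),   # 5: if i < r2, goto 3
]
CYCLES = {"li": 1, "add": 1, "addi": 1, "blt": 3}  # hypothetical timing


def step(state):
    """Execute one instruction in place and return its opcode."""
    op, *args = PROGRAM[state.pc]
    state.pc += 1
    if op == "li":
        rd, imm = args
        state.regs[rd] = imm
    elif op == "add":
        rd, rs1, rs2 = args
        state.regs[rd] = state.regs[rs1] + state.regs[rs2]
    elif op == "addi":
        rd, rs1, imm = args
        state.regs[rd] = state.regs[rs1] + imm
    elif op == "blt":
        rs1, rs2, target = args
        if state.regs[rs1] < state.regs[rs2]:
            state.pc = target
    return op


def fast_forward(state, n_instrs):
    """Fast functional mode: execute up to n_instrs instructions, no timing."""
    for _ in range(n_instrs):
        if state.pc >= len(PROGRAM):
            break
        step(state)
    return state


def detailed(state):
    """Stand-in for slow, timing-accurate simulation: run to the end, count cycles."""
    cycles = 0
    while state.pc < len(PROGRAM):
        cycles += CYCLES[step(state)]
    return state, cycles


# Fast-forward to the region of interest, save a checkpoint, resume in detail.
checkpoint = deepcopy(fast_forward(ArchState(), 60))
roi_state, roi_cycles = detailed(deepcopy(checkpoint))
```

Because the resumed run ends in the same architectural state as a full detailed run, only the cycles of the region of interest (here 155 instead of 253) have to be simulated in detail.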
Milestones
The following milestones are expected to be achieved throughout the project:
- Familiarize yourself with MemPool and its Verilator model.
- Explore Verilator’s options.
- Identify critical IPs and optimize them using the hints Verilator provides.
- Investigate Verilator’s multi-threaded mode.
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:
- Use Banshee to fast-forward RTL simulation to interesting regions.
Project Realization
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project to fit the schedules of the student and the assistants. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The weekly report briefly summarizes the work, progress, and findings of the week, plans the actions for the next week, and raises open questions and points. It is also an important means for the student to develop a goal-oriented attitude toward the work.
HDL Code Style
Adopting a consistent code style is one of the most important steps toward making your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately feel at home with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.
Software Code Style
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code style with clang-format for C/C++, PEP 8 with pylint for Python, and the official style guide with rustfmt for Rust.
Even in the context of a student project, keeping a precise history of changes is essential to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use Git as a version control system at IIS. If you have no previous experience with Git, we strongly advise you to familiarize yourself with the basic Git workflow before you start your project.
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.
English is the de facto common language of engineering. Therefore, the final report should preferably be written in English.
Any word processing software may be used to write the reports; however, the IIS staff strongly encourages the use of LaTeX together with Inkscape or another vector drawing program for block diagrams.
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.
The final report has to be presented at the end of the project, and a digital copy must be handed in; it remains the property of the IIS. Note that this task description is part of your report and has to be attached to your final report.
There will be a presentation (15 min talk and 5 min Q&A) at the end of this project to present your results to a wider audience. The exact date will be determined towards the end of the work.
Deliverables
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:
- Final report incl. presentation slides
- Source code and documentation for all developed software and hardware
- Test suites (software) and testbenches (hardware)
- Synthesis and implementation scripts, results, and reports
References
[1] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, “MemPool: A shared-L1 memory many-core cluster with a low-latency interconnect,” in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021, pp. 701–706.
[2] S. Riedel and M. Cavalcante, “MemPool GitHub,” 2021.
[3] F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, “Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads,” IEEE Transactions on Computers, pp. 1–1, Feb. 2020.
[4] A. Waterman and K. Asanović, “The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213,” RISC-V Foundation, 2019.
[5] W. Snyder, “Verilator,” 2021.