Cycle-Accurate Event-Based Simulation of Snitch Core

From iis-projects

Revision as of 10:14, 17 July 2023

Figure: Overview of the GVSoC [1] pipeline.

Introduction

Simulating entire SoCs, especially heterogeneous SoCs that combine general-purpose cores with application-specific accelerators, is complex and slow. This level of heterogeneity requires complex hardware and a full-fledged software stack to evaluate applications while exploiting all platform features. For this reason, enabling agile design space exploration and rapid performance evaluation has become a crucial asset for exploring SoC architectures. In this scenario, high-level simulators play an essential role in breaking the speed bottleneck of cycle-accurate simulators and the design-effort bottleneck of FPGA prototypes, while preserving functional and timing accuracy.

Within the PULP project, we have therefore developed GVSoC [1], a highly configurable and timing-accurate event-driven simulator that combines the efficiency of C++ models with the flexibility of Python configuration scripts. An overview is shown in the figure. GVSoC combines the following concepts:

  • Python-based modular configuration of the hardware description
  • A set of event-driven, fast C++ models of various SoC components (cores, interconnects, I/O, ...)
  • Easy calibration of platform parameters for accurate performance estimation
  • High-speed simulation
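
To illustrate why an event-driven model can be fast, here is a minimal sketch of an event queue in which simulated time jumps directly to the next scheduled event instead of ticking every cycle. It is not GVSoC code: the names EventQueue, schedule, run, and simulate_memory_access are hypothetical, and the fixed-latency memory is an assumption made purely for illustration.

```python
import heapq
import itertools


class EventQueue:
    """Tiny event-driven kernel: time jumps straight to the next scheduled event."""

    def __init__(self):
        self.now = 0                   # current simulated cycle
        self._heap = []                # entries: (cycle, sequence number, callback)
        self._seq = itertools.count()  # tie-breaker keeps same-cycle events in FIFO order

    def schedule(self, delay, callback):
        """Run callback(queue) `delay` cycles after the current time."""
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback))

    def run(self):
        """Process events in time order until none are left; return the final cycle."""
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback(self)
        return self.now


def simulate_memory_access(latency):
    """Hypothetical scenario: one request to a memory with a fixed latency (assumption)."""
    log = []
    queue = EventQueue()

    def on_response(q):
        log.append((q.now, "response"))

    def on_request(q):
        log.append((q.now, "request"))
        q.schedule(latency, on_response)  # the memory model answers `latency` cycles later

    queue.schedule(1, on_request)
    end = queue.run()
    return end, log
```

Calling simulate_memory_access(100) processes only two events, although 100 cycles of simulated time elapse between them. A simulator that evaluates every cycle would do work in each of those idle cycles.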


Experimental results show that GVSoC enables practical functional and performance analysis and design exploration at the full-platform level (processors, memory, peripherals, and I/Os), with a speed-up of 2500x with respect to cycle-accurate simulation and errors typically below 10% for performance analysis [1].

GVSoC already supports most of the traditional PULP Platform IPs used for our low-power microcontrollers. In the last couple of years, we have started to use a new lightweight 32-bit latency-tolerant RISC-V core called Snitch [2]. The Snitch core is a single-stage, single-issue, in-order design. To keep the area footprint as small as possible, integer instructions with all of their operands available (no data dependencies present) can be fetched, decoded, executed, and written back in the same cycle. The core keeps track of all 31 registers using one bit per register in a scoreboard. There are three classes of instructions that need special handling:

Integer Instructions: Most of the instructions contained in the RISC-V I subset, such as integer arithmetic instructions, manipulation of control and status registers (CSRs), and control flow changes, can be executed in a single cycle as soon as all operands are available. Integer multiply/divide instructions are part of the M subset and are offloaded to the shared multiply/divide unit. There is no source of stalling, as the arithmetic logic unit (ALU) is fully combinational and executes its instruction in a single cycle. To foster the re-use of the ALU, it also performs comparisons for branches, calculates CSR masks, and performs address calculations for load/store instructions.

Load/Store Instructions: Load/store instructions execute as soon as all operands are available and the memory subsystem can process a new request. The data port of the core can exert back-pressure onto the load/store subsystem. Furthermore, the load/store unit (LSU) needs to keep track of issued load instructions and perform re-alignment and possible sign-extension. The core can have a configurable number of outstanding load instructions to the non-blocking memory hierarchy. Store instructions are considered fire-and-forget from a core perspective. The memory subsystem needs to maintain issue order, as the core expects load values to arrive in order. In addition to regular loads and stores, the LSU can also issue atomic memory operations and load-reserved/store-conditional (LR/SC) as defined by the RISC-V atomic memory operation specification. From a core perspective, the only difference is that the core also sends an atomic operation to the memory subsystem alongside the address and data. We provide additional signaling to accomplish that.
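
The load bookkeeping described above can be sketched as a bounded queue of outstanding loads whose responses are consumed in issue order. This is a behavioral illustration only, not the Snitch RTL or a GVSoC model: the class and method names are hypothetical, and it assumes a 32-bit little-endian data port that returns the aligned word containing the requested bytes, with accesses that do not cross a word boundary.

```python
from collections import deque


class LoadStoreUnit:
    """Behavioral sketch: bounded queue of outstanding loads, responses in issue order."""

    def __init__(self, max_outstanding):
        self.max_outstanding = max_outstanding  # configurable number of in-flight loads
        self._pending = deque()                 # metadata of issued loads, oldest first

    def can_issue(self):
        return len(self._pending) < self.max_outstanding

    def issue_load(self, rd, addr, size, signed):
        """Record a load of `size` bytes; return False (core must stall) if the queue is full."""
        if not self.can_issue():
            return False
        self._pending.append((rd, addr % 4, size, signed))
        return True

    def on_response(self, word):
        """Consume one 32-bit response word for the oldest load; return (rd, value)."""
        rd, offset, size, signed = self._pending.popleft()
        # Re-alignment: shift the addressed bytes down to bit 0 and mask to the access size.
        value = (word >> (8 * offset)) & ((1 << (8 * size)) - 1)
        # Sign-extension: applied only to signed loads whose top bit is set.
        if signed and (value >> (8 * size - 1)) & 1:
            value -= 1 << (8 * size)
        return rd, value & 0xFFFFFFFF
```

Because responses are matched to the oldest pending entry, the sketch relies on the memory subsystem preserving issue order, as stated in the text. Stores are not tracked here since they are fire-and-forget from the core's perspective.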

Accelerator/Special Function Unit Instructions (e.g. FPU): Offloaded instructions can execute as soon as all operands are available and the accelerator interface can accept a new offloading request.
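
All three instruction classes share the rule that an instruction issues only once its operands are available. A minimal sketch of a one-bit-per-register scoreboard that captures this rule is shown here. The names are hypothetical, and the choice to also stall on a pending destination register is an assumption of this sketch, not a statement about the actual core.

```python
class Scoreboard:
    """One busy bit per register x1..x31; x0 is hard-wired to zero and never busy."""

    def __init__(self):
        self._busy = 0  # bit i set: register xi is waiting for an outstanding result

    def is_busy(self, reg):
        return reg != 0 and bool((self._busy >> reg) & 1)

    def mark_busy(self, reg):
        """Called when a load or offloaded instruction targeting `reg` is issued."""
        if reg != 0:
            self._busy |= 1 << reg

    def clear(self, reg):
        """Called when the pending result for `reg` is written back."""
        self._busy &= ~(1 << reg)

    def can_issue(self, sources, dest):
        """Assumption: stall if any source or the destination still has a pending write."""
        return not any(self.is_busy(r) for r in (*sources, dest))
```

For example, after mark_busy(5) for an outstanding load to x5, an add that reads x5 cannot issue until clear(5) is called, while instructions that touch neither x5 as source nor as destination proceed unaffected.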

Snitch is typically used together with an FPU, whose instructions are implemented as accelerator instructions. The FPU typically features SIMD, minifloat (8-bit, 16-bit), and fused sum-dot-product capabilities. Thanks to ISA extensions for direct and indirect memory streaming and load/store elision (SSR/ISSR) [3,4], coupled with floating-point instruction repetition (FREP), Snitch can keep the FPU's utilization above 90% for ultra-efficient computation of data-parallel floating-point workloads at finely tunable precision. So far, the Snitch core has mainly been used for scaled-up, high-core-count SoC designs, where we used Banshee [5] to emulate the cores. Banshee is, in contrast to GVSoC, instruction-accurate and not cycle-accurate. While it can be tuned to give some cycle count estimate, it is nowhere near as accurate as the typical GVSoC models and does not include models of, e.g., peripherals or interconnects. Therefore, in this project, we want to add a cycle-accurate C++ model of Snitch to the GVSoC environment to enable rapid design exploration and SW prototyping with performance evaluation for SoC designs based on the Snitch core.
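
A back-of-the-envelope sketch can show why eliding loads/stores and loop bookkeeping matters for FPU utilization. It assumes an idealized single-issue core that retires one instruction per cycle without stalls, and the instruction counts in the usage note are illustrative assumptions, not measurements of Snitch. The function names are hypothetical.

```python
def fpu_utilization(fpu_instrs, other_instrs):
    """Fraction of issue slots spent on FPU work, assuming one instruction per cycle."""
    return fpu_instrs / (fpu_instrs + other_instrs)


def dot_product_utilization(n, loop_overhead_per_elem, setup_instrs):
    """Compare a plain loop against a streamed, hardware-repeated loop of n FPU operations.

    loop_overhead_per_elem: non-FPU instructions per element in the plain loop
                            (loads, pointer updates, branch); an assumed value.
    setup_instrs:           one-time configuration instructions for the streamed
                            version; an assumed value.
    """
    baseline = fpu_utilization(n, n * loop_overhead_per_elem)
    streamed = fpu_utilization(n, setup_instrs)
    return baseline, streamed
```

With the assumed values n = 1000, four overhead instructions per element, and 20 setup instructions, the plain loop reaches 20% utilization while the streamed version exceeds 90%. The model ignores stalls and any overlap between integer and FPU execution, so it only indicates the trend.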

Project Goals

In this project, you implement and verify a Snitch GVSoC model, including cycle-count calibrations.

The main tasks of this project are:

  • T1: Familiarization (2-3 Weeks): In this task, the student will familiarize themselves with the Snitch core architecture and the GVSoC setup.
  • T2: Snitch model implementation (4-5 Weeks): In this task, the student will implement the C++ model of Snitch, including the non-blocking scoreboard implementation.
  • T3: SSR & FREP ISA extensions modeling (4-5 Weeks): In this task, the student will extend the Snitch model with the SSR and FREP extensions and include FPU-related instructions.
  • T4: Snitch Cluster model implementation (3-4 Weeks): In this task, the student will implement the cluster configuration with Snitch cores. This might require adaptations of the existing models for the I-Cache, HW barriers (no event unit), and cluster setup with a shared INT MUL/DIV unit.
  • T5: Calibration of the Snitch C++ model (3-4 Weeks): In this task, the student will evaluate and compare a few benchmarks on RTL against the GVSoC implementation to calibrate the GVSoC model.
  • T6: Report writing and Presentation (3 Weeks): In this task, the student will document the project, write a report, and prepare a presentation.
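
The comparison in T5 could be organized around a per-benchmark relative cycle-count error, as in the following sketch. The function names and the input layout are hypothetical, and the default 10% tolerance simply mirrors the typical error reported for GVSoC in [1]. It is not a project requirement stated here.

```python
def relative_error(rtl_cycles, model_cycles):
    """Signed relative error of the model's cycle count with respect to RTL."""
    return (model_cycles - rtl_cycles) / rtl_cycles


def calibration_report(results, tolerance=0.10):
    """Map each benchmark to (relative error, within-tolerance flag).

    results: dict of benchmark name -> (rtl_cycles, model_cycles).
    """
    report = {}
    for name, (rtl_cycles, model_cycles) in results.items():
        err = relative_error(rtl_cycles, model_cycles)
        report[name] = (err, abs(err) <= tolerance)
    return report
```

Benchmarks flagged as outside the tolerance point to the model parameters (for example latencies) that still need calibration. A negative error means the model is optimistic relative to RTL.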

Deliverables

  • D1: GVSoC Snitch C++ model.
  • D2: GVSoC Snitch cluster configuration setup.
  • D3: GVSoC vs. RTL simulation evaluations.

Practical Details

Within the first month of the project, you will be asked to prepare a project plan. This plan should identify the tasks to be performed during the project and set deadlines for those tasks. The prepared plan will be a topic of discussion at the first week's meeting between you and your advisers. Note that the project plan should be updated continuously, depending on the project's status.

Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student's and the assistants' schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.

Software Code Style: We generally suggest that you use style guides or code formatters provided by the language's developers or community. For example, we recommend LLVM's or Google's code styles with clang-format for C/C++, PEP 8 and pylint for Python, and the official style guide with rustfmt for Rust.

Version Control: Even in the context of a student project, keeping a precise history of changes is essential to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use Git as a version control system at IIS. If you have no previous experience with Git, we strongly advise you to familiarize yourself with the basic Git workflow before you start your project.

Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. The common language of engineering is de facto English. Therefore, the final report of the work is preferred to be written in English. Any word processing software may be used to write the report; nevertheless, the use of LaTeX with Tgif (see http://bourbon.usc.edu:8001/tgif/index.html and http://www.dz.ee.ethz.ch/en/information/how-to/drawing-schematics.html) or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.

The final report has to be presented at the end of the project, and a digital copy needs to be handed in. Note that this task description is part of your report and has to be attached to your final report.

There will be a presentation at the end of this project to present your results to a wider audience: 15 minutes of presentation (20 minutes for MA) followed by 5 minutes of Q&A. The exact date will be determined towards the end of the work.

References

  • [1] Bruschi, Nazareno, et al. "GVSoC: a highly configurable, fast and accurate full-platform simulator for RISC-V based IoT processors." 2021 IEEE 39th International Conference on Computer Design (ICCD). IEEE, 2021.
  • [2] Zaruba, Florian, et al. "Snitch: A tiny pseudo-dual-issue processor for area and energy efficient execution of floating-point intensive workloads." IEEE Transactions on Computers 70.11 (2020): 1845-1860.
  • [3] Schuiki, Fabian, et al. "Stream semantic registers: A lightweight RISC-V ISA extension achieving full compute utilization in single-issue cores." IEEE Transactions on Computers 70.2 (2020): 212-227.
  • [4] Scheffler, Paul, et al. "Indirection stream semantic register architecture for efficient sparse-dense linear algebra." 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2021.
  • [5] Riedel, Samuel, et al. "Banshee: A Fast LLVM-Based RISC-V Binary Translator." 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 2021.

Status: Available

Looking for 1-2 Semester/Master students (MA preferred)
Contact: Gianna Paulin, Victor Jung, Arpan Suravi Prasad
Email: pauling@iis.ee.ethz.ch, jungvi@iis.ee.ethz.ch, prasadar@iis.ee.ethz.ch

Prerequisites

C++/Python skills
VLSI I (RTL simulation)
Computer Architecture / SoC for Data Analytics and Machine Learning (recommended)

Character

20% Theory
50% C++/Python implementation
30% Characterization against RTL simulations

Professor

Luca Benini
