
Difference between revisions of "GDBTrace: A Post-Simulation Trace-Based RISC-V GDB Debugging Server (1S)"

Line 1: Line 1:
<!-- A Post-Simulation Trace-Based RISC-V GDB Debugging Server (1S) -->
+
<!-- GDBTrace: A Post-Simulation Trace-Based RISC-V GDB Debugging Server (1S) -->
  
 
[[Category:Digital]]
 
[[Category:Digital]]
Line 34: Line 34:
 
If your processor has a debugging interface (exposed e.g. over JTAG), you can connect a debugger to the simulation and communicate with the processor over the JTAG interface port [1]. This method is extremely powerful, as the debugger provides the desired fully-featured source-centric debugging experience. It is also extremely portable: any RISC-V core with a JTAG debugging interface can reuse the same infrastructure. However, simulating the serial connection on the JTAG interface can slow down the simulation to the point where it becomes impractical to use. This is especially true when dealing with large and complex designs.
  
More  
+
More recently, in February 2022, the GDBWave tool [2,3] was released, providing a solution which does not incur the significant slowdown from simulating the serial debugging connection. The underlying idea is to log, during the simulation, all the information which is needed to reconstruct the program execution after the simulation has completed. Specifically, GDBWave takes a waveform dump from the simulation as input. So long as it can identify and interpret the relevant signals, such as the program counter and the write port to the register file, it can replay the execution of the program as requested by the debugger. Unfortunately, as these signals are implementation dependent, this approach suffers from poor portability.
  
 
<!--
 
<!--
Line 40: Line 40:
 
= Project description =
 
= Project description =
  
In this project, you will port a series of HPC kernels from the PolyBench/C benchmark [10] to Occamy. You will optimize the kernels to take advantage of the heterogeneous architecture and software defined data movement. An additional goal would be to explore the applicability of SSRs and FREP to the kernels, and potentially other extensions under development. The applications will be developed in C. A bare-metal runtime is provided, hiding the details of the hardware beneath a set of convenience functions. The programs will be run in RTL simulation. To speed up the development, we might opt for a downscaled version of Occamy, with a reduced number of accelerator cores.
+
In this project, you will develop a GDB server, similar in its essence to GDBWave, which can process a RISC-V program trace and interact with a GDB client to provide a fully-featured debugging experience. In comparison to GDBWave, this approach is implementation-agnostic and can support virtually any RISC-V core, as long as it complies with our trace format. In addition, by removing the need to log any waveforms, the simulator might be able to optimize the design more aggressively, and can possibly run faster.
  
== Detailed task description ==
+
You will follow the same approach as GDBWave, which is very well explained in a blog post from the author [2]. In particular, you will interface with the GDB client via GDB's remote serial protocol (RSP) [4]. You are not expected to implement the whole RSP interface. Instead, the minimal functionality we would like to support is the same as described in the GDBWave blog post, which mainly includes:
  
To break it down in more detail, you will:
+
* Single-stepping
 
+
* Breakpoints
* '''Gain a deep understanding of the PolyBench kernels''', in particular of:
 
** the underlying algorithms;
 
** the data movement and communication patterns;
 
** the parallelism they expose, i.e. distinguish sequential vs. parallel code regions, data vs. task parallelism, etc.;
 
* '''Understand the Occamy architecture and familiarize yourself with the software stack'''
 
* '''Select a suitable subset of kernels to implement'''
 
* '''Implement the kernels on Occamy'''
 
** '''A)''' Port the original sources to run on the CVA6 host
 
** '''B)''' Offload amenable code regions to the accelerator
 
** '''C)''' Optimize data movement, overlapping communication and computation where possible
 
* '''Compare the performance and energy efficiency of the implementations in A), B) and C)'''
 
** Analyze the speedup in Amdahl's terms
 
** Understand and locate where the major performance losses occur
 
** Compare the attained FPU utilization and performance to the architecture's peak values
 
** Suggest new hardware features or ISA extensions to further improve the runtime
 
  
 
== Optional stretch goals ==
 
== Optional stretch goals ==
Line 66: Line 51:
 
Additional stretch goals may include:
 
Additional stretch goals may include:
  
* Study which kernels could be optimized with the SSR or FREP ISA extensions
+
* Connecting to GDBTrace through an IDE such as Eclipse [5]
* Eventually optimize the kernels with SSRs or FREP
+
*
* Categorize the kernels based on their use of collective communication (multicast, reductions) and synchronization primitives (barriers, locks)
 
* Compare your results to a GPU or high-end CPU implementation [11]
 
  
 
== Character ==
 
== Character ==
  
* 20% Literature/architecture review
+
* 20% Literature and documentation review
* 60% Bare-metal C programming
+
* 20% Code review
* 20% Evaluation
+
* 60% C or Python programming
  
 
== Prerequisites ==
 
== Prerequisites ==
  
* Strong interest in computer architecture
+
* Interest in computer architecture
* Experience in bare-metal or embedded C programming
+
* Experience in programming in C/C++ or Python
* Experience with digital design in SystemVerilog as taught in VLSI I
+
* Preferred: Prior experience with debugging tools
 
* Preferred: Knowledge or prior experience with RISC-V
 
* Preferred: Knowledge or prior experience with RISC-V
* Preferred: Experience with ASIC implementation flow as taught in VLSI II
 
  
 
-->
 
-->
Line 90: Line 72:
  
 
[1] https://github.com/pulp-platform/pulp-debug-bridge <br />
 
[1] https://github.com/pulp-platform/pulp-debug-bridge <br />
 +
[2] https://tomverbeure.github.io/2022/02/20/GDBWave-Post-Simulation-RISCV-SW-Debugging.html <br />
 +
[3] https://github.com/tomverbeure/gdbwave <br />
 +
[4] https://www.embecosm.com/appnotes/ean4/embecosm-howto-rsp-server-ean4-issue-2.html <br />
 +
[5] https://projects.eclipse.org/projects/tools.cdt

Revision as of 00:53, 13 October 2022


Overview

Status: Available

Introduction

In any chip design project, a lot of effort is spent on verifying the design before it can be delivered for tapeout. For general purpose processor designs, verifying the functionality involves running some code on the core(s) and comparing the outputs of the program with the expected results. On a mismatch, we will have to identify where the bug originates in the hardware and, for this to be possible, we need to identify at what point the program behaviour deviates from our expectations.

If your processor has a debugging interface (exposed e.g. over JTAG) and can be implemented on an FPGA, you may be able to connect a debugger to your FPGA board and benefit from a fully-featured debugging experience, as you might be used to from programming embedded systems. However, this is not always the case, as bringing a design onto an FPGA requires additional effort, and for large designs which do not fit on a single FPGA it might not be possible to have an exact 1-to-1 correspondence between the FPGA implementation and the original design.

If testing your design on an FPGA is not an option, you will have to debug your code directly in RTL simulation. In this setting, you can forget the bells and whistles of integrated development environments (IDEs) with their powerful debugging features and convenient graphical interfaces. Oftentimes, your best tool at hand is the core's bare execution trace, i.e. a flat record of the instructions which were executed by the core during the simulation, accompanied by auxiliary information such as the values of the registers involved in the instruction at the time of its execution.
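To make the notion of an execution trace concrete, the following is a minimal sketch of how one line of such a trace could be turned into a structured record. The line layout (cycle, program counter, disassembly, optional register write-back), the `TraceEntry` type and the `parse_line` function are assumptions made purely for illustration; they do not describe the trace format of any particular core.

```python
import re
from typing import NamedTuple, Optional


class TraceEntry(NamedTuple):
    """One executed instruction, as recorded in a (hypothetical) text trace."""
    cycle: int
    pc: int
    insn: str
    rd: Optional[int]        # destination register index, None if nothing is written
    rd_value: Optional[int]  # value written to rd, None if nothing is written


# Assumed layout: "<cycle> <pc> <disassembly> [; x<rd>=<value>]"
_LINE = re.compile(
    r"^\s*(?P<cycle>\d+)\s+(?P<pc>0x[0-9a-fA-F]+)\s+(?P<insn>[^;]+?)\s*"
    r"(?:;\s*x(?P<rd>\d+)=(?P<val>0x[0-9a-fA-F]+))?\s*$"
)


def parse_line(line: str) -> TraceEntry:
    """Parse a single trace line, raising ValueError if it does not match."""
    m = _LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized trace line: {line!r}")
    rd = int(m["rd"]) if m["rd"] is not None else None
    val = int(m["val"], 16) if m["val"] is not None else None
    return TraceEntry(int(m["cycle"]), int(m["pc"], 16), m["insn"], rd, val)
```

For example, `parse_line("120 0x80000004 addi a0, a0, 1 ; x10=0x00000001")` would yield an entry with `pc == 0x80000004` and a write of the value 1 to register x10.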

Is this really the best way we can inspect our program's execution? Alternative methodologies do exist, but come with their own shortcomings which we want to address in this thesis. We aim for a source-centric debugging experience, i.e. one in which the program execution can be inspected at the source (or high-level language) level. We therefore do not cover approaches which merely aim at making the trace-centric debugging experience more "comfortable".

If your processor has a debugging interface (exposed e.g. over JTAG), you can connect a debugger to the simulation and communicate with the processor over the JTAG interface port [1]. This method is extremely powerful, as the debugger provides the desired fully-featured source-centric debugging experience. It is also extremely portable: any RISC-V core with a JTAG debugging interface can reuse the same infrastructure. However, simulating the serial connection on the JTAG interface can slow down the simulation to the point where it becomes impractical to use. This is especially true when dealing with large and complex designs.

More recently, in February 2022, the GDBWave tool [2,3] was released, providing a solution which does not incur the significant slowdown from simulating the serial debugging connection. The underlying idea is to log, during the simulation, all the information which is needed to reconstruct the program execution after the simulation has completed. Specifically, GDBWave takes a waveform dump from the simulation as input. So long as it can identify and interpret the relevant signals, such as the program counter and the write port to the register file, it can replay the execution of the program as requested by the debugger. Unfortunately, as these signals are implementation dependent, this approach suffers from poor portability.
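The replay idea can be illustrated with a minimal sketch: given the logged program counter and register write-back of each executed instruction, the register file at any point can be rebuilt by re-applying the writes in order, and a "continue" request amounts to advancing until the next program counter hits a breakpoint. This is not GDBWave's implementation; the `Replay` class, the `(pc, rd, value)` tuple layout and the all-zero initial register state are assumptions chosen for illustration.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical logged step: (pc, rd, value); rd and value are None
# when the instruction does not write a register.
Step = Tuple[int, Optional[int], Optional[int]]


class Replay:
    """Replays logged steps to reconstruct the register state after the simulation."""

    def __init__(self, steps: List[Step]):
        self.steps = steps
        self.index = 0                 # index of the next instruction to replay
        self.regs = [0] * 32           # assumption: registers start at zero
        self.breakpoints: Set[int] = set()

    @property
    def pc(self) -> Optional[int]:
        """PC of the next instruction, or None once the log is exhausted."""
        if self.index < len(self.steps):
            return self.steps[self.index][0]
        return None

    def step(self) -> bool:
        """Replay one instruction; return False if the log is exhausted."""
        if self.index >= len(self.steps):
            return False
        _, rd, value = self.steps[self.index]
        if rd is not None and rd != 0:  # x0 is hard-wired to zero in RISC-V
            self.regs[rd] = value
        self.index += 1
        return True

    def cont(self) -> None:
        """Replay until the next PC is a breakpoint or the log ends."""
        # At least one instruction is replayed first, so that a breakpoint
        # at the current PC does not stop execution immediately.
        while self.step():
            if self.pc in self.breakpoints:
                break
```

A debugger front end would map its single-step and continue requests onto `step()` and `cont()`, and answer register reads from `regs`.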


References

[1] https://github.com/pulp-platform/pulp-debug-bridge
[2] https://tomverbeure.github.io/2022/02/20/GDBWave-Post-Simulation-RISCV-SW-Debugging.html
[3] https://github.com/tomverbeure/gdbwave
[4] https://www.embecosm.com/appnotes/ean4/embecosm-howto-rsp-server-ean4-issue-2.html
[5] https://projects.eclipse.org/projects/tools.cdt