Fast Accelerator Context Switch for PULP
- Type: Master Thesis
- Professor: Prof. Dr. L. Benini
PULP (Parallel Ultra-Low-Power) is an open-source multi-core computing platform. It consists of an advanced microcontroller architecture with a parallel computing cluster of eight fully programmable 32-bit RISC-V processing elements (PEs) featuring DSP extensions that target energy-efficient digital signal processing. This computing cluster serves as an accelerator.
ControlPULP is a specialized version of PULP that focuses on predictability. It is used in the European Processor Initiative (EPI) as a Power Controller Subsystem (PCS) that dynamically adjusts the operating point of a High Performance Computing (HPC) processor to meet energy, power, and thermal constraints. The Power Control Firmware (PCF) running on the PCS executes a multi-task control law that involves computing the optimal operating point and complex multi-input, multi-output interaction with the external world. To this end, the FreeRTOS real-time OS layer is employed to schedule the PCF tasks.
The parallel computing cluster is used as an accelerator to parallelize the PCF running on the PCS architecture. In the current implementation, the fabric controller, i.e. the 32-bit manager core, can offload a task to the computing cluster in one of two ways:
1. Synchronous (blocking) offload: the manager core waits (polls) until the computing cluster finishes the task. It then regains control and can offload another computation.
2. Asynchronous (non-blocking) offload: the manager core can perform other tasks while the cluster is busy with the offloaded task. When the cluster finishes, it notifies the fabric controller through an interrupt-triggered callback.
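The difference between the two offload modes can be illustrated with a small host-side C model. This is only a sketch under stated assumptions: `cluster_t`, `cluster_tick`, `offload_blocking`, `offload_async`, and `manager_overlap` are hypothetical names invented for illustration and are not part of the PULP or ControlPULP runtime. The cluster is modelled as a cycle counter, and the completion interrupt is modelled as a plain function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model: an offloaded job is just a cycle budget. */
typedef void (*done_cb_t)(void *arg);

typedef struct {
    unsigned  remaining; /* cycles left for the current job, 0 means idle */
    done_cb_t on_done;   /* completion callback, NULL for blocking offloads */
    void     *cb_arg;    /* argument passed to the callback */
} cluster_t;

/* Advance the modelled cluster by one cycle. Calling the callback on
 * completion stands in for the end-of-computation interrupt. */
static void cluster_tick(cluster_t *c)
{
    if (c->remaining == 0)
        return;
    if (--c->remaining == 0 && c->on_done != NULL)
        c->on_done(c->cb_arg);
}

/* Synchronous offload: the manager core polls until the job is done.
 * Returns the number of cycles the manager spent waiting. */
static unsigned offload_blocking(cluster_t *c, unsigned job_cycles)
{
    assert(c->remaining == 0); /* cluster must be idle */
    unsigned waited = 0;
    c->remaining = job_cycles;
    c->on_done = NULL;
    while (c->remaining != 0) {
        cluster_tick(c);
        waited++;
    }
    return waited;
}

/* Asynchronous offload: start the job and return immediately. */
static void offload_async(cluster_t *c, unsigned job_cycles,
                          done_cb_t cb, void *arg)
{
    assert(c->remaining == 0 && job_cycles > 0);
    c->remaining = job_cycles;
    c->on_done = cb;
    c->cb_arg = arg;
}

/* Callback used below: records that the cluster has finished. */
static void set_flag(void *arg) { *(bool *)arg = true; }

/* The manager does useful work while the cluster runs. Returns the units
 * of manager-side work completed before the completion notification. */
static unsigned manager_overlap(cluster_t *c, unsigned job_cycles)
{
    bool done = false;
    unsigned useful = 0;
    offload_async(c, job_cycles, set_flag, &done);
    while (!done) {
        useful++;        /* one unit of manager-side work */
        cluster_tick(c); /* on real hardware the cluster runs in parallel */
    }
    return useful;
}
```

In this model, a five-cycle job costs the manager five cycles of pure waiting with `offload_blocking`, whereas `manager_overlap` completes five units of its own work in the same time.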
Context switching is the process of saving the context, i.e. registers and other machine state, determining the next task to be scheduled, and finally restoring the context of the potentially new, different task. The PULP cluster has traditionally been designed to accelerate machine learning kernels for bare-metal applications and optimized with energy efficiency in mind. Nevertheless, it lacks optimization and flexibility for more real-time oriented scenarios.
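The three steps of a context switch (save, select, restore) can be sketched as a host-side C model. This is a minimal illustration, not the ControlPULP or FreeRTOS implementation: `context_t`, `task_t`, `core_t`, `pick_next`, and `context_switch` are hypothetical names, the round-robin policy is an assumption, and the context is reduced to the 32 RISC-V integer registers plus the program counter. A real implementation would run in assembly and would also cover any further machine state the core exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 32 /* RISC-V integer registers x0..x31 */

/* Simplified context: integer registers and program counter only. */
typedef struct {
    uint32_t x[NUM_REGS];
    uint32_t pc;
} context_t;

typedef struct {
    context_t ctx;   /* saved context while the task is not running */
    int       ready; /* non-zero if the task may be scheduled */
} task_t;

typedef struct {
    context_t live;      /* models the register file of one processing element */
    task_t   *tasks;     /* task table */
    size_t    num_tasks;
    size_t    current;   /* index of the running task */
} core_t;

/* Round-robin policy (an assumption): next ready task after the current
 * one, falling back to the current task if no other task is ready. */
static size_t pick_next(const core_t *core)
{
    assert(core->num_tasks > 0);
    for (size_t i = 1; i <= core->num_tasks; i++) {
        size_t cand = (core->current + i) % core->num_tasks;
        if (core->tasks[cand].ready)
            return cand;
    }
    return core->current;
}

/* Perform one context switch and return the index of the task now running. */
static size_t context_switch(core_t *core)
{
    core->tasks[core->current].ctx = core->live; /* 1. save the context    */
    core->current = pick_next(core);             /* 2. choose the next task */
    core->live = core->tasks[core->current].ctx; /* 3. restore its context */
    return core->current;
}
```

Steps 1 and 3 each copy the full register state, which is the part of the cost that the hardware support proposed in this project could aim to reduce.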
The idea is to add context switching and virtualization capabilities to the ControlPULP cluster, allowing the operating system to schedule several accelerator tasks and run them concurrently. This enables higher utilization of the cluster's compute resources. For example, accelerator tasks waiting for data from the DMA can block and let other accelerator tasks continue.
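The utilization argument can be made concrete with a toy cycle-level simulation in C. Everything here is an illustrative assumption rather than a model of ControlPULP: the whole cluster is treated as a single compute resource, every task alternates a fixed number of compute cycles with a fixed number of DMA-wait cycles, DMA transfers of different tasks proceed independently, context switches cost nothing, and lower-indexed tasks have priority. The names `acc_task_t` and `simulate` are hypothetical.

```c
#include <assert.h>

#define MAX_TASKS 8

/* Toy accelerator task: repeats (compute, then wait for DMA) phases. */
typedef struct {
    unsigned phases_left;  /* phases not yet started */
    unsigned compute_left; /* compute cycles left in the current phase */
    unsigned dma_left;     /* DMA-wait cycles left in the current phase */
} acc_task_t;

static int task_done(const acc_task_t *t)
{
    return t->phases_left == 0 && t->compute_left == 0 && t->dma_left == 0;
}

/* Run n_tasks tasks of `phases` phases each (c compute cycles followed by
 * d DMA-wait cycles per phase). With allow_switch == 0 tasks run to
 * completion one after another; otherwise a task blocked on DMA lets
 * another task compute. Returns total cycles; *busy_out receives the
 * number of cycles in which the cluster computed. */
static unsigned simulate(unsigned n_tasks, unsigned phases, unsigned c,
                         unsigned d, int allow_switch, unsigned *busy_out)
{
    acc_task_t t[MAX_TASKS] = {0};
    unsigned cycles = 0, busy = 0;

    assert(n_tasks <= MAX_TASKS && c > 0);
    for (unsigned i = 0; i < n_tasks; i++)
        t[i].phases_left = phases;

    for (;;) {
        unsigned first = 0;
        while (first < n_tasks && task_done(&t[first]))
            first++;
        if (first == n_tasks)
            break; /* all tasks finished */

        /* Without switching, only the first unfinished task is active. */
        unsigned lo = allow_switch ? 0 : first;
        unsigned hi = allow_switch ? n_tasks : first + 1;
        int ran = 0;

        for (unsigned i = lo; i < hi; i++) {
            acc_task_t *k = &t[i];
            if (k->compute_left == 0 && k->dma_left == 0 && k->phases_left > 0) {
                k->compute_left = c; /* start the next phase */
                k->dma_left = d;
                k->phases_left--;
            }
            if (k->compute_left > 0) {
                if (!ran) {          /* one task computes per cycle */
                    k->compute_left--;
                    ran = 1;
                    busy++;
                }
            } else if (k->dma_left > 0) {
                k->dma_left--;       /* blocked on DMA; transfer progresses */
            }
        }
        cycles++;
    }
    *busy_out = busy;
    return cycles;
}
```

For two tasks with two phases each, two compute cycles, and two DMA-wait cycles per phase, this model needs 16 cycles without switching and 10 cycles with switching. The cluster computes for 8 cycles in both cases, so utilization rises from 50% to 80%. A real system would also pay a context-switch cost on every hand-over, which the model ignores.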
The goal of this project is to
- Implement context switching capabilities in software as a baseline.
- Identify opportunities to accelerate context switching and virtualization on the hardware level and implement these in ControlPULP.
- Evaluate the resulting system in RTL simulation and/or on the FPGA (ControlPULP has an FPGA implementation).
Character
- 10% Literature / architecture review
- 50% RTL implementation
- 30% Bare-metal C programming
- 10% Evaluation
Prerequisites
- Strong interest in computer architecture
- Experience with digital design in SystemVerilog as taught in VLSI I
- Experience with low-level programming in C
References
- https://github.com/pulp-platform/pulp (GitHub repository)
- https://iis-git.ee.ethz.ch/giovanni.bambini/epi_pmu_ethz (GitLab repository)