HERO: TLB Invalidation
With the PULP platform, the IIS, together with the EEES group of UNIBO, actively develops a many-core platform for ultra-low-power operation and leading-edge energy efficiency. This cooperation has already led to a substantial number of tape-outs of various single-cluster PULP configurations in multiple technology nodes.
Besides ultra-low-power operation, the PULP project also aims at designing a platform that is highly scalable and features widely tunable performance, e.g., to use PULP as a high-performance parallel accelerator in heterogeneous systems. To this end, we also study the seamless integration of PULP-like, programmable many-core accelerators into embedded heterogeneous SoCs built around an ARM Cortex-A-like multicore CPU (the host). Within the project, we have developed
- a multi-ISA compile toolchain automating the offloading of highly parallel OpenMP function kernels from the host CPU to the accelerator, and
- a mixed hardware/software solution enabling lightweight, shared virtual memory (SVM) between host CPU and accelerator [3,4],
both dramatically simplifying the programmability of such a heterogeneous system.
Our evaluation platform combines a modern ARMv8 multicluster CPU with a Xilinx FPGA capable of implementing PULP with up to 8 clusters and a total of 64 cores. All cores in the PULP accelerator can access the memory of a host process through a dedicated translation lookaside buffer (TLB). These accesses to shared virtual memory are coherent with the caches of the host.
The entries of the TLB itself, however, are not coherent with the host memory. Thus, if a page is moved in physical memory while it has an entry in the TLB, that TLB entry becomes stale and must no longer be used. (Preventing the movement of all pages that could be shared with the accelerator is prohibitively expensive.) Removing such a stale entry from the TLB is called TLB invalidation.
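To make the mechanism concrete, the following is a minimal software model of a TLB that supports invalidation by virtual address range. It is only a sketch: the page size, the number of entries, the round-robin replacement, and all type and function names (`tlb_t`, `tlb_insert`, `tlb_lookup`, `tlb_invalidate_range`) are assumptions made for illustration and do not describe the actual PULP TLB hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT       12u                               /* assumed 4 KiB pages */
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define TLB_ENTRIES      8u                                /* assumed size, illustration only */

typedef struct {
    bool     valid;
    uint64_t vpn;   /* virtual page number  */
    uint64_t ppn;   /* physical page number */
} tlb_entry_t;

typedef struct {
    tlb_entry_t entry[TLB_ENTRIES];
    size_t      next;   /* round-robin replacement pointer (assumed policy) */
} tlb_t;

/* Mark all entries invalid. */
void tlb_init(tlb_t *tlb)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        tlb->entry[i].valid = false;
    tlb->next = 0;
}

/* Install a translation for the page containing vaddr. */
void tlb_insert(tlb_t *tlb, uint64_t vaddr, uint64_t paddr)
{
    tlb_entry_t *e = &tlb->entry[tlb->next];
    e->valid = true;
    e->vpn   = vaddr >> PAGE_SHIFT;
    e->ppn   = paddr >> PAGE_SHIFT;
    tlb->next = (tlb->next + 1) % TLB_ENTRIES;
}

/* Translate vaddr; returns false on a miss. */
bool tlb_lookup(const tlb_t *tlb, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb->entry[i].valid && tlb->entry[i].vpn == vpn) {
            *paddr = (tlb->entry[i].ppn << PAGE_SHIFT) | (vaddr & PAGE_OFFSET_MASK);
            return true;
        }
    }
    return false;
}

/* Invalidate every entry whose page overlaps [start, end).
 * Returns the number of entries that were dropped. */
size_t tlb_invalidate_range(tlb_t *tlb, uint64_t start, uint64_t end)
{
    assert(start < end);
    uint64_t first = start >> PAGE_SHIFT;
    uint64_t last  = (end - 1) >> PAGE_SHIFT;
    size_t dropped = 0;
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        tlb_entry_t *e = &tlb->entry[i];
        if (e->valid && e->vpn >= first && e->vpn <= last) {
            e->valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

In this model, a lookup of an address in a page that was invalidated misses afterwards, so a new translation has to be fetched before the page can be accessed again. Entries for pages outside the invalidated range stay usable.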
The goal of this project is to implement TLB invalidations for our heterogeneous system. The project can roughly be split into two parts: modifications to the Linux kernel running on the host CPU and changes to our TLB hardware. The exact task description can be tailored to your background knowledge and interests before the project starts, so the focus can be more on the software or on the hardware side. Just make sure to contact us in advance!
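The split between host software and TLB hardware can be pictured as a small request/acknowledge handshake. The sketch below models it entirely in software. The register layout (`tlb_regs_t`), the entry type (`hw_entry_t`), and all function names are hypothetical and chosen only for illustration. It also assumes that each entry stores a page-aligned virtual address and that the requested range is page-aligned. On Linux, such a request could for instance be issued from an MMU notifier callback; whether this project uses that mechanism is not stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register interface between host driver and TLB hardware.
 * Field names and layout are assumptions made for this sketch. */
typedef struct {
    uint64_t inv_start;  /* first virtual address of the range        */
    uint64_t inv_end;    /* one past the last virtual address         */
    bool     inv_req;    /* set by the host to request invalidation   */
    bool     inv_ack;    /* set by the hardware once entries are gone */
} tlb_regs_t;

/* Software stand-in for one hardware TLB entry (page-aligned vaddr assumed). */
typedef struct {
    bool     valid;
    uint64_t vaddr;
} hw_entry_t;

/* Host side, step 1: post the range that is about to be remapped. */
void host_request_invalidate(tlb_regs_t *regs, uint64_t start, uint64_t end)
{
    assert(start < end);
    regs->inv_start = start;
    regs->inv_end   = end;
    regs->inv_ack   = false;
    regs->inv_req   = true;
}

/* Hardware side: drop matching entries, then acknowledge the request. */
void hw_service_request(tlb_regs_t *regs, hw_entry_t *entries, unsigned n)
{
    if (!regs->inv_req)
        return;
    for (unsigned i = 0; i < n; i++) {
        if (entries[i].valid &&
            entries[i].vaddr >= regs->inv_start &&
            entries[i].vaddr <  regs->inv_end)
            entries[i].valid = false;
    }
    regs->inv_req = false;
    regs->inv_ack = true;
}

/* Host side, step 2: the page may only be moved after the acknowledgement. */
bool host_may_move_page(const tlb_regs_t *regs)
{
    return regs->inv_ack;
}
```

The ordering is the point of the sketch: the host posts the range, waits for the acknowledgement, and only then moves the page, so the accelerator never uses a translation that points to the old physical location.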
- 20% Theory and Background
- 50% Design and Implementation (SystemVerilog, C, FPGA/ASIC Design)
- 30% Verification
- SystemVerilog, C
- VLSI 1
- Note: The details of the project need to be discussed in advance to set up a task description matching your skills and interests.
- PULP
- A. Capotondi, A. Marongiu, "Enabling Zero-Copy OpenMP Offloading on the PULP Many-Core Accelerator", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17), Sankt Goar, Germany, 2017.
- P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, 2017.
- P. Vogel, A. Kurth, J. Weinbuch, A. Marongiu, L. Benini, "Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs", ESWEEK special issue of ACM Transactions on Embedded Computing Systems.
- P. Vogel, "PULPonFPGA Howto", DZ EDA Wiki entry.
- Juno ARM Development Platform
- Xilinx Virtex-7 XC7V2000T FPGA