BigPULP: Shared Virtual Memory Multicluster Extensions
Together with the EEES group of UNIBO, the IIS actively develops PULP, a manycore platform for ultra-low-power operation and leading-edge energy efficiency. This cooperation has already led to a substantial number of tape-outs of various single-cluster PULP configurations in multiple technology nodes [1]. A key component enabling high energy efficiency is the Event Unit Flex. It is a highly versatile, programmable, cluster-internal module that synchronizes cores, peripherals and DMA engines quickly and with minimal overhead, using barriers, mutexes, job dispatching, interrupts and events.
Besides ultra-low-power operation, the PULP project also aims at a highly scalable platform with widely tunable performance, e.g., so that PULP can serve as a high-performance parallel accelerator in heterogeneous systems. To this end, we also study the seamless integration of PULP-like, programmable manycore accelerators into embedded heterogeneous SoCs built around an ARM Cortex-A-like multicore CPU (the host). Within the project, we have developed
- a multi-ISA compilation toolchain that automates the offloading of highly parallel OpenMP kernels from the host CPU to the accelerator [2], and
- a mixed hardware/software solution enabling lightweight shared virtual memory (SVM) between the host CPU and the accelerator [3,4],
both dramatically simplifying the programmability of such a heterogeneous system.
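The following is a minimal sketch of what SVM buys the programmer when offloading, assuming the accelerator can dereference host virtual addresses directly. It uses only standard OpenMP `target` syntax. The actual PULP offloading toolchain and runtime may use different directives, and the type and function names (`node_t`, `sum_list_offloaded`) are hypothetical. With SVM, only the head pointer of a pointer-rich structure has to be passed to the accelerator (zero-copy). Without SVM, every node would have to be copied into accelerator memory and its `next` pointers rewritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-rich data structure allocated in host memory. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Sums a linked list inside an OpenMP target region.
 * Assumption: host and accelerator share virtual memory (SVM), so `head`
 * and every `next` pointer reachable from it are valid on the accelerator
 * as-is; no deep copy of the list is performed.
 * If no offload device is available (or OpenMP is disabled), the region
 * simply executes on the host. */
long sum_list_offloaded(const node_t *head)
{
    long sum = 0;
    /* `sum` must be mapped tofrom: scalars are firstprivate by default. */
    #pragma omp target map(tofrom: sum)
    {
        for (const node_t *n = head; n != NULL; n = n->next)
            sum += n->value;
    }
    return sum;
}
```

On a host-only build the same code runs unchanged, which makes it convenient for checking kernel logic before moving to the FPGA platform.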
Recently, we have set up our new bigPULP evaluation platform [5] based on the Juno ARM Development Platform [6]. This system combines a modern ARMv8 multicluster CPU with a Xilinx Virtex-7 XC7V2000T FPGA [7] capable of implementing PULP [1] with 4 to 8 clusters and a total of 32 to 64 cores.
In the current implementation, one or multiple helper threads on the accelerator manage the hardware infrastructure for SVM. The goal of this project is to extend this infrastructure for improved efficiency in a multicluster system. Ideally, every cluster would have its own dynamically scheduled helper threads that manage a dynamically allocated portion of the shared SVM hardware.
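To make the last idea more concrete, here is a minimal sketch of how the helper threads of different clusters could claim and release parts of the shared SVM hardware at run time. It assumes that this hardware can be modeled as a fixed pool of independently programmable translation slices. `N_SLICES`, the ownership table and all `svm_*` functions are hypothetical and not part of the existing bigPULP software.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical number of translation slices in the shared SVM hardware;
 * the real value is platform-specific. */
#define N_SLICES 32
#define SLICE_FREE (-1)

/* owner[i] holds the ID of the cluster currently owning slice i, or
 * SLICE_FREE. Per-slice atomics let the helper threads of different
 * clusters claim slices concurrently without a global lock. */
static _Atomic int owner[N_SLICES];

/* Marks all slices as free; must be called once before any allocation. */
void svm_pool_init(void)
{
    for (int i = 0; i < N_SLICES; i++)
        atomic_store(&owner[i], SLICE_FREE);
}

/* Claims one free slice for `cluster_id` (hypothetical helper).
 * Returns the slice index, or -1 if the pool is exhausted. */
int svm_slice_alloc(int cluster_id)
{
    assert(cluster_id >= 0); /* negative IDs would collide with SLICE_FREE */
    for (int i = 0; i < N_SLICES; i++) {
        int expected = SLICE_FREE;
        if (atomic_compare_exchange_strong(&owner[i], &expected, cluster_id))
            return i;
    }
    return -1;
}

/* Releases slice `i` only if it is owned by `cluster_id`.
 * Returns 1 on success, 0 otherwise. */
int svm_slice_free(int i, int cluster_id)
{
    if (i < 0 || i >= N_SLICES)
        return 0;
    int expected = cluster_id;
    return atomic_compare_exchange_strong(&owner[i], &expected, SLICE_FREE);
}

/* Counts the slices currently owned by `cluster_id`. */
int svm_slices_owned(int cluster_id)
{
    int n = 0;
    for (int i = 0; i < N_SLICES; i++)
        if (atomic_load(&owner[i]) == cluster_id)
            n++;
    return n;
}
```

A compare-and-swap per slice keeps allocation cheap and lets clusters grow or shrink their share independently. A real implementation would additionally have to program the claimed hardware slice and decide on an eviction policy when `svm_slice_alloc()` returns -1.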
While the overall platform is of considerable complexity, the exact task description can be tailored to your background knowledge and needs before the project starts: depending on your skills, the focus can lie more on the software or on the hardware side. Just make sure to contact us in advance!
Character:
- 20% Theory, Algorithms and Simulation
- 50% Implementation (C, VHDL, FPGA/ASIC Design)
- 30% Verification
Prerequisites:
- VHDL/SystemVerilog, C
- VLSI 1 (if the focus is on the hardware side)
- Note: The details of the project need to be discussed in advance to set up a task description matching your skills and interests.
Literature:
- [1] PULP platform
- [2] A. Capotondi, A. Marongiu, "Enabling Zero-Copy OpenMP Offloading on the PULP Many-Core Accelerator", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17), Sankt Goar, Germany, 2017.
- [3] P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, 2017.
- [4] P. Vogel, A. Kurth, J. Weinbuch, A. Marongiu, L. Benini, "Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs", ESWEEK special issue of ACM Transactions on Embedded Computing Systems.
- [5] P. Vogel, "PULPonFPGA Howto", DZ EDA Wiki entry.
- [6] Juno ARM Development Platform
- [7] Xilinx Virtex-7 XC7V2000T FPGA