PULPonFPGA: Lightweight Virtual Memory Support - Page Table Walker
While high-end heterogeneous systems-on-chip (SoCs) increasingly support heterogeneous uniform memory access (hUMA), their low-power counterparts targeting the embedded domain still lack basic features such as virtual memory support for accelerators. Instead of simply passing virtual address pointers, the host processor and the accelerators must share data through explicit data management involving copies, which hampers both programmability and performance.
At IIS, we study the integration of programmable many-core accelerators into embedded heterogeneous SoCs. We have developed a mixed hardware/software solution to enable lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs [1,2]. Our solution is based on the Remapping Address Block (RAB): a hardware input/output translation lookaside buffer (IOTLB) that is efficiently managed by a kernel-level driver module running on the host CPU. On an IOTLB miss, the hardware sends an interrupt to the host CPU, which then executes a miss-handling routine in the driver. Because this routine relies on standard Linux kernel application programming interfaces (APIs), it is easily portable to other host CPU architectures. However, it cannot be executed in interrupt context, which causes substantial scheduling delays and a high overall IOTLB miss penalty.
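The hit/miss flow described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the entry layout, the number of entries, the round-robin replacement and all names (`rab_t`, `rab_lookup`, `rab_handle_miss`, `rab_access`, `rab_demo_translate`) are hypothetical and are not taken from the actual RAB hardware or driver. The callback stands in for whatever the driver uses to resolve a virtual page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAB_NUM_SLICES 4u      /* hypothetical entry count, not the real RAB size */
#define RAB_PAGE_SIZE  4096u   /* assumed 4 KiB pages */

/* One hypothetical IOTLB entry: maps [va_start, va_end] onto pa_start. */
typedef struct {
    bool     valid;
    uint32_t va_start;
    uint32_t va_end;           /* inclusive */
    uint32_t pa_start;
} rab_slice_t;

typedef struct {
    rab_slice_t slice[RAB_NUM_SLICES];
    unsigned    next_victim;   /* round-robin replacement pointer (assumption) */
    unsigned    miss_count;    /* number of misses seen so far */
} rab_t;

/* Stand-in for the driver's translation step (kernel API or page table walk). */
typedef bool (*rab_translate_fn)(uint32_t va_page, uint32_t *pa_page);

/* "Hardware" side: returns true on a hit and writes the translated address. */
static bool rab_lookup(const rab_t *rab, uint32_t va, uint32_t *pa)
{
    for (size_t i = 0; i < RAB_NUM_SLICES; i++) {
        const rab_slice_t *s = &rab->slice[i];
        if (s->valid && va >= s->va_start && va <= s->va_end) {
            *pa = s->pa_start + (va - s->va_start);
            return true;
        }
    }
    return false;
}

/* "Driver" side: resolve the missing page and install it in one entry. */
static bool rab_handle_miss(rab_t *rab, uint32_t va, rab_translate_fn translate)
{
    uint32_t va_page = va & ~(RAB_PAGE_SIZE - 1u);
    uint32_t pa_page = 0;

    rab->miss_count++;
    if (!translate(va_page, &pa_page))
        return false;                       /* page is not mapped */

    rab_slice_t *s = &rab->slice[rab->next_victim];
    s->valid    = true;
    s->va_start = va_page;
    s->va_end   = va_page + (RAB_PAGE_SIZE - 1u);
    s->pa_start = pa_page;
    rab->next_victim = (rab->next_victim + 1u) % RAB_NUM_SLICES;
    return true;
}

/* Accelerator access: look up, handle a miss if needed, then retry once. */
static bool rab_access(rab_t *rab, uint32_t va, rab_translate_fn translate,
                       uint32_t *pa)
{
    if (rab_lookup(rab, va, pa))
        return true;
    if (!rab_handle_miss(rab, va, translate))
        return false;
    return rab_lookup(rab, va, pa);
}

/* Toy translation used for illustration only: PA = VA + 0x10000000. */
static bool rab_demo_translate(uint32_t va_page, uint32_t *pa_page)
{
    if (va_page >= 0xF0000000u)
        return false;                       /* pretend this region is unmapped */
    *pa_page = va_page + 0x10000000u;
    return true;
}
```

In this model the whole miss penalty is hidden inside `rab_handle_miss`. On the real platform, that is where the interrupt latency and the scheduling delay of the host CPU would add up.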
The goal of this project is to accelerate the miss-handling routine by implementing a custom page table walker for the ARMv7/ARMv8 architecture used by the host CPU on our evaluation platform. In a first step, the page table walker will be implemented in software as part of the kernel-level driver module. After the routine has been verified and profiled with real, heterogeneous applications on the evaluation platform, it shall either be ported to a dedicated microcontroller core inside the RAB or be implemented in dedicated hardware. The first step removes the scheduling delay caused by the kernel APIs; the second step additionally removes the interrupt latency of the host CPU.
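To make the idea of a software page table walk concrete, the following minimal sketch walks a two-level table modeled on the ARMv7-A short-descriptor format (sections, large pages and small pages). It is not the project's implementation: it runs in user space on a toy memory array (`phys_mem`), and all function names and example mappings are hypothetical. A real walker in the driver would have to read the tables through kernel mappings, and would need to handle the LPAE format on ARMv7 and the AArch64 format on ARMv8, which use 64-bit descriptors and more table levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy "physical memory" holding the page tables (32 KiB). */
#define MEM_WORDS   8192u
#define L1_TABLE_PA 0x0000u   /* 4096 entries x 4 B = 16 KiB, 16 KiB aligned */
#define L2_TABLE_PA 0x4000u   /* 256 entries x 4 B = 1 KiB, 1 KiB aligned */

static uint32_t phys_mem[MEM_WORDS];

/* Read one 32-bit descriptor; out-of-range reads return 0 (a fault entry). */
static uint32_t read_phys(uint32_t pa)
{
    uint32_t word = pa / 4u;
    return (word < MEM_WORDS) ? phys_mem[word] : 0u;
}

/* Walk the tables rooted at ttbr; returns false on a translation fault. */
static bool ptw_walk(uint32_t ttbr, uint32_t va, uint32_t *pa)
{
    /* First level: indexed by VA[31:20]. */
    uint32_t l1_desc = read_phys((ttbr & ~0x3FFFu) + (va >> 20) * 4u);

    switch (l1_desc & 0x3u) {
    case 0x2u:                              /* 1 MiB section */
        if (l1_desc & (1u << 18))
            return false;                   /* supersection: not handled here */
        *pa = (l1_desc & 0xFFF00000u) | (va & 0x000FFFFFu);
        return true;
    case 0x1u:                              /* pointer to second-level table */
        break;
    default:                                /* fault or reserved encoding */
        return false;
    }

    /* Second level: table base in bits [31:10], indexed by VA[19:12]. */
    uint32_t l2_base = l1_desc & 0xFFFFFC00u;
    uint32_t l2_desc = read_phys(l2_base + ((va >> 12) & 0xFFu) * 4u);

    if (l2_desc & 0x2u) {                   /* 4 KiB small page */
        *pa = (l2_desc & 0xFFFFF000u) | (va & 0x00000FFFu);
        return true;
    }
    if ((l2_desc & 0x3u) == 0x1u) {         /* 64 KiB large page */
        *pa = (l2_desc & 0xFFFF0000u) | (va & 0x0000FFFFu);
        return true;
    }
    return false;                           /* second-level fault */
}

/* Build two example mappings (addresses chosen arbitrarily for the demo). */
static void ptw_demo_setup(void)
{
    assert(L1_TABLE_PA % 0x4000u == 0u && L2_TABLE_PA % 0x400u == 0u);

    /* VA 0x001xxxxx -> 1 MiB section at PA 0x80100000. */
    phys_mem[L1_TABLE_PA / 4u + 0x001u] = 0x80100000u | 0x2u;
    /* VA 0x002xxxxx -> second-level table at L2_TABLE_PA. */
    phys_mem[L1_TABLE_PA / 4u + 0x002u] = L2_TABLE_PA | 0x1u;
    /* VA 0x00205xxx -> 4 KiB small page at PA 0x90005000. */
    phys_mem[L2_TABLE_PA / 4u + 0x05u] = 0x90005000u | 0x2u;
}
```

After calling `ptw_demo_setup()`, `ptw_walk(L1_TABLE_PA, va, &pa)` resolves addresses inside the two example mappings and reports a fault for everything else. Because the walk needs only a few dependent memory reads and no kernel services, the same logic could in principle run in interrupt context, on a small dedicated core, or in hardware.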
- 10% Theory, Algorithms and Simulation
- 40% C programming, Linux kernel hacking
- 20% VHDL/SystemVerilog, FPGA Design
- 30% Verification
- VLSI I
- VHDL/SystemVerilog, C
- Embedded Linux experience
- Experience with Linux kernel-level driver development is an advantage, but not strictly required.
- P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs", Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'15), Amsterdam, The Netherlands, 2015.
- P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, 2016.
- P. Vogel, "PULPonFPGA Howto", DZ EDA Wiki entry.
- U. Drepper, "Memory Part 3: Virtual Memory", LWN article.
- S. Eranian and D. Mosberger, "Virtual Memory in the IA-64 Linux Kernel", excerpt from IA-64 Linux Kernel: Design and Implementation.