
PULPonFPGA: Lightweight Virtual Memory Support - Page Table Walker

From iis-projects


Latest revision as of 16:57, 7 November 2017



While high-end heterogeneous systems-on-chip (SoCs) increasingly support heterogeneous uniform memory access (hUMA), their low-power counterparts targeting the embedded domain still lack basic features such as virtual memory support for accelerators. Instead of simply passing virtual address pointers, the host processor and the accelerators must share data through explicit data management involving copies, which hampers both programmability and performance.

At IIS, we study the integration of programmable many-core accelerators into embedded heterogeneous SoCs. We have developed a mixed hardware/software solution to enable lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs [1,2]. Our solution is based on the Remapping Address Block (RAB), a hardware input/output translation lookaside buffer (IOTLB) that is efficiently managed by a kernel-level driver module running on the host CPU. On an IOTLB miss, the hardware sends an interrupt to the host CPU, which triggers a miss-handling routine in the driver. Because this routine relies on standard Linux kernel application programming interfaces (APIs), it is easily portable to other host CPU architectures. However, it cannot be executed in interrupt context, which causes substantial scheduling delays and a high overall IOTLB miss penalty.
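The hit/miss behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the entry layout (a virtual address range plus a physical base), the number of entries, the round-robin replacement, and all names such as `iotlb_lookup` and `iotlb_handle_miss` are hypothetical and are not taken from the RAB hardware or its driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IOTLB_ENTRIES 4 /* assumed size, for illustration only */

/* Hypothetical IOTLB entry: maps the virtual range [va_start, va_end]
 * to physical memory starting at pa_base. */
typedef struct {
    bool     valid;
    uint32_t va_start;
    uint32_t va_end;   /* inclusive upper bound */
    uint64_t pa_base;
} iotlb_entry_t;

typedef struct {
    iotlb_entry_t entry[IOTLB_ENTRIES];
    size_t        next_victim; /* round-robin replacement pointer */
} iotlb_t;

/* Returns true and writes *pa on a hit; returns false on a miss.
 * In hardware, a miss is the event that would raise the interrupt. */
bool iotlb_lookup(const iotlb_t *tlb, uint32_t va, uint64_t *pa)
{
    assert(tlb != NULL && pa != NULL);
    for (size_t i = 0; i < IOTLB_ENTRIES; i++) {
        const iotlb_entry_t *e = &tlb->entry[i];
        if (e->valid && va >= e->va_start && va <= e->va_end) {
            *pa = e->pa_base + (va - e->va_start);
            return true;
        }
    }
    return false;
}

/* Stand-in for the miss handler: installs a mapping for the 4 KiB page
 * that contains va. page_pa is the physical page address, which a real
 * handler would first have to look up in the page table. */
void iotlb_handle_miss(iotlb_t *tlb, uint32_t va, uint64_t page_pa)
{
    assert(tlb != NULL);
    iotlb_entry_t *e = &tlb->entry[tlb->next_victim];
    e->valid    = true;
    e->va_start = va & ~0xFFFu;
    e->va_end   = e->va_start + 0xFFFu;
    e->pa_base  = page_pa;
    tlb->next_victim = (tlb->next_victim + 1) % IOTLB_ENTRIES;
}
```

In this model, an access first calls `iotlb_lookup`; if that fails, `iotlb_handle_miss` installs the missing mapping and the access is retried. The cost of that second step is what the text above calls the IOTLB miss penalty.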

Short Description

The goal of this project is to accelerate the miss-handling routine by implementing a custom page table walker for the ARMv7/ARMv8 architecture used by the host CPU on our evaluation platform [3]. In a first step, the page table walker will be implemented in software as part of the kernel-level driver module. After verifying and profiling the routine with real, heterogeneous applications on the evaluation platform, the routine shall either be ported to a dedicated microcontroller core inside the RAB or be implemented in dedicated hardware. The first step removes the scheduling delay caused by the kernel APIs, while the second step removes the interrupt latency of the host CPU.
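To make the idea of a page table walk concrete, the following is a minimal user-space sketch of a two-level walk. It assumes a split of the 32-bit virtual address into a 12-bit first-level index, an 8-bit second-level index, and a 12-bit page offset, which mirrors the ARMv7 short-descriptor layout for 4 KiB pages. Everything else is a simplification: the first level stores plain C pointers instead of real descriptors, the valid bit is a made-up flag, and the names `pt_walk` and `pt_map` are hypothetical. It is not the walker developed in this project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                /* 4 KiB pages */
#define L2_BITS    8u                 /* 256 second-level entries */
#define L1_BITS    12u                /* 4096 first-level entries */
#define L1_ENTRIES (1u << L1_BITS)
#define L2_ENTRIES (1u << L2_BITS)
#define PTE_VALID  0x1u               /* illustrative valid flag */

/* Second-level table: one entry per 4 KiB page. */
typedef struct { uint32_t pte[L2_ENTRIES]; } l2_table_t;

/* First-level table: simplified to pointers to second-level tables. */
typedef struct { l2_table_t *l2[L1_ENTRIES]; } l1_table_t;

/* Walks both levels for va. Returns true and writes the physical
 * address to *pa if a valid mapping exists, false otherwise. */
bool pt_walk(const l1_table_t *l1, uint32_t va, uint32_t *pa)
{
    assert(l1 != NULL && pa != NULL);
    uint32_t l1_idx = va >> (PAGE_SHIFT + L2_BITS);            /* VA[31:20] */
    uint32_t l2_idx = (va >> PAGE_SHIFT) & (L2_ENTRIES - 1u);  /* VA[19:12] */
    uint32_t page_mask = (1u << PAGE_SHIFT) - 1u;

    const l2_table_t *l2 = l1->l2[l1_idx];
    if (l2 == NULL)
        return false;              /* no second-level table present */

    uint32_t pte = l2->pte[l2_idx];
    if (!(pte & PTE_VALID))
        return false;              /* page not mapped */

    *pa = (pte & ~page_mask) | (va & page_mask);
    return true;
}

/* Helper for building a test mapping: maps the page containing va
 * to the physical page containing pa, using l2 as second-level table. */
void pt_map(l1_table_t *l1, l2_table_t *l2, uint32_t va, uint32_t pa)
{
    assert(l1 != NULL && l2 != NULL);
    uint32_t page_mask = (1u << PAGE_SHIFT) - 1u;
    l1->l2[va >> (PAGE_SHIFT + L2_BITS)] = l2;
    l2->pte[(va >> PAGE_SHIFT) & (L2_ENTRIES - 1u)] = (pa & ~page_mask) | PTE_VALID;
}
```

A walk of this kind needs only two dependent memory reads per translation, which is why it is a plausible candidate for execution in interrupt context, on a small dedicated core, or in hardware. A real implementation would additionally have to handle the actual descriptor formats, larger mapping sizes, and access permissions.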

Status: Completed

Student: Johannes Weinbuch
Supervision: Pirmin Vogel, Björn Forsberg, Andrea Marongiu


Character

10% Theory, Algorithms and Simulation
40% C programming, Linux kernel hacking
20% VHDL/SystemVerilog, FPGA Design
30% Verification


Prerequisites

VHDL/SystemVerilog, C
Embedded Linux experience
Experience with Linux kernel-level driver development is an advantage, but not strictly required.


Professor

Luca Benini


References

  1. P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs", Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'15), Amsterdam, The Netherlands, 2015.
  2. P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, 2016.
  3. P. Vogel, "PULPonFPGA Howto", DZ EDA Wiki entry.
  4. U. Drepper, "Memory Part 3: Virtual Memory", LWN article.
  5. S. Eranian and D. Mosberger, "Virtual Memory in the IA-64 Linux Kernel", excerpt from IA-64 Linux Kernel: Design and Implementation.
