PULPonFPGA: Lightweight Virtual Memory Support - Physically Contiguous Memory
While high-end heterogeneous systems-on-chip (SoCs) increasingly support heterogeneous uniform memory access (hUMA), their low-power counterparts targeting the embedded domain still lack basic features such as virtual memory support for accelerators. Instead of simply passing virtual address pointers, the host processor and the accelerators must share data through explicit data management involving copies, which hampers both programmability and performance.
At IIS, we study the integration of programmable many-core accelerators into embedded heterogeneous SoCs. We have developed a mixed hardware/software solution to enable lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs [1,2]. Our solution is based on the Remapping Address Block (RAB). This hardware block features two different types of input/output translation lookaside buffers (IOTLBs) efficiently managed by a kernel-level driver module running on the host CPU. The first IOTLB is implemented using a fully-associative content-addressable memory (CAM) and allows for address remappings of arbitrary size, independent of the page size of the Linux operating system running on the host CPU. In a student project, a second, set-associative IOTLB has been designed, which scales much better than the first one but is limited to page-sized remappings of 4 KiB.
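The behavior of the first, fully-associative IOTLB can be illustrated with a small software model. This is only a sketch under stated assumptions: the type and function names (`rab_slot_t`, `rab_set_slot`, `rab_translate`) and the slot count are hypothetical and do not reflect the actual RAB hardware interface or driver API. It merely shows how a single entry can cover an address range of arbitrary size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAB_NUM_SLOTS 8 /* hypothetical slot count, not taken from the RAB design */

/* One remapping: a virtual address range of arbitrary size and the
 * physical base address it is mapped to. */
typedef struct {
    bool     valid;
    uint64_t virt_start; /* first virtual address covered */
    uint64_t virt_end;   /* last virtual address covered (inclusive) */
    uint64_t phys_base;  /* physical address corresponding to virt_start */
} rab_slot_t;

typedef struct {
    rab_slot_t slots[RAB_NUM_SLOTS];
} rab_model_t;

/* Configure one slot, roughly what a driver would do before the
 * accelerator starts working on shared data. */
static void rab_set_slot(rab_model_t *rab, size_t idx, uint64_t virt_start,
                         uint64_t size, uint64_t phys_base)
{
    assert(idx < RAB_NUM_SLOTS && size > 0);
    rab->slots[idx] = (rab_slot_t){ true, virt_start,
                                    virt_start + size - 1, phys_base };
}

/* Fully-associative lookup: the address is compared against every valid
 * slot. Hardware performs these comparisons in parallel; the model loops.
 * Returns true on a hit and stores the translated address in *phys. */
static bool rab_translate(const rab_model_t *rab, uint64_t virt, uint64_t *phys)
{
    for (size_t i = 0; i < RAB_NUM_SLOTS; i++) {
        const rab_slot_t *s = &rab->slots[i];
        if (s->valid && virt >= s->virt_start && virt <= s->virt_end) {
            *phys = s->phys_base + (virt - s->virt_start);
            return true;
        }
    }
    return false; /* miss: no slot covers this address */
}
```

In this model, one slot configured with a size of 3 MiB translates every address in that range. A page-sized IOTLB would need one entry per 4 KiB page for the same buffer.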
To reduce the number of required TLB entries and the performance penalty due to TLB misses, different software frameworks [4,5,6] exist that let Linux use large memory pages as supported by today's CPU architectures. With the contiguous memory allocator (CMA), the Linux kernel has a built-in mechanism to allocate large chunks of physically contiguous memory at boot time. A kernel-level driver may then request memory from this pre-allocated section and make it accessible to user-space applications through, e.g., an mmap() system call. Ideally, all data shared with the accelerator is placed in this section, so that a single entry in the first IOTLB suffices.
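From the application's point of view, obtaining such a buffer could look roughly like the following sketch. It assumes that the driver exposes a character device whose mmap() handler backs the mapping with physically contiguous memory. The function names and the device path passed in are hypothetical and are not part of the existing PULPonFPGA runtime.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of driver-provided memory into the calling process.
 * `dev_path` names the character device exported by the driver
 * (hypothetical; the real device name depends on the driver).
 * Returns the mapped address, or NULL on failure. */
void *map_contiguous_buffer(const char *dev_path, size_t len)
{
    int fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return buf == MAP_FAILED ? NULL : buf;
}

/* Release a buffer obtained from map_contiguous_buffer().
 * Returns 0 on success and -1 on failure, like munmap(). */
int unmap_contiguous_buffer(void *buf, size_t len)
{
    return munmap(buf, len);
}
```

An application would place all data shared with the accelerator inside the returned buffer. The driver, which knows the physical base address of the underlying allocation, can then cover the whole buffer with a single remapping.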
The goal of this project is to evaluate the usage of the Linux CMA together with the RAB's first, flexible IOTLB on our evaluation platform. To this end, you will first extend the existing kernel-level driver module, the user-space runtime and the applications to let them use physically contiguous memory provided by the CMA. Next, you will verify and profile the implementation with real, heterogeneous applications on the evaluation platform. Finally, you will investigate and identify suitable compile-time and/or runtime techniques to automate the usage of CMA in order to simplify the application programmer's job.
- Looking for 1 Interested Master Student (Semester Project)
- Supervision: Pirmin Vogel, Andrea Marongiu
- 10% Theory, Algorithms and Simulation
- 40% C programming, Linux kernel hacking
- 50% C programming, compiler and user-space runtime modifications
- Embedded Linux experience
- Experience with Linux kernel-level driver development and GNU Compiler Collection (GCC) hacking is an advantage, but not strictly required.
[1] P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs", Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'15), Amsterdam, The Netherlands, 2015.
[2] P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. PP, no. 99, 2016.
[3] "PULPonFPGA: Lightweight Virtual Memory Support - Multi-Level TLB", Project Description.
[4] M. Gorman, "Huge pages", LWN article.
[5] "libhugetlbfs", Software library.
[6] J. Corbet, "Transparent huge pages", LWN article.
[7] M. Nazarewicz, "A deep dive into CMA", LWN article.
[8] P. Vogel, "PULPonFPGA Howto", DZ EDA Wiki entry.