
PULPonFPGA: Lightweight Virtual Memory Support - Physically Contiguous Memory

From iis-projects


Revision as of 09:22, 17 July 2017


Intro

While high-end heterogeneous systems-on-chip (SoCs) increasingly support heterogeneous uniform memory access (hUMA), their low-power counterparts targeting the embedded domain still lack basic features such as virtual memory support for accelerators. Instead of simply passing virtual address pointers, sharing data between the host processor and the accelerators requires explicit data management involving copies, which hampers both programmability and performance.

At IIS, we study the integration of programmable many-core accelerators into embedded heterogeneous SoCs. We have developed a mixed hardware/software solution that enables lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs [1,2,3]. Our solution is based on the Remapping Address Block (RAB). This hardware block features two different types of input/output translation lookaside buffers (IOTLBs), which are efficiently managed by a kernel-level driver module running on the host CPU [1,2] and by a dedicated helper thread running on the accelerator [3]. The first IOTLB is implemented as a fully-associative content-addressable memory (CAM) and supports address remappings of arbitrary size, independent of the page size of the Linux operating system running on the host CPU. In a student project [4], a second, set-associative IOTLB was designed; it is much more scalable than the first one but is limited to page-sized (4 KiB) remappings.

To reduce the number of required TLB entries and the performance penalty due to TLB misses, different software frameworks [5,6,7] let Linux use the large memory pages supported by today's CPU architectures. With the contiguous memory allocator (CMA), the Linux kernel has a built-in mechanism to reserve large chunks of physically contiguous memory at boot time [8]. A kernel-level driver may then request memory from this reserved region and expose it to user-space applications, e.g., through an mmap() system call. Ideally, all data shared with the accelerator is placed in this region, so that a single entry in the first IOTLB suffices to map it.

Short Description

The goal of this project is to evaluate the use of the Linux CMA together with the RAB's first, flexible IOTLB on our evaluation platform [9]. To this end, you will first extend the existing kernel-level driver module, the user-space runtime, and the applications so that they use physically contiguous memory provided by the CMA. Next, you will verify and profile the implementation with real, heterogeneous applications on the evaluation platform. Finally, you will investigate and identify suitable compile-time and/or runtime techniques that automate the use of CMA and thereby simplify the application programmer's job.

Status: Available

Looking for 1 interested Master's student (semester project)
Supervision: Pirmin Vogel, Andreas Kurth, Andrea Marongiu

Character

10% Theory, Algorithms and Simulation
40% C programming, Linux kernel hacking
50% C programming, compiler and user-space runtime modifications

Prerequisites

C
Embedded Linux experience
Experience with Linux kernel-level driver development and GNU Compiler Collection (GCC) hacking is an advantage but not strictly required.

Professor

Luca Benini

References

  1. P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs", Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'15), Amsterdam, The Netherlands, 2015.
  2. P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, 2017.
  3. P. Vogel, A. Kurth, J. Weinbuch, A. Marongiu, L. Benini, "Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs", to appear in the ESWEEK special issue of ACM Transactions on Embedded Computer Systems.
  4. "PULPonFPGA: Lightweight Virtual Memory Support - Multi-Level TLB", project description.
  5. M. Gorman, "Huge pages", LWN article.
  6. "libhugetlbfs", software library.
  7. J. Corbet, "Transparent huge pages", LWN article.
  8. M. Nazarewicz, "A deep dive into CMA", LWN article.
  9. P. Vogel, "PULPonFPGA Howto", DZ EDA Wiki entry.
