BigPULP: Multicluster Synchronization Extensions

Latest revision as of 11:17, 22 January 2018

BigPULP.png

Intro

With the PULP platform, the IIS, together with the EEES group of UNIBO, actively develops a manycore platform for ultra-low-power operation and leading-edge energy efficiency. This cooperation has already led to a substantial number of tape-outs of various single-cluster PULP configurations in multiple technology nodes [1]. A key component enabling this energy efficiency is the Event Unit Flex, a highly versatile, programmable, cluster-internal module that lets cores, peripherals and DMA engines synchronize quickly and with low overhead using barriers, mutexes, job dispatching, interrupts and events.

Besides ultra-low-power operation, the PULP project also aims at designing a platform that is highly scalable and features widely tunable performance, e.g., to use PULP as a high-performance parallel accelerator in heterogeneous systems. To this end, we also study the seamless integration of PULP-like, programmable many-core accelerators into embedded heterogeneous SoCs built around an ARM Cortex-A-like multicore CPU (the host). Within the project, we have developed

a multi-ISA compile toolchain automating the offloading of highly-parallel OpenMP function kernels from the host CPU to the accelerator [2], and
a mixed hardware/software solution enabling lightweight, shared virtual memory (SVM) between host CPU and accelerator [3,4],

both dramatically simplifying the programmability of such a heterogeneous system.

Short Description

Recently, we have set up operation of our new bigPULP evaluation platform [5] based on the Juno ARM Development Platform [6]. This system combines a modern ARMv8 multicluster CPU with a Xilinx Virtex-7 XC7V2000T FPGA [7] capable of implementing PULP [1] with 4 to 8 clusters and a total of 32 to 64 cores.

The current implementation of the Event Unit Flex and the corresponding software infrastructure support basic synchronization across cluster boundaries, but they are highly optimized for intra-cluster operation. The goal of this project is to extend the current infrastructure, e.g., with a SoC message passing framework, to facilitate and improve the synchronization among multiple clusters.
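To make the idea of multi-cluster synchronization more concrete, the following is a minimal sketch of a two-level (hierarchical) barrier in C. It is not the Event Unit Flex interface or the solution developed in this project: all names (`hbarrier_t`, `hbarrier_arrive`, `hbarrier_wait`, `hbarrier_simulate_round`), the cluster and core counts, and the use of C11 atomics in shared memory as a stand-in for hardware events or SoC-level messages are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical configuration: 4 clusters with 8 cores each. */
#define N_CLUSTERS        4u
#define CORES_PER_CLUSTER 8u

typedef struct {
    atomic_uint local_count[N_CLUSTERS]; /* arrivals within each cluster     */
    atomic_uint global_count;            /* clusters that have fully arrived */
    atomic_uint generation;              /* incremented once per release     */
} hbarrier_t;

void hbarrier_init(hbarrier_t *b)
{
    for (unsigned c = 0; c < N_CLUSTERS; c++)
        atomic_init(&b->local_count[c], 0);
    atomic_init(&b->global_count, 0);
    atomic_init(&b->generation, 0);
}

/* Register one arrival. Returns true only for the very last arrival,
 * which also releases the barrier by advancing the generation. */
bool hbarrier_arrive(hbarrier_t *b, unsigned cluster)
{
    /* Level 1: count arrivals inside the cluster. */
    unsigned prev = atomic_fetch_add(&b->local_count[cluster], 1);
    if (prev + 1 < CORES_PER_CLUSTER)
        return false;

    /* Last core of this cluster: reset the local counter, then notify
     * the SoC level. This single update plays the role of the
     * inter-cluster "message" in this sketch. */
    atomic_store(&b->local_count[cluster], 0);
    prev = atomic_fetch_add(&b->global_count, 1);
    if (prev + 1 < N_CLUSTERS)
        return false;

    /* Last cluster: reset the global counter and release everyone. */
    atomic_store(&b->global_count, 0);
    atomic_fetch_add(&b->generation, 1);
    return true;
}

/* Blocking variant: spin until the generation changes. A real
 * implementation would put the core to sleep instead of polling. */
void hbarrier_wait(hbarrier_t *b, unsigned cluster)
{
    unsigned gen = atomic_load(&b->generation);
    if (!hbarrier_arrive(b, cluster)) {
        while (atomic_load(&b->generation) == gen) {
            /* busy-wait */
        }
    }
}

/* Single-threaded walk through one barrier round: every core of every
 * cluster arrives once. Returns the number of releases observed. */
unsigned hbarrier_simulate_round(hbarrier_t *b)
{
    unsigned releases = 0;
    for (unsigned c = 0; c < N_CLUSTERS; c++) {
        for (unsigned i = 0; i < CORES_PER_CLUSTER; i++) {
            bool released = hbarrier_arrive(b, c);
            bool last = (c == N_CLUSTERS - 1) && (i == CORES_PER_CLUSTER - 1);
            assert(released == last); /* only the final arrival releases */
            if (released)
                releases++;
        }
    }
    return releases;
}
```

In this sketch only one core per cluster touches the SoC-level counter, so cross-cluster traffic grows with the number of clusters rather than with the total number of cores. Calling `hbarrier_simulate_round` on a freshly initialized barrier walks through one complete round without threads; in a threaded setting each core would call `hbarrier_wait` with its own cluster index.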

While the overall platform is of considerable complexity, the exact task description can be tailored to your background knowledge and needs before the project starts: depending on your skills, the focus can be more on the software or the hardware side. Just make sure to contact us in advance!

Status: Completed

Semester thesis by Merkourios Katsimpris
Supervision: Pirmin Vogel, Andreas Kurth, Florian Glaser, Germain Haugou

Character

20% Theory, Algorithms and Simulation
50% Implementation (VHDL, FPGA/ASIC Design, C)
30% Verification

Prerequisites

VHDL/SystemVerilog, C
VLSI 1 (if the focus shall be on the hardware side)
Note: The details of the project need to be discussed in advance to set up a task description matching your skills and interests.

Professor

Luca Benini

References

  1. PULP. http://iis-projects.ee.ethz.ch/index.php/PULP
  2. A. Capotondi, A. Marongiu, "Enabling Zero-Copy OpenMP Offloading on the PULP Many-Core Accelerator", Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES'17), Sankt Goar, Germany, 2017. http://dl.acm.org/citation.cfm?id=3079071
  3. P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, 2017. http://ieeexplore.ieee.org/document/7797491/
  4. P. Vogel, A. Kurth, J. Weinbuch, A. Marongiu, L. Benini, "Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs", to appear in ESWEEK special issue of ACM Transactions on Embedded Computing Systems.
  5. P. Vogel, "PULPonFPGA Howto", DZ EDA Wiki entry. http://eda.ee.ethz.ch/index.php/PULPonFPGA_Howto
  6. Juno ARM Development Platform. http://www.arm.com/products/tools/development-boards/versatile-express/juno-arm-development-platform.php
  7. Xilinx Virtex-7 XC7V2000T FPGA. http://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf
