
BigPULP: Multicluster Synchronization Extensions


Revision as of 11:33, 17 July 2017 by Vogelpi


With the PULP platform, the IIS, together with the EEES group of UNIBO, is actively developing a manycore platform for ultra-low-power operation and leading-edge energy efficiency. This cooperation has already led to a substantial number of tape-outs of various single-cluster PULP configurations in multiple technology nodes [1]. A key component enabling high energy efficiency is the event unit, i.e., a highly versatile, programmable, cluster-internal module that

Besides ultra-low-power operation, the project also aims at designing a platform that is highly scalable and features widely tunable performance, e.g., to use PULP as a high-performance parallel accelerator in heterogeneous systems. To this end, we have recently brought up our bigPULP evaluation platform based on the Juno ARM Development Platform [2]. This system combines a modern ARMv8 multicluster CPU with a Xilinx Virtex-7 XC7V2000T FPGA [3] capable of implementing PULP [1] with 4 to 8 clusters and a total of 32 to 64 cores.

Short Description

The two subsystems reside on two separate chips and are connected through a high-bandwidth off-chip interface. To reduce the traffic on this interface, PULP can be equipped with an L2 cache that serves frequent accesses to the shared main memory locally. This also reduces the stress on our lightweight virtual memory solution. While the overall platform is of considerable complexity, your job is well defined and isolated: the idea of this project is to develop the hardware IP of this L2 cache.
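To make the traffic argument concrete, the following is a minimal behavioral sketch of a direct-mapped, read-only cache that counts how many accesses would have to cross the off-chip interface. Everything in it is an assumption for illustration: the class name `DirectMappedCache`, the helper `off_chip_refills`, the organization (direct-mapped), and the sizes are hypothetical and do not describe the existing IP block or the cache to be designed.

```python
class DirectMappedCache:
    """Toy direct-mapped read cache (illustrative only, not the project's design)."""

    def __init__(self, num_lines=4, line_bytes=16):
        self.num_lines = num_lines      # hypothetical number of cache lines
        self.line_bytes = line_bytes    # hypothetical line size in bytes
        self.tags = [None] * num_lines  # None marks an invalid line
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        """Return True on a hit; a miss models one line refill over the off-chip link."""
        line_addr = addr // self.line_bytes
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: fetch the line from main memory and replace the old entry.
        self.tags[index] = tag
        self.misses += 1
        return False


def off_chip_refills(addresses, cache):
    """Replay a byte-address trace and return the number of off-chip line refills."""
    for addr in addresses:
        cache.read(addr)
    return cache.misses
```

Replaying a trace with good locality through such a model shows fewer refills than accesses, while addresses that map to the same line evict each other. A real L2 cache would additionally have to handle writes, associativity, and the AXI4 transaction semantics, none of which is modeled here.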

You can start from an existing IP block designed in a previous project. This IP is equipped with proprietary interfaces and needs to be adapted to use the widely adopted AXI4 protocol [6], which has built-in cache control signals that shall be used by your new IP. Besides designing the IP, your job is also to develop a set of testbenches to verify the functionality and analyze the design. The work primarily targets implementation on the Xilinx Virtex-7 FPGA; if desired, an ASIC back-end design can also be implemented.
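As a sketch of what the AXI4 cache control signals carry, the snippet below decodes the 4-bit ARCACHE/AWCACHE memory attribute field into named flags. The function name `decode_axcache` is hypothetical, and the bit assignment (bit 0 Bufferable, bit 1 Modifiable, bits 2 and 3 the allocate hints, with the roles of bits 2 and 3 swapped between the read and write channels) follows the AMBA 4 AXI specification as commonly summarized; it should be checked against the specification itself before being relied upon.

```python
def decode_axcache(value, is_write):
    """Decode a 4-bit AXI4 AxCACHE value into named attribute flags.

    Assumed bit assignment (verify against the AMBA 4 AXI specification):
      bit 0: Bufferable
      bit 1: Modifiable
      bit 2: Allocate for reads (ARCACHE), Other Allocate for writes (AWCACHE)
      bit 3: Allocate for writes (AWCACHE), Other Allocate for reads (ARCACHE)
    """
    if not 0 <= value <= 0xF:
        raise ValueError("AxCACHE is a 4-bit field")
    bit2 = bool(value & 0b0100)
    bit3 = bool(value & 0b1000)
    return {
        "bufferable": bool(value & 0b0001),
        "modifiable": bool(value & 0b0010),
        "allocate": bit3 if is_write else bit2,
        "other_allocate": bit2 if is_write else bit3,
    }
```

For example, `decode_axcache(0b0110, is_write=False)` reports a modifiable, non-bufferable read with the allocate hint set. How a cache should react to each combination is a design decision of the project and is not implied by this decoder.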

Status: Available

Looking for Interested Master Students (Semester and Master Project)
Supervision: Pirmin Vogel, Andreas Kurth, Andrea Marongiu, Florian Glaser, Germain Haugou


20% Theory, Algorithms and Simulation
50% VHDL, FPGA/ASIC Design
30% Verification


VHDL/SystemVerilog, C


Luca Benini


  1. PULP link
  2. Juno ARM Development Platform link
  3. Xilinx Virtex-7 XC7V2000T FPGA link
  1. P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs", Proceedings of the 10th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'15), Amsterdam, The Netherlands, 2015. link
  2. P. Vogel, A. Marongiu, L. Benini, "Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs", IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, 2017. link
  3. Juno ARM Development Platform link
  4. Xilinx Virtex-7 XC7V2000T FPGA link
  5. AMBA 4 AXI Protocol Specifications link
  6. D. J. Sorin, M. D. Hill, D. A. Wood, "A Primer on Memory Consistency and Cache Coherence", Morgan & Claypool Publishers, 2011. link
