All the flavours of FFT on MemPool (1-2S/B)

From iis-projects


Latest revision as of 17:54, 9 November 2022


Overview

Status: Available

Introduction

MemPool [1] is an IIS-born many-core system with 256 Snitch cores and 1024 banks of shared, tightly coupled L1 data memory. Leveraging its hierarchical architecture, we can scale the system to TeraPool, a cluster of 1024 Snitch cores with 4096 banks of shared memory. The massive parallel computing power and the low latency of shared-memory accesses in TeraPool are well suited to accelerating embarrassingly parallel tasks, such as matrix-matrix multiplication. Things get trickier with kernels that have irregular memory accesses, such as the Fast Fourier Transform.
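To make the remark about irregular memory accesses concrete, here is a minimal sketch of how strided accesses can collide on a banked memory. The core and bank counts come from the text above; the word-interleaved mapping (`bank_of`) and the access pattern in which core `c` reads word `c * stride` are simplifying assumptions for illustration, not a description of the actual MemPool address map or of any existing kernel.

```python
from collections import Counter

NUM_BANKS = 1024  # from the text: MemPool has 1024 L1 banks
NUM_CORES = 256   # from the text: MemPool has 256 Snitch cores


def bank_of(word_index, num_banks=NUM_BANKS):
    """ASSUMPTION: consecutive words are interleaved across consecutive banks."""
    return word_index % num_banks


def max_conflicts(stride, num_cores=NUM_CORES, num_banks=NUM_BANKS):
    """Largest number of cores that hit the same bank in one step,
    under the hypothetical pattern where core c reads word c * stride."""
    hits = Counter(bank_of(c * stride, num_banks) for c in range(num_cores))
    return max(hits.values())


if __name__ == "__main__":
    # Power-of-two strides are typical of FFT butterfly stages.
    for stride in (1, 2, 64, 1024):
        print(f"stride {stride:4d}: up to {max_conflicts(stride)} cores per bank")
```

Under these assumptions, unit-stride accesses spread over distinct banks, while large power-of-two strides funnel many cores into a few banks. This is the kind of contention that a careful FFT data layout would try to avoid.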

In the framework of a project where MemPool accelerates 5G processing workloads, we have already implemented a performant version of the Cooley-Tukey FFT [2], and we are now looking into different algorithmic strategies to execute up to 128 FFT tasks in less than 0.5 ms.

Project

The goal of this project is to implement and optimize different FFT kernels:

  • You will extend our work on the Cooley-Tukey FFT to different radices.
  • You will implement and optimize other FFT kernels (e.g. the six-step FFT) on MemPool and TeraPool.
  • You will add hardware extensions to specialize MemPool for the execution of FFT and other key algorithms in the field of wireless communications. Another option is to integrate a PULP FFT accelerator [3] into the MemPool Tile.
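As a rough illustration of the first two items, the sketch below shows a textbook recursive radix-2 Cooley-Tukey FFT and a six-step FFT built on top of it. It is a plain sequential reference in Python, not the MemPool implementation from [2]; the function names (`fft_radix2`, `fft_six_step`, `dft_naive`) and the factorization arguments `n1`, `n2` are hypothetical and chosen only for this example.

```python
import cmath


def dft_naive(x):
    """O(N^2) reference DFT, used only to check the fast versions."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_six_step(x, n1, n2):
    """Six-step FFT of length n1 * n2 (n1 and n2 powers of two in this sketch)."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: transpose the n1 x n2 row-major input; cols[j2][j1] = x[j1*n2 + j2].
    cols = [[x[j1 * n2 + j2] for j1 in range(n1)] for j2 in range(n2)]
    # Step 2: n2 independent FFTs of length n1.
    cols = [fft_radix2(c) for c in cols]
    # Step 3: multiply element (j2, k1) by the twiddle factor W_n^(j2 * k1).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    # Step 4: transpose back to n1 rows of length n2.
    rows = [[cols[j2][k1] for j2 in range(n2)] for k1 in range(n1)]
    # Step 5: n1 independent FFTs of length n2.
    rows = [fft_radix2(r) for r in rows]
    # Step 6: final transpose; output index is k = k1 + n1 * k2.
    return [rows[k1][k2] for k2 in range(n2) for k1 in range(n1)]


if __name__ == "__main__":
    x = [complex(i % 5, (i * 3) % 7) for i in range(64)]
    ref = dft_naive(x)
    err = max(abs(a - b) for a, b in zip(fft_six_step(x, 8, 8), ref))
    print("six-step max abs error vs naive DFT:", err)
```

The appeal of the six-step formulation on a many-core system is that steps 2 and 5 consist of independent small FFTs that can be distributed across cores, with the transposes concentrating the irregular data movement. Whether that trade-off pays off on MemPool is exactly what the project would have to evaluate.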

The different FFT implementations will be benchmarked systematically. One possible reference is the FFT generated by the Spiral project [4].

Character

  • 10% Literature Review
  • 50% Software Design
  • 20% Hardware Design
  • 20% Evaluation & Documentation

Prerequisites

  • Strong interest in computer architecture and signal processing
  • Experience in C/C++ programming
  • Experience with digital design in SystemVerilog as taught in VLSI I is appreciated

References

[1] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, “MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect,” in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021, pp. 701–706.

[2] M. Bertuletti, Y. Zhang, A. Vanelli-Coralli, and L. Benini, “Efficient Parallelization of 5G-PUSCH on a Scalable RISC-V Many-core Processor,” https://arxiv.org/abs/2210.09196

[3] L. Bertaccini, L. Benini, and F. Conti, “To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters,” in 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2021, pp. 1–8, doi: 10.1109/ASAP52443.2021.00008

[4] Spiral project, https://www.spiral.net/hardware/dftgen.html