Vector-based Parallel Programming Optimization of Communication Algorithm (1-2S/B)

Overview

Status: Available

Introduction

Future communications processing systems will need flexible and scalable solutions. Vector processors provide an efficient means of exploiting data-level parallelism (DLP), which is abundant in communications kernels. Spatz [1] is a small, energy-efficient vector unit based on the RISC-V vector extension specification, designed to improve both efficiency and performance. Its lean Processing Element (PE) acts as an accelerator to a scalar core, which makes it a good candidate for achieving near-ideal hardware utilization and enabling scalability. For these reasons, we implemented Spatz on the TeraPool architecture as our hardware platform. TeraPool is a scaled-up version of MemPool [2] with 1024 Snitch cores and 4096 banks of shared, tightly coupled L1 data memory. In this project, we will exploit the considerable DLP of baseband signal processing tasks and implement their typical kernels [3].
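To make the notion of data-level parallelism concrete, here is a minimal sketch in portable C. It is not code from the Spatz or TeraPool repositories. It strip-mines a complex dot product, a DLP-rich operation of the kind found in baseband kernels. The names `VLMAX`, `cfloat_t`, and `cdot_stripmined` are hypothetical, and `VLMAX` is only a stand-in for the hardware vector length. On a RISC-V vector unit, the strip length would be set by the `vsetvli` instruction and the inner lane loop would become vector instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the number of elements one vector register holds. */
#define VLMAX 8

/* Hypothetical complex sample type used only for this illustration. */
typedef struct { float re, im; } cfloat_t;

/* Strip-mined complex dot product: returns sum over i of a[i] * b[i].
 * Each strip processes up to VLMAX independent elements (the DLP);
 * per-lane partial sums are reduced once at the end. */
cfloat_t cdot_stripmined(const cfloat_t *a, const cfloat_t *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));

    float acc_re[VLMAX] = {0.0f};
    float acc_im[VLMAX] = {0.0f};

    for (size_t i = 0; i < n; i += VLMAX) {
        /* The last strip may be shorter than VLMAX. */
        size_t vl = (n - i < VLMAX) ? (n - i) : VLMAX;
        for (size_t l = 0; l < vl; ++l) {
            /* Lanes do not depend on each other within a strip. */
            acc_re[l] += a[i + l].re * b[i + l].re - a[i + l].im * b[i + l].im;
            acc_im[l] += a[i + l].re * b[i + l].im + a[i + l].im * b[i + l].re;
        }
    }

    /* Final reduction across lanes. */
    cfloat_t sum = {0.0f, 0.0f};
    for (size_t l = 0; l < VLMAX; ++l) {
        sum.re += acc_re[l];
        sum.im += acc_im[l];
    }
    return sum;
}
```

Keeping one partial sum per lane defers the cross-lane reduction to the end of the loop, which is one common way to keep the strips independent. Note that this changes the floating-point summation order compared with a plain sequential loop.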

Project

This project aims to improve the performance and utilization of key baseband signal processing kernels through vector-based SIMD parallel programming, and to find the most efficient design point by exploring different vector hardware configurations. The project is divided into the following parts:

  • Part One: Familiarization and Methodology Study:
 In the initial phase of the project, your primary objective will be to become familiar with the Spatz-based many-core cluster architecture. Your tasks will include:
 
 - Studying the Matrix Multiplication (MatMul) kernel on TeraPool and MemPool to compare and understand how performance changes between integer, vector, and MXU-based vector computing.
 - Conducting a comprehensive analysis to identify why TeraPool-Spatz performs worse than MemPool-Spatz.
  • Part Two: Vectorized Programming for 5G Communication Algorithms and Physically Feasible Design:
 In the second phase of the project, your attention will shift to vectorized programming for 5G communication algorithms. Your responsibilities will include:
 
 - Vectorizing the Physical Uplink Shared Channel (PUSCH) kernels and comparing the results with integer implementations.
 - Fully optimizing the C kernels that you have implemented, taking into consideration the unique interconnect architecture of MemPool/TeraPool.
 - Writing assembly code to improve register utilization, aiming to extract the maximum possible performance from the system.
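As a starting point for the MatMul study and for checking vectorized kernels against a reference, the sketch below pairs a scalar integer MatMul with a strip-mined variant in portable C. This is an illustration under stated assumptions, not the kernel used on MemPool or TeraPool. `VLMAX`, `matmul_scalar`, and `matmul_stripmined` are hypothetical names, matrices are assumed to be row-major, and `VLMAX` only emulates a hardware vector length. The strip-mined version vectorizes along a row of C, so each step multiplies one scalar element of A by a strip of a row of B.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for the number of elements per vector register. */
#define VLMAX 4

/* Scalar reference: C (MxN) = A (MxK) * B (KxN), all row-major. */
void matmul_scalar(const int *A, const int *B, int *C,
                   size_t M, size_t K, size_t N) {
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            int acc = 0;
            for (size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            C[m * N + n] = acc;
        }
    }
}

/* Strip-mined variant: computes up to VLMAX adjacent outputs of one row
 * of C at a time. The lane loop maps to a vector multiply-accumulate
 * with a scalar operand on a real vector unit. */
void matmul_stripmined(const int *A, const int *B, int *C,
                       size_t M, size_t K, size_t N) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t m = 0; m < M; ++m) {
        for (size_t j = 0; j < N; j += VLMAX) {
            /* The last strip of a row may be shorter than VLMAX. */
            size_t vl = (N - j < VLMAX) ? (N - j) : VLMAX;
            int acc[VLMAX] = {0};
            for (size_t k = 0; k < K; ++k) {
                int a = A[m * K + k];          /* scalar operand */
                for (size_t l = 0; l < vl; ++l) {
                    acc[l] += a * B[k * N + j + l];
                }
            }
            for (size_t l = 0; l < vl; ++l) {
                C[m * N + j + l] = acc[l];     /* store the strip */
            }
        }
    }
}
```

Comparing the two outputs element by element on matrices whose width is not a multiple of the strip length exercises the tail handling, which is a frequent source of errors when porting a loop to vector code.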

In this project, you will work on both the software and the hardware of a vector-based many-core SIMD architecture, creating and concurrently executing signal-processing kernels typically used in baseband communication.

Character

  • 10% Literature Review
  • 60% Software Design
  • 20% Hardware Design
  • 10% Evaluation & Documentation

Weekly Reports

The student is required to write a weekly report at the end of each week and send it to their advisors by email. The weekly report should briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points. For software benchmarks, we strongly recommend creating a Google Sheet and plotting the results to keep track of them.

Report

Documentation is an essential and often overlooked aspect of engineering. A final report has to be completed within this project.

The common language of engineering is English. Therefore, the final report should preferably be written in English.

Any word processing software may be used to write the report. Nevertheless, the IIS staff strongly encourages the use of LaTeX with Inkscape, Tgif, or any other vector drawing software (for block diagrams). If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded here.


Prerequisites

  • Strong interest in manycore computer architecture and memory systems
  • Knowledge of vector architecture
  • Experience in C/C++ programming
  • Experience with digital design in SystemVerilog as taught in VLSI I is appreciated

References

[1] M. Cavalcante, D. Wüthrich, M. Perotti, et al., "Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters," arXiv preprint arXiv:2207.07970, 2022, https://arxiv.org/abs/2207.07970?context=cs

[2] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, "MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect," in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021, pp. 701–706.

[3] M. Bertuletti, Y. Zhang, A. Vanelli-Coralli, and L. Benini, "Efficient Parallelization of 5G-PUSCH on a Scalable RISC-V Many-core Processor," arXiv preprint arXiv:2210.09196, 2022, https://arxiv.org/abs/2210.09196