
Vector-based Parallel Programming Optimization of Communication Algorithm (1-2S/B)


Revision as of 11:30, 31 October 2023 by Yiczhang (talk | contribs) (Project)


Overview

Status: In Progress

Introduction

Future communications processing systems will need flexible and scalable solutions. Vector processors provide an efficient means of exploiting data-level parallelism (DLP), which is abundant in communications kernels. Spatz [1] is a small, energy-efficient vector unit based on the RISC-V vector extension specification, designed to improve both efficiency and performance. Spatz is a lean Processing Element (PE) that acts as an accelerator to a scalar core, which makes it a good candidate for achieving ideal hardware utilization and enabling scalability. For these reasons, we implemented Spatz on the TeraPool architecture as our hardware platform. TeraPool is a scaled-up system derived from MemPool [2], with 1024 Snitch cores and 4096 banks of shared, tightly coupled L1 data memory. In this project, we will exploit the considerable DLP found in typical baseband signal processing kernels [3] and implement these kernels on this platform.
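
To make the notion of data-level parallelism concrete, the following minimal sketch in plain C emulates the strip-mining pattern that vector code for the RISC-V vector extension typically follows: the hardware grants a vector length for each iteration, and the loop advances by that amount. The names `VLMAX_ELEMS`, `set_vl`, and `axpy_stripmined` are hypothetical and chosen only for illustration; they are not part of the Spatz, MemPool, or TeraPool software. A real kernel would use vector instructions or intrinsics in place of the inner scalar loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical maximum number of elements per vector operation.
 * On real hardware this depends on the vector register length,
 * the element width, and the register grouping. */
#define VLMAX_ELEMS 8

/* Emulates the vector-length request: grant min(remaining, VLMAX_ELEMS). */
static size_t set_vl(size_t remaining) {
    return remaining < VLMAX_ELEMS ? remaining : VLMAX_ELEMS;
}

/* Computes y[i] += a * x[i] in strips of up to VLMAX_ELEMS elements. */
static void axpy_stripmined(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    size_t i = 0;
    while (i < n) {
        size_t vl = set_vl(n - i);
        /* This inner loop stands in for one vector load,
         * one vector multiply-accumulate, and one vector store. */
        for (size_t j = 0; j < vl; ++j) {
            y[i + j] += a * x[i + j];
        }
        i += vl;
    }
}
```

Because the strip length is requested at run time, the same loop handles any `n`, including lengths that are not a multiple of `VLMAX_ELEMS`, without a separate scalar tail loop.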

Project

This project aims to improve the performance and utilization of key baseband signal processing kernels through vector-based SIMD parallel programming, and to find the most efficient design point by exploring different vector hardware configurations. The project is divided into three parts:

  • Part One: Familiarization and Interface Alignment

In the initial phase of the project, your primary objective will be to acquaint yourself with the Spatz-based many-core cluster architecture. Your tasks will include:

      • Aligning the accelerator interface between the latest 64-bit Spatz Cluster, MemPool, and TeraPool-based Spatz.
      • Testing the Matrix Multiplication (MatMul) kernel on both TeraPool and MemPool.
      • Conducting a comprehensive analysis to identify why TeraPool-Spatz performs worse than MemPool-Spatz, considering scenarios with and without the Matrix Unit (MXU).
      • Pinpointing areas for potential improvement.

  • Part Two: Architecture Evaluation and RTL Work

The second segment of the project will be dedicated to architecture evaluation, with a specific focus on Register-Transfer Level (RTL) work. You will:

      • Investigate the number of memory ports allocated to Spatz, with the aim of making it more competitive with Stream Semantic Registers (SSRs).

  • Part Three: Vectorized Programming for 5G Communication Algorithms and Physically Feasible Design

In the final part of the project, your attention will shift to vectorized programming for 5G communication algorithms. Your responsibilities will include:

      • Vectorizing the Physical Uplink Shared Channel (PUSCH) kernels and comparing the results with integer implementations.
      • Conducting a physical design analysis for both the integer and the Floating Point Unit (FPU) versions of TeraPool-Spatz to assess the Power, Performance, and Area (PPA) impact.
      • Comparing the results with those obtained from integer-TeraPool and FPU-TeraPool to draw meaningful conclusions.

In this project, you will work on both the software and the hardware of a vector-based manycore SIMD architecture, creating and concurrently executing signal-processing kernels typically used in baseband communication.
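
As an illustration of how a kernel such as MatMul exposes data-level parallelism, the sketch below orders the loops so that the innermost loop runs with unit stride over one row of B and one row of C, which is the axis a vector unit would process. This is a plain-C reference under stated assumptions, not the MatMul kernel used on MemPool or TeraPool; the function names `matmul_rowwise` and `max_abs_diff` and the row-major layout are hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Computes C (m x n) = A (m x k) * B (k x n); all matrices are row-major. */
static void matmul_rowwise(size_t m, size_t n, size_t k,
                           const float *a, const float *b, float *c) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        /* Clear the output row before accumulating into it. */
        for (size_t j = 0; j < n; ++j) {
            c[i * n + j] = 0.0f;
        }
        for (size_t p = 0; p < k; ++p) {
            const float a_ip = a[i * k + p]; /* scalar operand, reused across the row */
            /* Unit-stride loop over row p of B and row i of C:
             * the candidate for a vector multiply-accumulate. */
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
}

/* Returns the largest absolute element-wise difference between x and y. */
static float max_abs_diff(const float *x, const float *y, size_t len) {
    float worst = 0.0f;
    for (size_t i = 0; i < len; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f) {
            d = -d;
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}
```

A scalar reference of this kind can serve as a golden model: the output of a vectorized kernel can be compared against it with `max_abs_diff`, using a tolerance suited to the data type under test.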

Character

  • 10% Literature Review
  • 40% Software Design
  • 40% Hardware Design
  • 10% Evaluation & Documentation

Weekly Reports

The student is required to write a weekly report at the end of each week and send it to their advisors by email. The weekly report should briefly summarize the work, progress, and findings of the week, plan the actions for the next week, and discuss open questions. For software benchmarks, we strongly recommend creating a Google Sheet and plotting the results so that benchmark results can be tracked over time.

Report

Documentation is an essential and often overlooked aspect of engineering. A final report has to be completed within this project.

The common language of engineering is English. Therefore, the final report should preferably be written in English.

Any word processing software may be used to write the report. However, the IIS staff strongly encourages the use of LaTeX together with Inkscape, Tgif, or any other vector drawing software (for block diagrams). If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded here.


Prerequisites

  • Strong interest in manycore computer architecture and memory systems
  • Knowledge of vector architecture
  • Experience in C/C++ programming
  • Experience with digital design in SystemVerilog as taught in VLSI I is appreciated

References

[1] M. Cavalcante, D. Wüthrich, M. Perotti, et al., “Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters,” arXiv preprint arXiv:2207.07970, 2022, https://arxiv.org/abs/2207.07970?context=cs

[2] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, “MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect,” in 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2021, pp. 701–706.

[3] M. Bertuletti, Y. Zhang, A. Vanelli-Coralli, and L. Benini, “Efficient Parallelization of 5G-PUSCH on a Scalable RISC-V Many-core Processor,” https://arxiv.org/abs/2210.09196