Big Data Analytics Benchmarks for Ara

From iis-projects

Revision as of 16:23, 6 November 2022 by Chizhang (talk | contribs)


Status: Available

Introduction

Vector processing is becoming a widespread option when dealing with highly parallel data workloads, thanks to its intrinsic computational capabilities and flexibility. A vector core can sustain high computational throughput using deep pipelines and multiple parallel units.

What a time for a project on a vector processor! RISC-V has almost finished ratifying its open-source vector ISA RVV (a process that lasted many years!), and many companies and universities are producing their first RVV-compatible cores. ETH is at the forefront of this race with its agile in-order vector processor Ara, recently updated from the immature RVV 0.5 specification to RVV 1.0.

In the age of big data, there is strong demand for high-performance data analytics. Now is the time to apply the highly data-parallel computational capabilities of our vector processor Ara to big data analytics! In this project, you will write high-performance big data analytics benchmarks based on the open-source vector ISA RVV 1.0, evaluate them on the vector processor Ara, and try to achieve the best possible performance.


Tasks

  • Familiarize yourself with the vector processor Ara
    • Run an Ara RTL simulation
    • Execute the existing benchmarks
    • Understand how a vector processor works and how the chaining technique operates
  • Familiarize yourself with a set of popular big data analytics workloads, including:
    • Naive Bayes
    • SVM
    • K-means clustering
    • Breadth-first search
    • Depth-first search
    • Multilayer perceptron
    • Graph neural network
  • Code the big data analytics benchmarks for Ara, while thinking about:
    • How to vectorize these workloads
    • How to schedule memory accesses and computation to take full advantage of vector chaining and reach high functional unit utilization
  • Evaluate the big data analytics benchmarks
    • Run your benchmarks on Ara and collect performance metrics such as functional unit utilization, bandwidth, and bus utilization
    • Build a roofline model while varying the data set size and the number of Ara lanes
  • Write a report and prepare a presentation.
  • Possible BONUS goals.
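
As a rough illustration of the vectorization and roofline tasks above, the following portable C sketch shows a strip-mined loop for the squared Euclidean distance (the inner kernel of K-means) and a helper that computes a roofline bound. It is only a sketch under stated assumptions: the names VLMAX_SKETCH, set_vl, squared_distance and roofline_bound are hypothetical, the vector length of 16 elements is not Ara's real value, and the inner scalar loop merely stands in for what would be a sequence of RVV instructions on the real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical maximum vector length in elements. This is NOT Ara's real
 * value, which depends on the hardware configuration and element width. */
#define VLMAX_SKETCH 16

/* Mimics the role of vsetvli in a strip-mined loop: the granted vector
 * length is min(remaining, VLMAX). */
static size_t set_vl(size_t remaining)
{
    return remaining < VLMAX_SKETCH ? remaining : VLMAX_SKETCH;
}

/* Squared Euclidean distance between a point and a centroid, processed
 * in strips of at most VLMAX_SKETCH elements. */
double squared_distance(const double *point, const double *centroid, size_t n)
{
    /* Stands in for a vector register used as accumulator. */
    double partial[VLMAX_SKETCH] = {0.0};
    size_t i = 0;

    while (i < n) {
        size_t vl = set_vl(n - i);
        /* On a vector machine this inner loop would be a few vector
         * instructions: two loads, a subtract, a multiply-accumulate. */
        for (size_t j = 0; j < vl; j++) {
            double diff = point[i + j] - centroid[i + j];
            partial[j] += diff * diff;
        }
        i += vl;
    }

    /* Final reduction of the accumulator to a scalar. */
    double sum = 0.0;
    for (size_t j = 0; j < VLMAX_SKETCH; j++)
        sum += partial[j];
    return sum;
}

/* Roofline bound: attainable performance is the minimum of the compute
 * peak and (memory bandwidth x arithmetic intensity).
 * Units assumed here: FLOP/cycle, byte/cycle, FLOP/byte. */
double roofline_bound(double peak_flop_per_cycle,
                      double bytes_per_cycle,
                      double flop_per_byte)
{
    assert(peak_flop_per_cycle >= 0.0);
    assert(bytes_per_cycle >= 0.0);
    assert(flop_per_byte >= 0.0);

    double memory_bound = bytes_per_cycle * flop_per_byte;
    return memory_bound < peak_flop_per_cycle ? memory_bound
                                              : peak_flop_per_cycle;
}
```

For example, with a made-up machine offering 8 FLOP/cycle and 16 byte/cycle, a kernel at 0.25 FLOP/byte is bounded at 4 FLOP/cycle (memory-bound), while a kernel at 2 FLOP/byte is bounded at 8 FLOP/cycle (compute-bound). Comparing the measured performance of each benchmark against this bound, for several data set sizes and lane counts, is one way to build the roofline plot.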


Requirements

  • Strong interest and basic knowledge in computer architecture and operating systems, both on the HW and SW sides
  • Experience with SystemVerilog HDL, such as taught in VLSI I
  • Knowledge of bare-metal C and assembly programming
  • Bonus: familiarity with vector processors and the RISC-V vector extension (RVV)

Character

  • 25% Literature / Architecture review
  • 50% Bare-metal C and Assembly programming
  • 25% Performance evaluation


References

[1] Ara: https://arxiv.org/pdf/1906.00478.pdf

[2] Ara source code: https://github.com/pulp-platform/ara

[3] RVV: https://github.com/riscv/riscv-v-spec/releases/tag/v1.0

[4] Big data analytics: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-015-0030-3

[5] How AI and ML Applications Will Benefit from Vector Processing: https://www.enterpriseai.news/2020/07/31/how-ai-and-ml-applications-will-benefit-from-vector-processing/

[6] A survey on platforms for big data analytics: https://link.springer.com/article/10.1186/s40537-014-0008-6