Optimizing the Pipeline in our Floating Point Architectures (1S)
From iis-projects
Categories: 2022, Semester Thesis, Completed, Lbertaccini
Latest revision as of 14:27, 15 May 2023
Overview
Status: Completed
- Student: Mingrui Yuan
- Type: Semester Thesis
- Professor: Prof. Dr. L. Benini
- Supervisors:
Introduction
Floating-point (FP) arithmetic is fundamental to a large set of applications, ranging from high-performance computing to neural network training. FP architectures usually have long critical paths and need to be pipelined to match the system's operating frequency. A flexible, highly parametrized open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS.
FPnew is optimized for high performance and energy efficiency. Internally, it is organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product). Each operation group block, except the DivSqrt module, which implements an iterative algorithm, contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend. However, this can make the backend a longer and more complex process. The goal of this project is to manually place the pipeline registers so that timing is optimized, and to compare the resulting designs against the baseline implementation.
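To illustrate why register placement matters, the following is a minimal toy model and not a description of FPnew's actual timing. It treats a datapath as a chain of combinational blocks and reports the longest register-to-register segment for a given set of register positions. The block delays and the function names `critical_path` and `best_cuts` are assumptions made up for this sketch.

```python
from itertools import combinations_with_replacement


def critical_path(block_delays, cuts):
    """Longest combinational segment for a given register placement.

    block_delays: delays of the blocks in datapath order (hypothetical values).
    cuts: one position per register; position i means "in front of block i",
          so 0 is the input boundary and len(block_delays) the output boundary.
    """
    bounds = [0] + sorted(cuts) + [len(block_delays)]
    return max(sum(block_delays[a:b]) for a, b in zip(bounds, bounds[1:]))


def best_cuts(block_delays, num_regs):
    """Exhaustively search for the placement with the shortest critical path."""
    positions = range(len(block_delays) + 1)
    candidates = combinations_with_replacement(positions, num_regs)
    return min(candidates, key=lambda cuts: critical_path(block_delays, cuts))


# Hypothetical block delays in ns; these are NOT measured FPnew numbers.
delays = [1.0, 2.0, 1.5, 0.5]

at_input = critical_path(delays, [0, 0])   # both registers at the input boundary
balanced = best_cuts(delays, 2)            # manually balanced placement
print(at_input, balanced, critical_path(delays, balanced))
```

In this toy model, stacking both registers at the input leaves the whole chain in a single stage, whereas a balanced placement shortens the longest stage. In a real flow, a retiming step in the tools can move registers to a similar effect, which is the extra backend effort the paragraph above refers to.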
Project
- Investigation of the FPU timing. This will require you to:
  - Understand what the critical paths in the unit are
  - Understand how the critical paths are broken when different numbers of pipeline registers are inserted
- RTL modifications to FPnew to manually optimize the pipeline for different numbers of pipeline registers
- Implementation of a Python generator that takes the number of pipeline levels as an input and places the registers in the positions you identified
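The generator described in the last item could take roughly the following shape. This is only a sketch under stated assumptions: the cut-point names (`inp`, `mid`, `out`), the placement table, and the emitted `NUM_*_REGS` parameter names are hypothetical placeholders, not the positions identified in the thesis and not necessarily the names used in the FPnew sources.

```python
# Hypothetical cut points inside one operation group block, in datapath order.
CUT_POINTS = ("inp", "mid", "out")

# Hypothetical hand-tuned table: pipeline depth -> registers per cut point.
# In the project, these entries would come from the timing investigation.
PLACEMENT = {
    0: {},
    1: {"mid": 1},
    2: {"inp": 1, "mid": 1},
    3: {"inp": 1, "mid": 1, "out": 1},
}


def placement_for(num_regs):
    """Return the number of registers at each cut point for a given depth."""
    if num_regs < 0:
        raise ValueError("number of pipeline registers must be non-negative")
    deepest = max(PLACEMENT)
    base = min(num_regs, deepest)
    counts = {cut: PLACEMENT[base].get(cut, 0) for cut in CUT_POINTS}
    # Depths beyond the table: spread the extra registers round-robin.
    for i in range(num_regs - base):
        counts[CUT_POINTS[i % len(CUT_POINTS)]] += 1
    assert sum(counts.values()) == num_regs
    return counts


def emit_localparams(num_regs):
    """Emit SystemVerilog localparams describing the chosen placement."""
    counts = placement_for(num_regs)
    lines = [f"// Generated for {num_regs} pipeline registers (illustrative)"]
    for cut in CUT_POINTS:
        lines.append(
            f"localparam int unsigned NUM_{cut.upper()}_REGS = {counts[cut]};"
        )
    return "\n".join(lines)


print(emit_localparams(2))
```

Keeping the placement in a plain table separates the result of the timing analysis from the code that emits RTL, so the table can be updated for each operation group without touching the emitter.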
Character
- 15% Literature / architecture review
- 30% RTL implementation
- 40% Evaluation
- 15% Python generator
Prerequisites
- Strong interest in computer architecture
- Experience with digital design in SystemVerilog as taught in VLSI I
- Experience with ASIC implementation flow (synthesis) as taught in VLSI II
References
[1] FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. https://ieeexplore.ieee.org/abstract/document/9311229
[2] https://github.com/openhwgroup/cvfpu/