
Fault-Tolerant Floating-Point Units (M)

From iis-projects

Latest revision as of 14:23, 27 February 2024


Overview

Status: In Progress

  • Type: Master Thesis
  • Supervisors: Luca Bertaccini (lbertaccini@iis.ee.ethz.ch), Michael Rogenmoser (michaero@iis.ee.ethz.ch)

Introduction

Figure: FPnew block diagram [1]. Each operation group block can be instantiated through a parameter; in the figure, the FPU is instantiated without a DivSqrt module.


Fault-tolerant features are crucial in critical and hostile environments (automotive, space, …). In the PULP group, we have started developing reliable hardware for use in space, where high levels of radiation can corrupt the results of computations.

While many processing and memory elements have already been investigated and protected, fault-tolerant floating-point units (FPUs) remain to be researched. The goal of this project is to enhance the FPU developed at IIS [1] with fault-tolerant features, such as redundancy schemes [2]. For example, the project will investigate a fault-tolerant mode in which multiple SIMD units inside the FPU compute the same operation and their results are then compared to detect faults.
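The comparison-based fault-tolerant mode described above can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the FPU's RTL or the project's design: the four FP16 lanes, the function names `simd_mul` and `redundant_mul`, and the `upsets` bit-flip argument are all hypothetical, and only multiplication is modeled. Comparing the replicated lane results detects a fault; with three or more lanes, a majority vote can additionally mask a single faulty lane.

```python
import struct
from collections import Counter
from typing import Dict, List, Optional

NUM_LANES = 4  # hypothetical: a 64-bit datapath holding four FP16 SIMD lanes


def round_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp16_bits(x: float) -> int:
    """Round x to binary16 and return its 16-bit encoding."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def simd_mul(a: List[float], b: List[float],
             upsets: Optional[Dict[int, int]] = None) -> List[int]:
    """Lane-wise FP16 multiply returning binary16 bit patterns.

    The product of two FP16 values is exact in a Python float, so a single
    rounding step yields the correctly rounded FP16 product.
    `upsets` (hypothetical) maps a lane index to a bit mask XORed into that
    lane's result, emulating a transient fault such as a radiation-induced flip.
    """
    upsets = upsets or {}
    return [to_fp16_bits(round_fp16(x) * round_fp16(y)) ^ upsets.get(lane, 0)
            for lane, (x, y) in enumerate(zip(a, b))]


def redundant_mul(x: float, y: float,
                  upsets: Optional[Dict[int, int]] = None):
    """Fault-tolerant mode sketch: replicate one scalar multiply on every lane.

    Returns (result_bits, fault_detected). result_bits is the majority value,
    or None if no strict majority of lanes agrees (fault detected but not
    correctable).
    """
    lanes = simd_mul([x] * NUM_LANES, [y] * NUM_LANES, upsets)
    # Detection: any lane disagreeing with lane 0 signals a fault.
    fault = any(r != lanes[0] for r in lanes)
    # Correction: accept a value only if a strict majority of lanes produced it.
    value, votes = Counter(lanes).most_common(1)[0]
    result = value if votes > NUM_LANES // 2 else None
    return result, fault
```

For instance, `redundant_mul(1.5, 2.0, upsets={2: 1 << 14})` flags the mismatch in lane 2, while the vote still returns 0x4200, the binary16 encoding of 3.0. Replicating one operation across all lanes trades away the SIMD throughput for reliability; a real design would also have to decide how detected faults are reported or retried.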

Character

  • 20% Literature / architecture review
  • 40% RTL implementation
  • 40% Evaluation

Prerequisites

  • Strong interest in computer architecture
  • Experience with digital design in SystemVerilog as taught in VLSI I
  • Experience with ASIC implementation flow (synthesis) as taught in VLSI II

References

[1] https://github.com/pulp-platform/cvfpu

[2] https://arxiv.org/abs/2303.08706