
Floating-Point Divide & Square Root Unit for Transprecision

From iis-projects

Revision as of 17:09, 29 July 2020 by Smach


Traditional computing systems operate in a precise, "every bit correct" manner. In arithmetic-heavy workloads, for example, most programmers tend to carry out all computations at the maximum available precision, which means double precision in the case of floating-point (FP) workloads. This happens regardless of the actual numerical requirements of the application at hand. Transprecision computing aims to do away with this rigid way of computing by adding more 'knobs' in hardware and software that can be used to adjust computing precision on the fly. In contrast to approximate computing, where the precision of the entire system is reduced (often at a loss in result quality), transprecision computing dynamically provides the precision needed for correct execution. In floating-point arithmetic, gains in energy efficiency and speed can be achieved by using custom precision formats that use fewer bits than the standard 'single' and 'double' precision formats, leading to smaller and more efficient hardware. Reduced-precision floating-point arithmetic is of interest for classic video and audio processing, but also for machine learning and scientific computing workloads.
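To make "fewer bits" concrete, the following Python sketch rounds a double-precision value to a binary format with a chosen number of exponent and mantissa bits. It assumes an IEEE-754-style encoding (biased exponent, subnormals, round-to-nearest-even, overflow to infinity); the function name `quantize` is hypothetical, and the sketch does not describe the exact formats used in PULP.

```python
import math


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a hypothetical binary format with
    `exp_bits` exponent bits and `man_bits` explicit mantissa bits.

    Assumes an IEEE-754-style encoding: biased exponent, subnormals,
    round-to-nearest-even, overflow to infinity. NaN, inf and 0 pass through.
    """
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x
    bias = (1 << (exp_bits - 1)) - 1
    emin = 1 - bias                    # smallest normal exponent
    emax = bias                        # largest normal exponent
    _, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    e = max(e - 1, emin)               # leading-bit exponent; subnormals use emin
    ulp = 2.0 ** (e - man_bits)        # weight of the last mantissa bit
    q = round(abs(x) / ulp) * ulp      # round() on floats is round-half-to-even
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** emax
    if q > max_val:                    # rounded past the largest finite value
        q = math.inf
    return math.copysign(q, x)
```

For example, `quantize(math.pi, 8, 7)` (the bfloat16 layout) gives 3.140625, while `quantize(math.pi, 5, 2)` (an 8-bit layout with 5 exponent and 2 mantissa bits) gives 3.0, which shows how quickly precision drops as mantissa bits are removed.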

At the Integrated Systems Laboratory (IIS) we have been working for several years on ultra-low-power processor cores in the context of the PULP (Parallel Ultra-Low Power) project. PULP cores implement the open-source RISC-V instruction set architecture (ISA), which includes FP instructions as optional ISA extensions. RISC-V also allows for custom ISA extensions, which we used to define transprecision floating-point extensions for use in PULP cores. Our extensions define 16-bit and 8-bit FP formats and various operations on these formats, including single-instruction-multiple-data (SIMD) vector operations.
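As an illustration of SIMD on narrow formats, the sketch below packs two IEEE binary16 values into one 32-bit word and divides them lane by lane. The 32-bit word width, the lane order (lane 0 in bits 15:0) and the helper names are assumptions for illustration, not the PULP encoding; the binary16 conversion uses the standard `struct` format character `'e'`.

```python
import struct


def pack_fp16x2(lo: float, hi: float) -> int:
    """Pack two values as binary16 into one 32-bit word (assumed: lane 0 = bits 15:0)."""
    (lo_bits,) = struct.unpack('<H', struct.pack('<e', lo))
    (hi_bits,) = struct.unpack('<H', struct.pack('<e', hi))
    return (hi_bits << 16) | lo_bits


def unpack_fp16x2(word: int) -> tuple[float, float]:
    """Split a 32-bit word back into its two binary16 lanes."""
    (lo,) = struct.unpack('<e', struct.pack('<H', word & 0xFFFF))
    (hi,) = struct.unpack('<e', struct.pack('<H', (word >> 16) & 0xFFFF))
    return lo, hi


def vfdiv_fp16x2(a: int, b: int) -> int:
    """Hypothetical lane-wise divide of two packed words.

    Dividing in binary64 and rounding to binary16 gives the correctly rounded
    binary16 quotient (53 >= 2*11 + 2, so double rounding is harmless here).
    Unlike an IEEE-compliant unit, Python raises ZeroDivisionError/OverflowError
    instead of returning infinity and setting status flags.
    """
    a0, a1 = unpack_fp16x2(a)
    b0, b1 = unpack_fp16x2(b)
    return pack_fp16x2(a0 / b0, a1 / b1)
```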

Project description

For FP arithmetic to be fast and energy-efficient in a processor core, a dedicated hardware floating-point unit (FPU) is needed. RISC-V defines basic arithmetic floating-point operations such as addition, multiplication and division. While addition and multiplication are fairly straightforward to implement in hardware, a division unit is trickier. There are several architectural options, each with different trade-offs in terms of area, power and latency.
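Division algorithms are commonly grouped into digit-recurrence (subtractive) and multiplicative (iterative-approximation) methods. A minimal Python sketch of the simplest digit-recurrence variant, radix-2 restoring division on integer significands, follows. It produces one quotient bit per iteration, which is why its latency grows with the significand width. The function name and interface are hypothetical, and practical units typically use higher-radix or SRT variants to reduce the iteration count.

```python
def restoring_divide(a: int, b: int, k: int) -> tuple[int, int]:
    """Radix-2 restoring division of integer significands a / b.

    Requires 0 < a < 2*b (true for two normalized significands).
    Produces k + 1 quotient bits, MSB first, so that
    q == (a << k) // b and r == (a << k) % b.
    A non-zero final remainder acts as the sticky bit for rounding.
    """
    assert 0 < a < 2 * b
    q, r = 0, a
    for _ in range(k + 1):
        q <<= 1
        if r >= b:      # trial subtraction succeeds: quotient bit is 1
            r -= b
            q |= 1
        # otherwise the remainder is "restored", i.e. simply kept
        r <<= 1         # shift the partial remainder for the next bit
    return q, r >> 1    # undo the extra shift of the last iteration
```

With binary16-style significands (hidden bit at position 10), 1.5 / 1.25 becomes `restoring_divide(1536, 1280, 12)`, which returns (4915, 256): the quotient 1.2 truncated to 12 fraction bits, plus a non-zero remainder marking the result as inexact.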

The basic arithmetic operations are also needed for our custom transprecision formats in PULP. Today, there is a transprecision-enabled FPU in PULP that supports nearly all operations needed to comply with the RISC-V specifications as well as our own extensions. However, the division and square-root (DIV/SQRT) unit is currently not parametrizable and not fully compliant with the IEEE 754 standard for FP arithmetic. To enable diverse PULP-based systems with different features, such as 32/64-bit cores with and without transprecision capabilities, a flexible and parametric FP DIV/SQRT unit with multi-format capabilities is needed.
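One reason a single parametric unit can serve several formats is that digit-recurrence DIV/SQRT needs a number of iterations proportional to the significand width. The following Python sketch shows a restoring (digit-by-digit) square root on an integer radicand, where `n_bits` sets both the result width and the iteration count, so a narrower format simply runs fewer iterations. Names and interface are hypothetical; exponent handling, rounding and special values are omitted.

```python
def restoring_sqrt(x: int, n_bits: int) -> tuple[int, int]:
    """Digit-by-digit (restoring) square root, one result bit per iteration.

    Requires 0 <= x < 4**n_bits. Returns (s, r) with s == floor(sqrt(x))
    and r == x - s*s. `n_bits` is the (hypothetical) format parameter:
    it fixes both the root width and the number of iterations.
    """
    assert 0 <= x < 4 ** n_bits
    s, r = 0, 0
    for i in range(n_bits - 1, -1, -1):
        r = (r << 2) | ((x >> (2 * i)) & 0b11)  # bring down two radicand bits
        trial = (s << 2) | 1                    # (2s + 1)**2 - (2s)**2 = 4s + 1
        s <<= 1
        if r >= trial:                          # next root bit is 1
            r -= trial
            s |= 1
    return s, r
```

For instance, `restoring_sqrt(2 << 24, 13)` returns (5792, 7168): the root is floor(sqrt(2) * 2**12), and the non-zero remainder again indicates an inexact result.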

In this project, you will evaluate different hardware algorithms for a DIV/SQRT unit and implement the most promising one as a transprecision-capable unit. Furthermore, you will take your design through most of the steps necessary to manufacture it as an actual IC, in order to obtain accurate energy-efficiency and performance metrics. You will also be able to test your unit within one of our cores. The new unit will be used in future versions of our cores, both for its transprecision capabilities and for standard operations.
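One family of alternatives worth evaluating is multiplicative methods. The Python sketch below approximates a reciprocal with Newton-Raphson iteration, x <- x * (2 - d * x), starting from the classic linear estimate 48/17 - 32/17 * d for a divisor scaled into [0.5, 1]. The relative error is squared at every step, so it is at most (1/17)**(2**i) after i iterations. The function names are hypothetical. Note that a * x is not automatically the correctly rounded quotient, so an IEEE-compliant unit needs an extra correction step, which this sketch omits.

```python
def newton_reciprocal(d: float, iterations: int) -> float:
    """Approximate 1/d for a divisor d scaled into [0.5, 1]."""
    assert 0.5 <= d <= 1.0
    x = 48 / 17 - 32 / 17 * d      # initial estimate, |1 - d*x| <= 1/17
    for _ in range(iterations):
        x = x * (2.0 - d * x)      # error e -> e**2 (quadratic convergence)
    return x


def newton_divide(a: float, d: float, iterations: int = 3) -> float:
    """a / d via the approximate reciprocal; NOT correctly rounded."""
    return a * newton_reciprocal(d, iterations)
```

Three iterations already bound the error by 17**-8 (about 1.4e-10, roughly 32 bits), so far fewer iterations are needed than with a one-bit-per-step recurrence, at the cost of multiplier hardware.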

Outcomes and Acquired Expertise

With this project you will work in a field of active research and help develop a transprecision-enabled platform for ASIC and FPGA targets. You will learn:

  • about computer arithmetic first-hand by diving into algorithms;
  • how to design a hardware module for integration into a more complex platform, using EDA tools for verification and RTL synthesis to evaluate results;
  • how to take a synthesized design through the back-end EDA flow to prepare it for manufacturing and to obtain power measurements from simulation.

Required Skills

To work on this project, you will need:

  • to have worked with at least one RTL language (SystemVerilog, Verilog or VHDL) in the past; having followed (or actively following during the project) the VLSI1 / VLSI2 courses is strongly recommended
  • to have some prior knowledge of hardware design and architectures

Other skills that you might find useful include:

  • familiarity with computer arithmetic

If you want to work on this project but think that you do not match some of the required skills, we can provide you with some preliminary exercises to help you fill in the gap.

Status: Completed

Supervision: Stefan Mach, Gianna Paulin


Luca Benini


Practical Details

Meetings & Presentations

The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.

Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as preliminary specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the final decision on whether the chip is actually manufactured (no reason to worry if the project is on track) and whether more chip area, a different package, ... is provided. For more details, refer to [1].

At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.


  • The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [2]
  • The IIS/DZ coding guidelines [3]
