http://iis-projects.ee.ethz.ch/api.php?action=feedcontributions&user=Lbertaccini&feedformat=atomiis-projects - User contributions [en]2024-03-29T08:21:52ZUser contributionsMediaWiki 1.28.0http://iis-projects.ee.ethz.ch/index.php?title=Hardware_Exploration_of_Shared-Exponent_MiniFloats_(M)&diff=10182Hardware Exploration of Shared-Exponent MiniFloats (M)2024-02-15T13:55:55Z<p>Lbertaccini: </p>
<hr />
<div><!-- Hardware Exploration of Shared-Exponent MiniFloats (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2024]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency. While many commercial platforms already provide support for 8-bit FP data types, introducing sub-8-bit formats is key to meeting the memory-footprint and efficiency requirements imposed by ever-larger NN models.<br />
<br />
The FP units (FPUs) developed at IIS [1], [2] already provide hardware support for low-precision FP formats (down to 8 bits). The goal of this project is to explore sub-8-bit FP formats, with a particular emphasis on shared-exponent MiniFloats [3]. Such formats share a single exponent among N sub-8-bit values and are currently being researched by many hardware providers [3].<br />
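To make the format concrete, the Python sketch below models a shared-exponent, block-floating-point-style quantization step: one exponent for a whole block of values, a few mantissa bits per element. It is a simplified software model for intuition only (no sign/magnitude encoding, rounding modes, or NaN/Inf handling as in real MiniFloat hardware), and all names are illustrative, not taken from any IIS codebase.

```python
import math

def quantize_shared_exponent(values, mantissa_bits=3):
    """Quantize a block of values to a shared-exponent, MiniFloat-like
    representation: one exponent for the whole block, a few mantissa
    bits per element (simplified software model, names illustrative)."""
    max_mag = max(abs(v) for v in values)
    if max_mag == 0.0:
        return [0.0] * len(values), 0
    # Shared exponent chosen from the largest magnitude in the block.
    shared_exp = math.floor(math.log2(max_mag))
    # Step size implied by the shared exponent and per-element mantissa width.
    step = 2.0 ** (shared_exp - mantissa_bits)
    # Each element keeps only a small integer multiple of the step.
    quantized = [round(v / step) * step for v in values]
    return quantized, shared_exp

# Large values survive; values far below the shared exponent lose precision.
q, e = quantize_shared_exponent([0.75, -0.51, 0.1, 0.03], mantissa_bits=3)
```

Note how the smallest element quantizes to zero: elements much smaller than the block maximum are the accuracy cost of sharing one exponent, which is exactly the trade-off a hardware exploration has to weigh against the storage savings.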
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/pulp-platform/cvfpu<br />
<br />
[3] https://arxiv.org/abs/2310.10537</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Extending_our_FPU_with_Internal_High-Precision_Accumulation_(M)&diff=10181Extending our FPU with Internal High-Precision Accumulation (M)2024-02-15T13:54:07Z<p>Lbertaccini: Created page with "<!-- Extending our FPU with Internal High-Precision Accumulation (M) --> Category:Digital Category:Acceleration_and_Transprecision Category:High Performance SoCs..."</p>
<hr />
<div><!-- Extending our FPU with Internal High-Precision Accumulation (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2024]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency. For these reasons, many commercial platforms already provide support for 8-bit FP data types. These formats provide only a few mantissa bits and are therefore not suited for accumulation. They are instead used in mixed-precision operations, where the accumulation is performed in higher precision, e.g., in FP16 or FP32.<br />
<br />
The FP units (FPUs) developed at IIS [1], [2] already provide hardware support for low-precision FP formats (down to 8 bits). The goal of this project is to add support for internal high-precision accumulation to the FPU. In this way, the accumulated value does not have to be written to and read from the FP register file at every accumulation, reducing energy consumption. At the same time, this decouples the accumulator width from the register-file entry width: the internal accumulators can have a custom size, potentially even larger than a single register-file entry.<br />
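The benefit of a wide internal accumulator can be illustrated in software by rounding the running sum to a limited number of mantissa bits after every addition, mimicking a narrow accumulator register. This is a behavioral sketch only; the function names are illustrative and not from CVFPU:

```python
import math

def round_to_mantissa(x, bits):
    """Round x to `bits` mantissa bits, keeping sign and exponent exact:
    a crude software model of storing into a narrow FP accumulator."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** bits
    return math.ldexp(round(m * scale) / scale, e)

def accumulate(terms, acc_mantissa_bits):
    """Sum terms, rounding the running total after every addition,
    as a fixed-width accumulator register would."""
    acc = 0.0
    for t in terms:
        acc = round_to_mantissa(acc + t, acc_mantissa_bits)
    return acc

# 1000 small contributions following one large one:
terms = [1.0] + [0.001] * 1000
wide = accumulate(terms, 24)    # FP32-like mantissa width
narrow = accumulate(terms, 8)   # narrow accumulator
```

With only 8 mantissa bits, every 0.001 contribution is rounded away and the narrow result never moves past 1.0, while the wide accumulator reaches roughly 2.0; an internal high-precision accumulator avoids this swamping effect without widening the register file.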
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/pulp-platform/cvfpu</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Hardware_Exploration_of_Shared-Exponent_MiniFloats_(M)&diff=10180Hardware Exploration of Shared-Exponent MiniFloats (M)2024-02-15T13:45:06Z<p>Lbertaccini: </p>
<hr />
<div><!-- Hardware Exploration of Shared-Exponent MiniFloats (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency. While many commercial platforms already provide support for 8-bit FP data types, introducing sub-8-bit formats is key to meeting the memory-footprint and efficiency requirements imposed by ever-larger NN models.<br />
<br />
The FP units (FPUs) developed at IIS [1], [2] already provide hardware support for low-precision FP formats (down to 8 bits). The goal of this project is to explore sub-8-bit FP formats, with a particular emphasis on shared-exponent MiniFloats [3]. Such formats share a single exponent among N sub-8-bit values and are currently being researched by many hardware providers [3].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/pulp-platform/cvfpu<br />
<br />
[3] https://arxiv.org/abs/2310.10537</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=10179Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2024-02-15T13:40:43Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Completed]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester Thesis (1 or 2 students)<br />
* Student: Roman Marquart<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter. ]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized into modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot product).<br />
<br />
Recently, T-Head open-sourced a set of its processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC910 [3] processor and to integrate it into CVFPU.<br />
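For intuition, FP divide units are typically built on digit-recurrence (SRT) or multiplicative (Newton-Raphson/Goldschmidt) schemes. The Python sketch below illustrates the multiplicative approach; it is not necessarily the scheme used in OpenC910 (studying that unit is part of the project), and it ignores the exponent, rounding, and special-case handling that real hardware performs. All names are illustrative.

```python
import math

def nr_divide(a, b, iterations=4):
    """Approximate a / b with Newton-Raphson iterations on the reciprocal:
    x_{n+1} = x_n * (2 - m * x_n) converges quadratically to 1/m.
    Behavioral sketch only: positive divisors, no specials, no rounding."""
    assert b > 0.0
    m, e = math.frexp(b)                 # b = m * 2**e with 0.5 <= m < 1
    x = 48.0 / 17.0 - 32.0 / 17.0 * m    # classic linear initial estimate
    for _ in range(iterations):
        x = x * (2.0 - m * x)            # Newton-Raphson refinement step
    return a * math.ldexp(x, -e)         # undo the normalization of b

q = nr_divide(355.0, 113.0)              # close to pi
```

Because convergence is quadratic, the valid mantissa bits roughly double per iteration, which is why a handful of iterations suffices for double precision; digit-recurrence units instead retire a fixed number of quotient bits per cycle.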
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC910 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC910 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc910<br />
<br />
<br />
===Status: Completed ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Hardware_Exploration_of_Shared-Exponent_MiniFloats_(M)&diff=9920Hardware Exploration of Shared-Exponent MiniFloats (M)2023-11-13T17:31:11Z<p>Lbertaccini: </p>
<hr />
<div><!-- Hardware Exploration of Shared-Exponent MiniFloats (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency. While many commercial platforms already provide support for 8-bit FP data types, introducing sub-8-bit formats is key to meeting the memory-footprint and efficiency requirements imposed by ever-larger NN models.<br />
<br />
The FP units (FPUs) developed at IIS [1], [2] already provide hardware support for low-precision FP formats (down to 8 bits). The goal of this project is to explore sub-8-bit FP formats, with a particular emphasis on shared-exponent MiniFloats [3]. Such formats share a single exponent among N sub-8-bit values and are currently being researched by many hardware providers [3].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/pulp-platform/cvfpu<br />
<br />
[3] https://arxiv.org/abs/2310.10537</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=9919Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2023-11-13T17:24:57Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester Thesis (1 or 2 students)<br />
* Student: Roman Marquart<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter. ]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized into modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot product).<br />
<br />
Recently, T-Head open-sourced a set of its processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC910 [3] processor and to integrate it into CVFPU.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC910 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC910 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc910<br />
<br />
<br />
===Status: In Progress ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Fault-Tolerant_Floating-Point_Units_(M)&diff=9918Fault-Tolerant Floating-Point Units (M)2023-11-13T17:23:24Z<p>Lbertaccini: </p>
<hr />
<div><!-- Fault-Tolerant Floating-Point Units (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Fault Tolerance]]<br />
[[Category:HW/SW Safety and Security]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Michaero]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Michaero | Michael Rogenmoser]]: [mailto:michaero@iis.ee.ethz.ch michaero@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Fault-tolerant features are crucial in critical and hostile environments (automotive, space, …). In the PULP group, we have started developing reliable hardware designed for use in space, where high levels of radiation can corrupt program execution.<br />
<br />
While many processing and memory elements have already been investigated and protected, fault-tolerant floating-point units (FPUs) still need to be researched.<br />
The goal of this project is to enhance the FPU developed at IIS [1] with fault-tolerant features, such as redundancy schemes [2]. For example, a fault-tolerant mode will be investigated in which multiple SIMD units inside the FPU compute the same operation; the results are then compared to detect faults.<br />
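The duplicate-and-compare idea can be sketched in a few lines of Python: run the same operation on redundant lanes, then compare results to detect a fault, or majority-vote among three lanes to correct one. The lane model and function names below are illustrative, not taken from the actual FPU RTL:

```python
import operator

def dmr_check(op, a, b, lanes):
    """Duplicate-and-compare: run the same op on redundant lanes and
    flag a fault if any results disagree (detection only)."""
    results = [lane(op, a, b) for lane in lanes]
    fault = any(r != results[0] for r in results[1:])
    return results[0], fault

def tmr_vote(op, a, b, lanes):
    """Triple modular redundancy: with three lanes, a majority vote can
    also correct a single faulty result, not just detect it."""
    results = [lane(op, a, b) for lane in lanes]
    for candidate in results:
        if results.count(candidate) >= 2:
            return candidate, True       # majority agrees
    return results[0], False             # no majority: uncorrectable

healthy = lambda op, a, b: op(a, b)          # fault-free datapath lane
faulty = lambda op, a, b: op(a, b) + 1.0     # lane with an injected error

val, ok = tmr_vote(operator.mul, 3.0, 4.0, [healthy, faulty, healthy])
_, fault = dmr_check(operator.mul, 3.0, 4.0, [healthy, faulty])
```

In hardware, the "lanes" would be SIMD sub-units of the FPU executing the same operands, and the comparison or voter sits on the result path; the area/energy cost of that extra logic versus the coverage gained is what the evaluation phase would quantify.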
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://github.com/pulp-platform/cvfpu<br />
<br />
[2] https://arxiv.org/abs/2303.08706</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Fault-Tolerant_Floating-Point_Units_(M)&diff=9884Fault-Tolerant Floating-Point Units (M)2023-11-03T17:25:15Z<p>Lbertaccini: </p>
<hr />
<div><!-- Fault-Tolerant Floating-Point Units (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Fault-tolerant features are crucial in critical and hostile environments (automotive, space, …). The goal of this project is to enhance the FP unit (FPU) developed at IIS [1] with fault-tolerant features (such as redundancy schemes [2]).<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://github.com/pulp-platform/cvfpu<br />
<br />
[2] https://arxiv.org/abs/2303.08706</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Hardware_Exploration_of_Shared-Exponent_MiniFloats_(M)&diff=9883Hardware Exploration of Shared-Exponent MiniFloats (M)2023-11-03T17:22:57Z<p>Lbertaccini: Created page with "<!-- Fault-Tolerant Floating-Point Units (M) --> Category:Digital Category:Acceleration_and_Transprecision Category:High Performance SoCs Category:Computer Arch..."</p>
<hr />
<div><!-- Fault-Tolerant Floating-Point Units (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
Hardware support for low-precision FP formats (down to 8 bits) is already available in the FP unit (FPU) developed at IIS [1], [2]. The goal of this project is to explore sub-8-bit FP formats, with a particular emphasis on shared-exponent MiniFloats [3].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/pulp-platform/cvfpu<br />
<br />
[3] https://arxiv.org/abs/2310.10537</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Fault-Tolerant_Floating-Point_Units_(M)&diff=9882Fault-Tolerant Floating-Point Units (M)2023-11-03T17:21:44Z<p>Lbertaccini: Created page with "<!-- Fault-Tolerant Floating-Point Units (M) --> Category:Digital Category:Acceleration_and_Transprecision Category:High Performance SoCs Category:Computer Arch..."</p>
<hr />
<div><!-- Fault-Tolerant Floating-Point Units (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
Hardware support for low-precision FP formats (down to 8 bits) is already available in the FP unit (FPU) developed at IIS [1], [2]. The goal of this project is to explore sub-8-bit FP formats, with a particular emphasis on shared-exponent MiniFloats [3].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/pulp-platform/cvfpu<br />
<br />
[3] https://arxiv.org/abs/2310.10537</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Investigating_the_Cost_of_Special-Case_Handling_in_Low-Precision_Floating-Point_Dot_Product_Units_(1S)&diff=9881Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S)2023-11-03T16:20:39Z<p>Lbertaccini: </p>
<hr />
<div><!-- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Completed]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Maurus Item<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
A low-precision FP dot-product unit was recently developed at IIS [1], [2]. The module computes 8-bit or 16-bit dot products and accumulates the result in higher precision. It was designed to comply with the IEEE 754 standard. However, supporting all the special cases can be costly in hardware, and some of them might be unnecessary for low-precision training. The goal of this project is to evaluate these costs.<br />
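As a software illustration of what "special-case handling" means here, compare a dot product with full IEEE-754 semantics (which Python floats largely inherit) against a simplified variant that assumes finite inputs, as a reduced datapath might. Function names are illustrative, not from the Sdotp RTL:

```python
import math

def dotp_ieee(xs, ys):
    """Dot product with full IEEE-754 special-case semantics, which
    Python floats largely follow: NaNs propagate, inf * 0 yields NaN."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

def dotp_finite_only(xs, ys):
    """Simplified datapath that assumes finite inputs: non-finite
    products are simply skipped instead of being handled per IEEE-754,
    as a unit with the special-case logic stripped out might behave."""
    acc = 0.0
    for x, y in zip(xs, ys):
        if not (math.isfinite(x) and math.isfinite(y)):
            continue                     # special case dropped, not handled
        acc += x * y
    return acc

xs = [1.0, float("inf"), 2.0]
ys = [2.0, 0.0, 3.0]
```

On this input the IEEE-compliant version returns NaN (inf * 0), while the simplified one returns a finite sum; quantifying the hardware saved by such simplifications, and the corner cases they silently change, is the core of the evaluation.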
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the Sdotp unit and its fundamental blocks'''<br />
<br />
* '''RTL modifications to the Sdotp unit'''. Support for special cases will be incrementally removed/modified, and its costs will be assessed.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/tree/feature/expanding_dotp<br />
<br />
<br />
===Status: Completed===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=9880Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2023-11-03T16:20:12Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester Thesis (1 or 2 students)<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter. ]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized into modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot product).<br />
<br />
Recently, T-Head open-sourced a set of its processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC910 [3] processor and to integrate it into CVFPU.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC910 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC910 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc910<br />
<br />
<br />
===Status: In Progress ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=9879Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2023-11-03T16:19:44Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis (1 or 2 students)<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter. ]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications spanning from high-performance computing to neural network training. A flexible, highly parametrized open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product).<br />
<br />
Recently, T-Head open-sourced a set of processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC910 [3] processor and to integrate it into CVFPU.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC910 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC910 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc910<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=9878Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2023-11-03T16:19:08Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In Progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis (1 or 2 students)<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter. ]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications spanning from high-performance computing to neural network training. A flexible, highly parametrized open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product).<br />
<br />
Recently, T-Head open-sourced a set of processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC910 [3] processor and to integrate it into CVFPU.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC910 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC910 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc910<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Investigating_the_Cost_of_Special-Case_Handling_in_Low-Precision_Floating-Point_Dot_Product_Units_(1S)&diff=9877Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S)2023-11-03T16:18:40Z<p>Lbertaccini: </p>
<hr />
<div><!-- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Completed]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Maurus Item<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
A low-precision FP dot product unit was recently developed at IIS [1], [2]. The module computes dot products of 8-bit or 16-bit inputs and accumulates the result in a larger precision. It was designed to follow the IEEE 754 standard. However, supporting all of the special cases can be costly in hardware, and some of them might be unnecessary for low-precision training. The goal of this project is to evaluate these costs.<br />
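To make the cost concrete, the checks below are the kind of IEEE-754 special-case handling an expanding dot-product unit must carry in hardware. The exact behavior of the ExSdotp unit is specified in [1]; this Python model is only an illustration of the cases the project would incrementally remove or modify:<br />

```python
import math

def exsdotp(a0, b0, a1, b1, acc):
    """Model of an expanding dot product acc + a0*b0 + a1*b1 with the
    IEEE-754 special cases a compliant unit must detect in hardware."""
    # Invalid operation: 0 x inf in either product yields a quiet NaN.
    for a, b in ((a0, b0), (a1, b1)):
        if (a == 0 and math.isinf(b)) or (math.isinf(a) and b == 0):
            return math.nan
    p0, p1 = a0 * b0, a1 * b1
    # NaN inputs (and NaNs created above) propagate to the result.
    if any(map(math.isnan, (p0, p1, acc))):
        return math.nan
    # Invalid operation: adding infinities of opposite sign (inf - inf).
    signs = {math.copysign(1.0, x) for x in (p0, p1, acc) if math.isinf(x)}
    if len(signs) > 1:
        return math.nan
    return acc + p0 + p1
```

Each of these branches corresponds to detection logic in the datapath; removing support for a case (e.g., flushing NaNs or saturating infinities) removes the corresponding comparators and muxes, which is the cost this project quantifies.<br />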
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the Sdotp unit and its fundamental blocks'''<br />
<br />
* '''RTL modifications to the Sdotp unit'''. Support for special cases will be incrementally removed or modified, and the associated costs will be assessed.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V Cores (https://arxiv.org/abs/2207.03192)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/tree/feature/expanding_dotp<br />
<br />
<br />
===Status: Completed ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=9128Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2023-05-15T15:58:44Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis (1 or 2 students)<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter. ]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications spanning from high-performance computing to neural network training. A flexible, highly parametrized open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product).<br />
<br />
Recently, T-Head open-sourced a set of processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC910 [3] processor and to integrate it into CVFPU.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC910 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC910 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc910<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=9127Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2023-05-15T12:54:57Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis (1 or 2 students)<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications spanning from high-performance computing to neural network training. A flexible, highly parametrized open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product).<br />
<br />
Recently, T-Head open-sourced a set of processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC906 processor and to integrate it into CVFPU.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC906 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC906 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc906<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_an_Open-Source_Double-Precision_Floating-Point_DivSqrt_Unit_into_CVFPU_(1S)&diff=9126Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S)2023-05-15T12:53:15Z<p>Lbertaccini: Created page with "<!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --> Category:Digital Category:Acceleration_and_Transprecision Category:..."</p>
<hr />
<div><!-- Integrating an Open-Source Double-Precision Floating-Point DivSqrt Unit into CVFPU (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:FPU_with_DivSqrt.png|thumb|300px|CVFPU block diagram [1]. CVFPU is a modular floating-point unit (FPU) in which each operation group block can be instantiated through a parameter.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications spanning from high-performance computing to neural network training. A flexible, highly parametrized open-source floating-point unit (FPU) called FPnew (today known as CVFPU) [1,2] has been developed at IIS.<br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product).<br />
<br />
Recently, T-Head open-sourced a set of processors. The goal of this project is to evaluate the double-precision DivSqrt unit included in the open-source T-Head OpenC906 processor and to integrate it into CVFPU.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the T-Head OpenC906 FP DivSqrt module and its fundamental blocks'''<br />
<br />
* '''RTL integration of T-Head OpenC906 FP DivSqrt module into CVFPU'''<br />
<br />
* '''Evaluation of the FP DivSqrt module and the enhanced CVFPU'''<br />
<br />
== Project Breakdown ==<br />
<br />
* 20% Architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] Mach, S., Schuiki, F., Zaruba, F., & Benini, L. (2020). FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4), 774-787. (https://ieeexplore.ieee.org/abstract/document/9311229)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu<br />
<br />
[3] https://github.com/T-head-Semi/openc906<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=File:FPU_with_DivSqrt.png&diff=9125File:FPU with DivSqrt.png2023-05-15T12:39:03Z<p>Lbertaccini: </p>
<hr />
<div></div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Investigating_the_Cost_of_Special-Case_Handling_in_Low-Precision_Floating-Point_Dot_Product_Units_(1S)&diff=9124Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S)2023-05-15T12:29:38Z<p>Lbertaccini: </p>
<hr />
<div><!-- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Maurus Item<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
A low-precision FP dot product unit was recently developed at IIS [1], [2]. The module computes dot products of 8-bit or 16-bit inputs and accumulates the result in a larger precision. It was designed to follow the IEEE 754 standard. However, supporting all of the special cases can be costly in hardware, and some of them might be unnecessary for low-precision training. The goal of this project is to evaluate these costs.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the Sdotp unit and its fundamental blocks'''<br />
<br />
* '''RTL modifications to the Sdotp unit'''. Support for special cases will be incrementally removed or modified, and the associated costs will be assessed.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V Cores (https://arxiv.org/abs/2207.03192)<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/tree/feature/expanding_dotp<br />
<br />
<br />
===Status: In Progress ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=9123Smart Meters2023-05-15T12:29:25Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:In progress]]<br />
[[Category:2023]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
===Status: In Progress ===<br />
Student: Jiayi Liu<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending it to servers, where it can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth: extracting information at the edge allows the device to send a lighter payload to the server, thus reducing the time spent in transmission.<br />
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image at the edge will require robust pattern-recognition algorithms.<br />
<br />
<br />
A prototype that connects GAPuino to the modem and the low-power camera, and that can already send messages to a server, is available. During this project, you will train the NN model, deploy it on GAPuino, test the final device, and optimize the pipeline for energy efficiency.<br />
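The duty cycle described above can be sketched as follows. Note that `capture`, `recognize`, and `transmit` are hypothetical placeholders for the camera driver, the deployed NN, and the modem interface, not the actual GAPuino API:<br />

```python
import time

def meter_duty_cycle(capture, recognize, transmit, period_s=3600, max_iters=None):
    """Duty cycle of the smart meter: wake, capture, recognize, transmit.

    Only the recognized digits are transmitted, not the raw image, which
    keeps the radio payload (and thus the energy per reading) small.
    """
    readings, it = [], 0
    while max_iters is None or it < max_iters:
        frame = capture()            # frame from the ultra-low-power camera
        value = recognize(frame)     # on-device NN digit recognition
        if value is not None:        # skip transmission on a failed read
            transmit(value)
            readings.append(value)
        it += 1
        if max_iters is None or it < max_iters:
            time.sleep(period_s)     # stand-in for the deep-sleep phase
    return readings
```

On the real device, the sleep phase would be a deep-sleep state with the camera and modem powered down, which is where most of the energy savings come from.<br />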
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will remove the need for a person to read the meter and annotate the measurement.<br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C and Python programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Training of the NN model for meter detection and recognition<br />
* Deployment of the model on the IoT device<br />
* Testing the system and evaluating its power consumption<br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=9122Smart Meters2023-05-15T12:28:04Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:In Progress]]<br />
[[Category:2023]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
===Status: In Progress ===<br />
Student: Jiayi Liu<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending it to servers, where it can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth: extracting information at the edge allows the device to send a lighter payload to the server, thus reducing the time spent in transmission.<br />
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image at the edge will require robust pattern-recognition algorithms.<br />
<br />
<br />
A prototype that connects GAPuino to the modem and the low-power camera, and that can already send messages to a server, is available. During this project, you will train the NN model, deploy it on GAPuino, test the final device, and optimize the pipeline for energy efficiency.<br />
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will remove the need for a person to read the meter and annotate the measurement.<br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C and Python programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Training of the NN model for meter detection and recognition<br />
* Deployment of the model on the IoT device<br />
* Testing the system and evaluating its power consumption<br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Investigating_the_Cost_of_Special-Case_Handling_in_Low-Precision_Floating-Point_Dot_Product_Units_(1S)&diff=9121Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S)2023-05-15T12:27:39Z<p>Lbertaccini: </p>
<hr />
<div><!-- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In Progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Maurus Item<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
Low-precision floating-point (FP) formats are gaining increasing traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
A low-precision FP dot product unit was recently developed at IIS [1], [2]. The module computes dot products of 8-bit or 16-bit inputs and accumulates the result in a larger precision. It was designed to follow the IEEE 754 standard. However, supporting all of the special cases can be costly in hardware, and some of them might be unnecessary for low-precision training. The goal of this project is to evaluate these costs.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the Sdotp unit and its fundamental blocks'''<br />
<br />
* '''RTL modifications to the Sdotp unit'''. Support for special cases will be incrementally removed or modified, and the associated costs will be assessed.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/tree/feature/expanding_dotp<br />
<br />
<br />
</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Optimizing_the_Pipeline_in_our_Floating_Point_Architectures_(1S)&diff=9120Optimizing the Pipeline in our Floating Point Architectures (1S)2023-05-15T12:27:10Z<p>Lbertaccini: </p>
<hr />
<div><!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Completed]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
* Student: Mingrui Yuan<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, FPnew was instantiated without a DivSqrt module.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. FP architectures usually exhibit a long critical path and need to be pipelined to match the system’s operating frequency. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS. <br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot product). Each operation group block (except the DivSqrt module, which implements an iterative algorithm) contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend flow. However, this can make the backend flow longer and more complex. The goal of this project is to manually place the pipeline registers to optimize timing, and to compare the resulting designs against the baseline implementation.<br />
<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the FPU timing'''. This will require you to <br />
** Understand what the critical paths in the unit are<br />
** Understand how the critical paths are broken when inserting different numbers of pipeline registers<br />
* '''RTL modifications to FPnew''' to manually optimize the pipeline for different numbers of pipeline registers<br />
* '''Implementation of a Python generator''' that takes the number of pipeline levels as an input and places the registers in the position you identified<br />
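As a starting point for the generator, the hedged sketch below shows one possible placement strategy: given rough per-sub-block delay estimates (which in practice would come from synthesis timing reports rather than this simple additive model) and a register budget, it exhaustively picks the boundary positions that minimize the worst stage delay. The function name and delay model are illustrative assumptions, not part of FPnew.<br />

```python
from itertools import combinations

def place_registers(block_delays, n_regs):
    """Choose n_regs pipeline-register positions (boundaries between
    consecutive sub-blocks) minimizing the longest stage delay.
    block_delays: combinational delay estimate of each sub-block, in ns."""
    best_worst, best_cut = float("inf"), ()
    for cut in combinations(range(1, len(block_delays)), n_regs):
        edges = (0,) + cut + (len(block_delays),)
        # a stage's delay is the sum of the sub-block delays between two cuts
        worst = max(sum(block_delays[a:b]) for a, b in zip(edges, edges[1:]))
        if worst < best_worst:
            best_worst, best_cut = worst, cut
    return best_cut, best_worst
```

For example, with sub-block delays [3, 1, 2, 2] and one register, the register is best placed after the second sub-block, splitting the 8 ns path into two 4 ns stages.<br />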
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 30% RTL implementation<br />
* 40% Evaluation<br />
* 15% Python generator<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/abstract/document/9311229 FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/<br />
<br />
<br />
</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=9119Smart Meters2023-05-15T12:26:21Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:2023]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
===Status: In Progress ===<br />
Student: Jiayi Liu<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending them to servers, where they can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth. Extracting information on the edge allows sending a lighter payload to the server, thus reducing the time spent in transmission.<br />
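A quick back-of-the-envelope calculation illustrates this benefit; the image size, reading length, and link speed below are illustrative assumptions, not measurements from the actual system.<br />

```python
def airtime_seconds(payload_bytes, link_bits_per_s):
    """Time spent transmitting a payload over a link of the given speed."""
    return payload_bytes * 8 / link_bits_per_s

# Assumed numbers: a QVGA 8-bit grayscale frame vs. the extracted reading.
raw_image_bytes = 320 * 240          # 76,800 bytes if the full picture is sent
reading_bytes = 8 + 8                # ~8 ASCII digits plus a small header

# Over an assumed 50 kbit/s low-power link:
raw_airtime = airtime_seconds(raw_image_bytes, 50_000)   # ~12.3 s
edge_airtime = airtime_seconds(reading_bytes, 50_000)    # ~2.6 ms
```

Under these assumptions, extracting the reading on the edge cuts the transmission time (and thus the radio's active energy) by roughly three orders of magnitude.<br />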
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of different meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image on the edge will require robust pattern recognition algorithms.<br />
<br />
<br />
A prototype connecting GAPuino with the modem and the low-power camera and able to send messages to a server is already available. During this project, you will train the NN model, deploy it on GAPuino, test the final device and optimize the pipeline for energy efficiency.<br />
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will replace the need for a person to read the meter and annotate the measurement. <br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C and Python programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Training of the NN model for meter detection and recognition<br />
* Deployment of the model on the IoT device<br />
* Testing the system and evaluating the power consumption <br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Investigating_the_Cost_of_Special-Case_Handling_in_Low-Precision_Floating-Point_Dot_Product_Units_(1S)&diff=8308Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S)2022-11-07T11:41:09Z<p>Lbertaccini: </p>
<hr />
<div><!-- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
Low-precision floating-point (FP) formats are gaining more and more traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
A low-precision FP dot product unit was recently developed at IIS [1], [2]. The module computes 8-bit or 16-bit dot products and accumulates the result in a larger precision. It was designed to follow the IEEE 754 standard. However, supporting all of its special cases can be costly in hardware, and some of them may be unnecessary for low-precision training. The goal of this project is to evaluate these costs.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the Sdotp unit and its fundamental blocks'''<br />
<br />
* '''RTL modifications to the Sdotp unit'''. Support for special cases will be incrementally removed/modified, and its costs will be assessed.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/tree/feature/expanding_dotp<br />
<br />
<br />
</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Optimizing_the_Pipeline_in_our_Floating_Point_Architectures_(1S)&diff=8307Optimizing the Pipeline in our Floating Point Architectures (1S)2022-11-07T11:14:00Z<p>Lbertaccini: </p>
<hr />
<div><!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, FPnew was instantiated without a DivSqrt module.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. FP architectures usually exhibit a long critical path and need to be pipelined to match the system’s operating frequency. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS. <br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot product). Each operation group block (except the DivSqrt module, which implements an iterative algorithm) contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend flow. However, this can make the backend flow longer and more complex. The goal of this project is to manually place the pipeline registers to optimize timing, and to compare the resulting designs against the baseline implementation.<br />
<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the FPU timing'''. This will require you to <br />
** Understand what the critical paths in the unit are<br />
** Understand how the critical paths are broken when inserting different numbers of pipeline registers<br />
* '''RTL modifications to FPnew''' to manually optimize the pipeline for different numbers of pipeline registers<br />
* '''Implementation of a Python generator''' that takes the number of pipeline levels as an input and places the registers in the position you identified<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 30% RTL implementation<br />
* 40% Evaluation<br />
* 15% Python generator<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/abstract/document/9311229 FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/<br />
<br />
<br />
</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Investigating_the_Cost_of_Special-Case_Handling_in_Low-Precision_Floating-Point_Dot_Product_Units_(1S)&diff=8306Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S)2022-11-07T11:13:38Z<p>Lbertaccini: Created page with "<!-- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S) --> Category:Digital Category:Acceleration_and_Transprecisio..."</p>
<hr />
<div><!-- Investigating the Cost of Special-Case Handling in Low-Precision Floating-Point Dot Product Units (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, the FPU was instantiated without a DivSqrt module.]]<br />
<br />
Low-precision floating-point (FP) formats are gaining more and more traction in the context of neural network (NN) training. Employing low-precision formats, such as 8-bit FP data types, reduces the model's memory footprint and opens new opportunities to increase the system's energy efficiency.<br />
<br />
A low-precision FP dot product unit was recently developed at IIS [1], [2]. The module computes 8-bit or 16-bit dot products and accumulates the result in a larger precision. It was designed to follow the IEEE 754 standard. However, supporting all of its special cases can be costly in hardware, and some of them may be unnecessary for low-precision training. The goal of this project is to evaluate these costs.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the Sdotp unit and its fundamental blocks'''<br />
<br />
* '''RTL modifications to the Sdotp unit'''. Support for special cases will be incrementally removed/modified, and its costs will be assessed.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 40% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://arxiv.org/abs/2207.03192 MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/tree/feature/expanding_dotp<br />
<br />
<br />
</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Optimizing_the_Pipeline_in_our_Floating_Point_Architectures_(1S)&diff=8304Optimizing the Pipeline in our Floating Point Architectures (1S)2022-11-07T10:17:12Z<p>Lbertaccini: </p>
<hr />
<div><!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, FPnew was instantiated without a DivSqrt module.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. FP architectures usually exhibit a long critical path and need to be pipelined to match the system’s operating frequency. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS. <br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot product). Each operation group block (except the DivSqrt module, which implements an iterative algorithm) contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend flow. However, this can make the backend flow longer and more complex. The goal of this project is to manually place the pipeline registers to optimize timing, and to compare the resulting designs against the baseline implementation.<br />
<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the FPU timing'''. This will require you to <br />
** Understand what the critical paths in the unit are<br />
** Understand how the critical paths are broken when inserting different numbers of pipeline registers<br />
* '''RTL modifications to FPnew''' to manually optimize the pipeline for different numbers of pipeline registers<br />
* '''Implementation of a Python generator''' that takes the number of pipeline levels as an input and places the registers in the position you identified<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 30% RTL implementation<br />
* 40% Evaluation<br />
* 15% Python generator<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/abstract/document/9311229 FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/<br />
<br />
<br />
</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Optimizing_the_Pipeline_in_our_Floating_Point_Architectures_(1S)&diff=8303Optimizing the Pipeline in our Floating Point Architectures (1S)2022-11-07T10:16:13Z<p>Lbertaccini: </p>
<hr />
<div><!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:In Progress]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, FPnew was instantiated without a DivSqrt module.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. FP architectures usually exhibit a long critical path and need to be pipelined to match the system’s operating frequency. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS. <br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot product). Each operation group block (except the DivSqrt module, which implements an iterative algorithm) contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend flow. However, this can make the backend flow longer and more complex. The goal of this project is to manually place the pipeline registers to optimize timing, and to compare the resulting designs against the baseline implementation.<br />
<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the FPU timing'''. This will require you to <br />
** Understand what the critical paths in the unit are<br />
** Understand how the critical paths are broken when inserting different numbers of pipeline registers<br />
* '''RTL modifications to FPnew''' to manually optimize the pipeline for different numbers of pipeline registers<br />
* '''Implementation of a Python generator''' that takes the number of pipeline levels as an input and places the registers in the position you identified<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 30% RTL implementation<br />
* 40% Evaluation<br />
* 15% Python generator<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/abstract/document/9311229 FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=8302Smart Meters2022-11-07T10:14:40Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Available]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:2023]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
===Status: Available ===<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending them to servers, where they can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth. Extracting information on the edge allows sending a lighter payload to the server, thus reducing the time spent in transmission.<br />
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of different meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image on the edge will require robust pattern recognition algorithms.<br />
<br />
<br />
A prototype connecting GAPuino with the modem and the low-power camera and able to send messages to a server is already available. During this project, you will train the NN model, deploy it on GAPuino, test the final device and optimize the pipeline for energy efficiency.<br />
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will replace the need for a person to read the meter and annotate the measurement. <br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C and Python programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Training of the NN model for meter detection and recognition<br />
* Deployment of the model on the IoT device<br />
* Testing the system and evaluating the power consumption <br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=8301Smart Meters2022-11-07T10:13:56Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Available]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:2023]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
===Status: Completed ===<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending them to servers, where they can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth. Extracting information on the edge allows sending a lighter payload to the server, thus reducing the time spent in transmission.<br />
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of different meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image on the edge will require robust pattern recognition algorithms.<br />
<br />
<br />
A prototype connecting GAPuino with the modem and the low-power camera and able to send messages to a server is already available. During this project, you will train the NN model, deploy it on GAPuino, test the final device and optimize the pipeline for energy efficiency.<br />
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will replace the need for a person to read the meter and annotate the measurement. <br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C and Python programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Training of the NN model for meter detection and recognition<br />
* Deployment of the model on the IoT device<br />
* Testing the system and evaluating the power consumption <br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=User:Lbertaccini&diff=8300User:Lbertaccini2022-11-07T09:59:43Z<p>Lbertaccini: </p>
<hr />
<div>__NOTOC__<br />
I received my Master's degree in Electronic Engineering from the University of Bologna in 2020. I am currently pursuing a PhD at the Integrated Systems Laboratory (IIS) of ETH Zurich in the Digital Systems group led by Prof. Luca Benini. My research is mainly focused on hardware accelerators, heterogeneous architectures and computer arithmetic.<br />
<br />
<br />
[[File:lbertaccini_photo.jpg|thumb|200px|]]<br />
<br />
==Interests==<br />
My research interests include:<br />
* Hardware accelerators (Floating-Point Unit, DSP/ML accelerators)<br />
* Manycore systems<br />
* Energy-Efficient SoCs <br />
* Heterogeneous architectures<br />
<br />
==Luca Bertaccini -- Contact Information==<br />
* '''Office''': ETZ J 78<br />
* '''e-mail''': [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjYzNjU0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
[[Category:Supervisors]]<br />
[[Category:Digital]]<br />
<br />
<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Lbertaccini<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
== Projects in Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In progress<br />
category = Lbertaccini<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Completed<br />
category = Lbertaccini<br />
</DynamicPageList></div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch_(1S)&diff=8024Integrating Hardware Accelerators into Snitch (1S)2022-09-07T11:57:34Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:Energy Efficient SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
<br />
= Overview =<br />
<br />
== Status: Not Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster [1] couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
<br />
[[File:cluster_hwpe.png|thumb|350px|The ''PULP'' cluster including an HWPE [3]]]<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which allow the system to achieve FPU utilization above 90%. <br />
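As a rough illustration of how SSRs and FREP raise FPU utilization, consider a toy single-issue model of a dot product (the cycle counts are hypothetical, chosen only for illustration, and are not taken from [1]):<br />

```python
# Toy single-issue model of an N-element dot product.
# Cycle counts are hypothetical, for illustration only.
def fpu_utilization(n, ssr=False, frep=False):
    fma = n                      # one FMA per element
    loads = 0 if ssr else 2 * n  # SSRs stream operands, removing explicit loads
    loop = 0 if frep else 2 * n  # FREP removes per-iteration increment + branch
    return fma / (fma + loads + loop)

# Plain single-issue core: the FPU is busy only 1 cycle in 5.
print(fpu_utilization(1000))                       # 0.2
# With SSRs and FREP the FPU can accept one FMA every cycle.
print(fpu_utilization(1000, ssr=True, frep=True))  # 1.0
```

In this toy model, streaming operands via SSRs removes the explicit loads, and FREP removes the per-iteration loop overhead, which is why the combination pushes utilization toward 100%.<br />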
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This led to an increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are software-programmed by them. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the necessary architectural modifications to provide support for hardware processing engines (HWPE) in the Snitch cluster. Finally, some of the HWPEs developed at IIS can be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, in particular focusing on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPE in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=8023Smart Meters2022-09-07T11:56:03Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Completed]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:2022]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
===Status: Completed ===<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending them to servers, where they can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth. Extracting information at the edge allows the device to send a lighter payload to the server, thus reducing the time spent in transmission.<br />
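To see why a lighter payload matters, the transmission savings can be sketched with a back-of-the-envelope comparison (the frame size, payload size, and link rate below are hypothetical, chosen only for illustration):<br />

```python
# Back-of-the-envelope estimate of the transmission-time savings from
# extracting the meter reading on the device instead of sending the image.
# All numbers are hypothetical, for illustration only.
def tx_time_s(payload_bytes, link_bps):
    return payload_bytes * 8 / link_bps

raw_image = 320 * 240  # raw QVGA grayscale frame, 1 byte/pixel
reading = 16           # extracted meter reading plus metadata
link = 50_000          # ~50 kbit/s low-power wireless link

print(tx_time_s(raw_image, link))  # about 12.3 s to send the raw frame
print(tx_time_s(reading, link))    # a few milliseconds to send the reading
```

With these (made-up) numbers, edge processing shrinks the radio-on time by roughly three orders of magnitude, which is where most of the energy saving comes from.<br />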
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of different meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image on the edge will require robust pattern recognition algorithms.<br />
<br />
<br />
A prototype that connects GAPuino to the modem and is able to send messages to a server has already been implemented. During this project, you will optimize the pipeline for energy efficiency, interface GAPuino with the ultra-low-power camera and build the pattern recognition application.<br />
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will replace the need for a person to read the meter and annotate the measurement. <br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Interfacing PULP with the ultra-low-power camera<br />
* Implementation of a pattern recognition algorithm on a microcontroller<br />
* Testing the system and evaluating the power consumption <br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* [[:User:scheremo | Moritz Scherer]]: [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch];</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=8022Smart Meters2022-09-07T11:55:18Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:2022]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
===Status: Completed ===<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending them to servers, where they can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth. Extracting information at the edge allows the device to send a lighter payload to the server, thus reducing the time spent in transmission.<br />
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of different meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image on the edge will require robust pattern recognition algorithms.<br />
<br />
<br />
A prototype that connects GAPuino to the modem and is able to send messages to a server has already been implemented. During this project, you will optimize the pipeline for energy efficiency, interface GAPuino with the ultra-low-power camera and build the pattern recognition application.<br />
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will replace the need for a person to read the meter and annotate the measurement. <br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Interfacing PULP with the ultra-low-power camera<br />
* Implementation of a pattern recognition algorithm on a microcontroller<br />
* Testing the system and evaluating the power consumption <br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* [[:User:scheremo | Moritz Scherer]]: [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch];</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Optimizing_the_Pipeline_in_our_Floating_Point_Architectures_(1S)&diff=7907Optimizing the Pipeline in our Floating Point Architectures (1S)2022-08-05T17:29:23Z<p>Lbertaccini: </p>
<hr />
<div><!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, FPnew was instantiated without a DivSqrt module.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. FP architectures usually show a large critical path and need to be pipelined to match the system’s operating frequency. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS. <br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product). Each operation group block (except the DivSqrt module, which implements an iterative algorithm) contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend flow. However, this can make the backend a longer and more complex process. The goal of this project is to manually place the pipeline registers to optimize timing, and to compare the resulting implementations against the baseline.<br />
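The register placement task can be sketched as a small search problem: given the combinational delays between candidate register positions, choose the placement that minimizes the longest resulting stage. The snippet below is a hypothetical model (the segment delays are made up; in practice they would come from synthesis timing reports):<br />

```python
# Sketch of a pipeline-register placement search (hypothetical model).
# delays[i] is the combinational delay of segment i; registers may be
# placed at the boundaries between segments. The search minimizes the
# slowest stage, i.e. the post-placement critical path.
from itertools import combinations

def place_registers(delays, num_regs):
    best_cut, best_stage = None, float("inf")
    for cut in combinations(range(1, len(delays)), num_regs):
        bounds = (0, *cut, len(delays))
        stage = max(sum(delays[a:b]) for a, b in zip(bounds, bounds[1:]))
        if stage < best_stage:
            best_cut, best_stage = cut, stage
    return best_cut, best_stage

# Example: four datapath segments with unbalanced delays (in ps, made up).
print(place_registers([400, 1000, 300, 900], 1))  # ((2,), 1400)
```

A real generator would emit the SystemVerilog register placement for the chosen cut points instead of printing them, but the core decision (balancing stage delays rather than piling registers at the input boundary) is the same.<br />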
<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the FPU timing'''. This will require you to <br />
** Identify the critical paths in the unit<br />
** Analyze how the critical paths are broken when inserting different numbers of pipeline registers<br />
* '''RTL modifications to FPnew''' to manually optimize the pipeline for different numbers of pipeline registers<br />
* '''Implementation of a Python generator''' that takes the number of pipeline levels as an input and places the registers in the positions you identified<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 30% RTL implementation<br />
* 40% Evaluation<br />
* 15% Python generator<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/abstract/document/9311229 FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing]<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Optimizing_the_Pipeline_in_our_Floating_Point_Architectures_(1S)&diff=7906Optimizing the Pipeline in our Floating Point Architectures (1S)2022-08-05T17:26:45Z<p>Lbertaccini: </p>
<hr />
<div><!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:Energy Efficient SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
[[File:Fpu_block_diagram.png|thumb|300px|FPnew block diagram [1]. Each operation group block can be instantiated through a parameter. In the figure, FPnew was instantiated without a DivSqrt module.]]<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. FP architectures usually show a large critical path and need to be pipelined to match the system’s operating frequency. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS. <br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product). Each operation group block (except the DivSqrt module, which implements an iterative algorithm) contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend flow. However, this can make the backend a longer and more complex process. The goal of this project is to manually place the pipeline registers to optimize timing, and to compare the resulting implementations against the baseline.<br />
<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the FPU timing'''. This will require you to <br />
** Identify the critical paths in the unit<br />
** Analyze how the critical paths are broken when inserting different numbers of pipeline registers<br />
* '''RTL modifications to FPnew''' to manually optimize the pipeline for different numbers of pipeline registers<br />
* '''Implementation of a Python generator''' that takes the number of pipeline levels as an input and places the registers in the positions you identified<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 30% RTL implementation<br />
* 40% Evaluation<br />
* 15% Python generator<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/abstract/document/9311229 FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing]<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=File:Fpu_block_diagram.png&diff=7905File:Fpu block diagram.png2022-08-05T17:22:36Z<p>Lbertaccini: FPU block diagram (without DivSqrt operation group block)</p>
<hr />
<div>FPU block diagram (without DivSqrt operation group block)</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Optimizing_the_Pipeline_in_our_Floating_Point_Architectures_(1S)&diff=7904Optimizing the Pipeline in our Floating Point Architectures (1S)2022-08-05T17:19:30Z<p>Lbertaccini: Created page with "<!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --> Category:Digital Category:Acceleration_and_Transprecision Category:Energy Efficient SoCs..."</p>
<hr />
<div><!-- Optimizing the Pipeline in our Floating Point Architectures (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:Energy Efficient SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
Floating-point (FP) arithmetic is fundamental for a large set of applications, spanning from high-performance computing to neural network training. FP architectures usually show a large critical path and need to be pipelined to match the system’s operating frequency. A flexible, highly parametrized, open-source floating-point unit (FPU) called FPnew [1,2] has been developed at IIS. <br />
<br />
FPnew is optimized for high performance and energy efficiency. It is internally organized in modules, each carrying out one operation group (add/mul, divsqrt, cast, comparisons, dot-product). Each operation group block contains a parametrized number of pipeline registers. Currently, all the registers are placed close to the input boundaries, and the timing is optimized during the backend flow. However, this can make the backend a longer and more complex process. The goal of this project is to manually place the pipeline registers to optimize timing, and to compare the resulting implementations against the baseline.<br />
<br />
<br />
= Project =<br />
<br />
* '''Investigation of the FPU timing'''. This will require you to <br />
** Identify the critical paths in the unit<br />
** Analyze how the critical paths are broken when inserting different numbers of pipeline registers<br />
* '''RTL modifications to FPnew''' to manually optimize the pipeline for different numbers of pipeline registers<br />
* '''Implementation of a Python generator''' that takes the number of pipeline levels as an input and places the registers in the positions you identified<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 30% RTL implementation<br />
* 40% Evaluation<br />
* 15% Python generator<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/abstract/document/9311229 FPnew: An Open-Source Multiformat Floating-Point Unit Architecture for Energy-Proportional Transprecision Computing]<br />
<br />
[2] https://github.com/openhwgroup/cvfpu/<br />
<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Smart_Meters&diff=7513Smart Meters2022-01-26T19:30:50Z<p>Lbertaccini: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Available]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:SmartSensors]]<br />
[[Category:EmbeddedAI]]<br />
[[Category:Low Power Embedded Systems and Wireless Sensors Networks]]<br />
[[Category:2022]]<br />
[[Category:Lbertaccini]]<br />
<br />
[[File:smart_meters.png|600px|right|thumb]]<br />
<br />
== Description ==<br />
The Internet of Things (IoT) era is characterized by billions of devices gathering data and sending them to servers, where they can be analyzed and processed. A pre-processing step can also be implemented directly on the IoT device to save energy and bandwidth. Extracting information at the edge allows the device to send a lighter payload to the server, thus reducing the time spent in transmission.<br />
<br />
<br />
The goal of this project is to implement a low-cost solution to make mechanical meters smart, instead of replacing them with costly devices.<br />
The students will work on a Smart Meter, an IoT system based on:<br />
* GAPuino, a development board based on PULP (Parallel Ultra-Low-Power Processing Platform), developed here at IIS. PULP is an open-source multi-core platform achieving leading-edge energy efficiency and featuring widely-tunable performance<br />
* a modem for wireless connectivity<br />
* an ultra-low-power camera<br />
<br />
The system will periodically wake up, take a picture, process the image to extract the number displayed on the meter, and transmit the value wirelessly. A wide range of different meters exists, and many of them are located in environments with difficult lighting conditions. Therefore, analyzing the image on the edge will require robust pattern recognition algorithms.<br />
<br />
<br />
A prototype that connects GAPuino to the modem and is able to send messages to a server has already been implemented. During this project, you will optimize the pipeline for energy efficiency, interface GAPuino with the ultra-low-power camera and build the pattern recognition application.<br />
<br />
===== Application Scenario ===== <br />
The smart meter will be employed in an IoT scenario. The automatic recognition of the number displayed on the meter and its wireless transmission will replace the need for a person to read the meter and annotate the measurement. <br />
<br />
===== Requirements ===== <br />
<br />
* Familiarity with C programming<br />
* Basic knowledge of communication protocols<br />
<br />
===== Task Description =====<br />
* Interfacing PULP with the ultra-low-power camera<br />
* Implementation of a pattern recognition algorithm on a microcontroller<br />
* Testing the system and evaluating the power consumption <br />
* Optimization for energy efficiency<br />
<br />
===== Project Supervisor ===== <br />
* [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* [[:User:scheremo | Moritz Scherer]]: [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch];<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch&diff=7194Integrating Hardware Accelerators into Snitch2021-11-19T14:15:20Z<p>Lbertaccini: Blanked the page</p>
<hr />
<div></div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch_1S&diff=7193Integrating Hardware Accelerators into Snitch 1S2021-11-19T14:14:35Z<p>Lbertaccini: Created page with "<!-- Integrating Hardware Accelerators into Snitch (1S) --> Category:Digital Category:Acceleration_and_Transprecision Category:High Performance SoCs Category:Co..."</p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster [1] couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
<br />
[[File:cluster_hwpe.png|thumb|350px|The ''PULP'' cluster including an HWPE [3]]]<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which allow the system to achieve FPU utilization above 90%. <br />
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This led to an increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are software-programmed by them. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the necessary architectural modifications to provide support for hardware processing engines (HWPE) in the Snitch cluster. Finally, some of the HWPEs developed at IIS can be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, in particular focusing on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPE in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch&diff=7192Integrating Hardware Accelerators into Snitch2021-11-19T13:48:55Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster [1] couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
<br />
[[File:cluster_hwpe.png|thumb|350px|The ''PULP'' cluster including an HWPE [3]]]<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which allow the system to achieve FPU utilization above 90%. <br />
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This has led to increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are programmed by them in software. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the architectural modifications necessary to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, focusing in particular on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPEs in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
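The speed-up and utilization figures mentioned above boil down to simple cycle-count ratios. A minimal sketch of how such an evaluation could be tabulated (all cycle counts below are hypothetical placeholders, not measured results from Snitch or any HWPE):<br />
<br />
```python
# Hypothetical evaluation helpers -- the cycle counts in the example
# are illustrative placeholders, not measurements.

def speedup(baseline_cycles: int, accelerated_cycles: int) -> float:
    """Speed-up of the HWPE-accelerated kernel over the software baseline."""
    return baseline_cycles / accelerated_cycles

def fpu_utilization(useful_fpu_ops: int, cycles: int, ops_per_cycle: int = 1) -> float:
    """Fraction of issue slots in which the FPU performed useful work."""
    return useful_fpu_ops / (cycles * ops_per_cycle)

# Example: a kernel taking 10_000 cycles on the cores alone
# and 1_250 cycles when offloaded to an HWPE (made-up numbers).
print(speedup(10_000, 1_250))          # -> 8.0
print(fpu_utilization(9_200, 10_000))  # -> 0.92, i.e. >90% utilization
```
<br />
In practice these counts would come from RTL-simulation performance counters for both the Snitch and PULP clusters, so the same ratios directly support the comparison in the last bullet.<br />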
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch&diff=7191Integrating Hardware Accelerators into Snitch2021-11-19T11:45:15Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster [1] couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
<br />
[[File:cluster_hwpe.png|thumb|350px|The ''PULP'' cluster including an HWPE [3]]]<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together allow the system to achieve FPU utilization above 90%. <br />
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This has led to increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are programmed by them in software. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the architectural modifications necessary to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, focusing in particular on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPEs in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]<br />
<br />
===Status: Available ===</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch&diff=7190Integrating Hardware Accelerators into Snitch2021-11-19T11:44:24Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster [1] couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
<br />
[[File:cluster_hwpe.png|thumb|350px|The ''PULP'' cluster including an HWPE [3]]]<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together allow the system to achieve FPU utilization above 90%. <br />
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This has led to increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are programmed by them in software. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the architectural modifications necessary to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, focusing in particular on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPEs in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=File:Cluster_hwpe.png&diff=7189File:Cluster hwpe.png2021-11-19T11:42:27Z<p>Lbertaccini: </p>
<hr />
<div></div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch&diff=7188Integrating Hardware Accelerators into Snitch2021-11-19T11:40:26Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
<br />
= Introduction =<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster [1] couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together allow the system to achieve FPU utilization above 90%. <br />
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This has led to increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are programmed by them in software. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the architectural modifications necessary to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, focusing in particular on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPEs in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch&diff=7187Integrating Hardware Accelerators into Snitch2021-11-19T11:37:54Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
<br />
= Introduction =<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together allow the system to achieve FPU utilization above 90%. <br />
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This has led to increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are programmed by them in software. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the architectural modifications necessary to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, focusing in particular on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPEs in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]</div>Lbertaccinihttp://iis-projects.ee.ethz.ch/index.php?title=Integrating_Hardware_Accelerators_into_Snitch&diff=7186Integrating Hardware Accelerators into Snitch2021-11-19T11:36:43Z<p>Lbertaccini: </p>
<hr />
<div><!-- Integrating Hardware Accelerators into Snitch (2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Lbertaccini]]<br />
[[Category:Prasadar]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Lbertaccini | Luca Bertaccini]]: [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Suravi Prasad ]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
<br />
= Introduction =<br />
<br />
The Snitch system [1] targets energy-efficient high-performance computing. It is built around the tiny RISC-V Snitch integer core, coupled with a large double-precision floating-point unit (FPU) optimized for high performance. Additionally, Snitch features two custom instruction-set-architecture (ISA) extensions, stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together allow the system to achieve FPU utilization above 90%. <br />
<br />
With the slowdown of Moore's law, increased functionality and/or lower costs are achieved through domain specialization and heterogeneity. This has led to increased interest in domain-specific accelerators, which provide higher energy efficiency at a lower area cost. <br />
<br />
HWPEs [3] are hardware accelerators that share memory with the general-purpose cores and are programmed by them in software. A plethora of HWPEs have been developed at our group, spanning from machine-learning engines [4] to <br />
accelerators targeting memory-bound workloads [5]. HWPEs have historically been integrated into the PULP cluster, but a Snitch-based system would greatly benefit from supporting these hardware modules.<br />
<br />
The goal of this project is to implement the architectural modifications necessary to support hardware processing engines (HWPEs) in the Snitch cluster. Some of the HWPEs developed at IIS can then be used to evaluate the architectural improvements.<br />
<br />
= Project =<br />
<br />
* '''Integrate the support for HWPEs in Snitch'''. This will require you to <br />
** Investigate how the PULP cluster provides support for HWPEs, focusing in particular on how the HWPEs connect to the memory and to the cores that program them<br />
** Integrate support for HWPEs in Snitch, implementing the necessary modifications<br />
** Verify the functionality of your extensions.<br />
* '''Evaluate your extensions''' by <br />
** Adding one HWPE already developed at IIS to Snitch<br />
** Determining the achieved speed-up for some target applications<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing PULP cluster enhanced by the same HWPE.<br />
<br />
== Character ==<br />
<br />
* 15% Literature / architecture review<br />
* 40% RTL implementation<br />
* 15% Bare-metal C programming<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://ieeexplore.ieee.org/abstract/document/6868645 He-P2012: Architectural heterogeneity exploration on a scalable many-core platform] <br />
<br />
[4] [https://ieeexplore.ieee.org/abstract/document/8412533 XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference]<br />
<br />
[5] [https://ieeexplore.ieee.org/abstract/document/9516654 To Buffer, or Not to Buffer? A Case Study on FFT Accelerators for Ultra-Low-Power Multicore Clusters]<br />
<br />
[6] [https://ieeexplore.ieee.org/abstract/document/7927716 An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics]</div>Lbertaccini