
Difference between revisions of "Approximate Matrix Multiplication based Hardware Accelerator to achieve the next 10x in Energy Efficiency: Training Strategy And Algorithmic optimizations"

From iis-projects

(Created page with "<!-- Approximate Matrix Multiplication based Hardware Accelerator to achieve the next 10x in Energy Efficiency: Full System Integration (2S,1M) --> Category:Digital Cat...")
 
 
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
<!-- Approximate Matrix Multiplication based Hardware Accelerator to achieve the next 10x in Energy Efficiency: Full System Integration (2S,1M) -->
 
<!-- Approximate Matrix Multiplication based Hardware Accelerator to achieve the next 10x in Energy Efficiency: Full System Integration (2S,1M) -->
  
[[Category:Digital]]
+
[[Category:Acceleration and Transprecision]]
[[Category:Acceleration_and_Transprecision]]
 
 
[[Category:High Performance SoCs]]
[[Category:Computer Architecture]]
[[Category:Deep Learning Projects]]
[[Category:2022]]
[[Category:Master Thesis]]
[[Category:Available]]
+
<!-- [[Category:Available]] -->
 +
[[Category:Hot]]
 +
[[Category:janniss]]
 +
[[Category:Discuss]]
 +
[[User:janniss]]
  
  
 
= Overview =
  
== Status: Available ==
+
== PDF of project ==
 +
[https://iis-people.ee.ethz.ch/~janniss/projects/Maddness_algorithm_retraining.pdf PDF]
 +
 
 +
== Status: Reach out to me ==
  
 
* Type: Master Thesis (1 student)
* Professor: Prof. Dr. L. Benini
* Supervisors:
** Jannis Schönleber: [mailto:janniss@ethz.ch janniss@ethz.ch]
+
** Jannis Schönleber: [mailto:janniss@iis.ee.ethz.ch janniss@iis.ee.ethz.ch]
** Lukas Cavigelli (Huawei), [mailto:lukas.cavigelli@huawei.com lukas.cavigelli@huawei.com]
+
* External Advisors:
** Renzo Andri (Huawei), [mailto:renzo.andri@huawei.com renzo.andri@huawei.com]
+
** Lukas Cavigelli (Huawei Research Zurich)
 +
** Renzo Andri (Huawei Research Zurich)
  
 
= Introduction =
[[File:maddness_floorplan.png|thumb|350px|Figure 1: Clock layout of the MADDness accelerator using ASAP7 technology]]
The continued growth in DNN model parameter count, application domains and general adoption led to an explosion of the needed computing power and energy. Especially the energy needs have become large enough to be economically unviable or extremely difficult to cool down. That led to a push for more energy-efficient solutions. Energy efficient accelerator solutions have a long tradition in IIS, with a multitude of proven accelerators published in the past. Standard accelerator architectures try to increase throughput via higher memory bandwidth, improved memory hierarchy or reduced precision (FP16, INT8, INT4). The approach of the accelerator used in the project is a different one. It uses an approximate matrix multiplication (AMM) algorithm called MADDness, which replaces the matrix multiplication with a lookup into a look-up-table (LUT) and an addition. That can significantly reduce the overall computing and energy needs.
+
The continued growth in DNN model parameter count, application domains, and general adoption led to an explosion of the needed computing power and energy. Especially the energy needs have become large enough to be economically unviable or extremely difficult to cool down. That led to a push for more energy-efficient solutions. Energy-efficient accelerator solutions have a long tradition in IIS, with many proven accelerators published in the past. Standard accelerator architectures try to increase throughput via higher memory bandwidth, improved memory hierarchy, or reduced precision (FP16, INT8, INT4). The approach of the accelerator used in the project is a different one. It uses an approximate matrix multiplication (AMM) algorithm called MADDness, which replaces the matrix multiplication with a lookup into a look-up-table (LUT) and an addition. That can significantly reduce the overall computing and energy needs.
  
 
= Project Details =
  
The MADDness algorithm is split into two parts. We have an encoding part, which translates the input matrix A into the addresses of the LUT. After the translation follows a decoding part which adds the corresponding LUT values together to calculate the approximate output of the matrix multiplication. MADDness is then integrated into deep neural networks. The most common seen layers in DNNs are convolutional layers and linear layers, both can be replaced by MADDness. Fully tested drop-in PyTorch layers have already been developed and used. Currently, only a single layer replacement analysis has been done rigorously. So far the layers have only been replaced with the MADDness algorithm and the network has not been retrained with the corresponding new outputs of the layers.  
+
The MADDness algorithm is split into two parts. First, we have an encoding part, which translates the input matrix A into the addresses of the LUT. After the translation follows a decoding part that adds the corresponding LUT values together to calculate the approximate output of the matrix multiplication. MADDness is then integrated into deep neural networks. The most commonly seen layers in DNNs are convolutional layers and linear layers, MADDness can replace both. Thoroughly tested drop-in PyTorch layers have already been developed and used. However, currently, only a single-layer replacement analysis has been done rigorously. So far, the layers have only been replaced with the MADDness algorithm, and the network has not been retrained with the corresponding new outputs of the layers.  
 
Energy estimates with the current implementation using GF 22nm FDX technology suggest an energy efficiency of up to 32 TMACs/W compared to a state-of-the-art datacenter NVIDIA A100 (TSMC 7nm FinFET) at around 0.7 TMACs/W (FP16).
In this project, we would like to investigate if we can improve our accelerator’s accuracy by implementing a retraining strategy and framework. The goal would be to be able to replace multiple layers of a DNN without a significant drop in accuracy. A new realm of possible inter-layer optimization can then be analyzed afterwards. For example: Only calculating the needed dimensions for the next MADDness layer or including the activation layer into the MADDness algorithm.  
+
In this project, we would like to investigate if we can improve our accelerator’s accuracy by implementing a retraining strategy and framework. The goal would be to replace multiple layers of a DNN without a significant drop in accuracy. A new realm of possible interlayer optimization can then be analyzed afterward. For example: Only calculating the needed dimensions for the next MADDness layer or including the activation layer into the MADDness algorithm.  
 
More information can be found here:
Code: https://github.com/joennlae/halutmatmul
+
* Code: https://github.com/joennlae/halutmatmul
Reference Paper: https://arxiv.org/abs/2106.10860
+
* Reference Paper: https://arxiv.org/abs/2106.10860
HN discussion: https://news.ycombinator.com/item?id=28375096
+
* HN discussion: https://news.ycombinator.com/item?id=28375096
and please do not hesitate to reach out to me: janniss@ethz.ch
+
* and please do not hesitate to reach out to me: [mailto:janniss@iis.ee.ethz.ch janniss@iis.ee.ethz.ch]
 
 
  
 
= Project Plan =
1. Acquire background knowledge & familiarize with the project 3 weeks
+
1. Acquire background knowledge & familiarize yourself with the project (3 weeks)
 
* Read up on the MADDness algorithm and product quantization methods
* Familiarize yourself with the current state of the project
* Familiarize with the IIS compute environment
+
* Familiarize with the IIS computing environment
2. Setup the project & rerun single layer analysis 2 weeks
+
2. Setup the project & rerun single layer analysis (2 weeks)
* Setup the project and rerun a single layer analysis (for example for ResNet-50)
+
* Setup the project and rerun a single layer analysis (for example, for ResNet-50)
* Update the single layer analysis with a larger than previously used LeViT model
+
* Update the single-layer analysis with a larger than previously used LeViT model
3. Set up and evaluate a first retraining pipeline 8 weeks
+
3. Set up and evaluate a first retraining pipeline (8 weeks)
* Using the simple method of replacing one layer with MADDness and then retrain the following layers. After that we freeze that layer and proceed with the next one.
+
* Using the simple method of replacing one layer with MADDness and then retrain the following layers. After that, we freeze that layer and proceed with the next one.
* Evaluate and optimize the pipeline including a detailed analysis of the accuracy development for the ResNet-50, LeViT and DS-CNN networks
+
* Evaluate and optimize the pipeline, including a detailed analysis of the accuracy development for the ResNet-50, LeViT, and DS-CNN networks
 
* Include the developed framework into the already developed learning framework
4. Extend the MADDness algorithm with intra-layer optimizations 10 weeks
+
4. Extend the MADDness algorithm with intra-layer optimizations (10 weeks)
 
* Include the activation function into the MADDness algorithm
* Can we optimize memory bandwidth and/or compute by only computing the dimensions needed for the following MADDness layer.
+
* Can we optimize memory bandwidth and/or compute by only computing the dimensions needed for the following MADDness layer?
 
* Is the encoding function that we are using the most accurate? Can we improve it?
5. Project finalization 3 weeks
+
5. Project finalization (3 weeks)
* Prepare final report
+
* Prepare a final report
 
* Prepare project presentation
* Clean up code
Line 63: Line 68:
 
== Character ==
  
* 20% Literature / project review
+
* 20% Literature/project review
 
* 40% Retraining pipeline implementation in Python
* 30% Algorithmic optimisations
+
* 30% Algorithmic optimizations
 
* 10% Detailed analysis and preparation of results
  
Line 71: Line 76:
  
 
* Strong interest in Deep Learning and Hardware accelerators
* Experience with Python and preferable with PyTorch or a similar machine learning framework (e.g. TensorFlow)
+
* Experience with Python and preferably with PyTorch or a similar machine learning framework (e.g. TensorFlow)
  
  
If you want to work on this project, but you think that you do not match some of the required skills, please get in touch with us and we can provide preliminary exercises to help you fill in the gap.
+
If you want to work on this project but think you do not match some of the required skills, please contact us, and we can provide preliminary exercises to help you fill in the gap.
  
  
===Status: Available ===
+
===Status: Reach out to me ===

Latest revision as of 15:32, 3 November 2022

User:janniss


Overview

PDF of project

PDF: https://iis-people.ee.ethz.ch/~janniss/projects/Maddness_algorithm_retraining.pdf

Status: Reach out to me

  • Type: Master Thesis (1 student)
  • Professor: Prof. Dr. L. Benini
  • Supervisors:
  • External Advisors:
    • Lukas Cavigelli (Huawei Research Zurich)
    • Renzo Andri (Huawei Research Zurich)

Introduction

Figure 1: Clock layout of the MADDness accelerator using ASAP7 technology

The continued growth in DNN parameter counts, application domains, and general adoption has led to an explosion in the computing power and energy required. Energy needs in particular have grown to the point where systems become economically unviable or extremely difficult to cool. This has led to a push for more energy-efficient solutions. Energy-efficient accelerators have a long tradition at IIS, with many proven designs published in the past. Standard accelerator architectures try to increase throughput via higher memory bandwidth, an improved memory hierarchy, or reduced precision (FP16, INT8, INT4). The accelerator used in this project takes a different approach. It uses an approximate matrix multiplication (AMM) algorithm called MADDness, which replaces the matrix multiplication with lookups into a look-up table (LUT) and additions. This can significantly reduce the overall computing and energy needs.

Project Details

The MADDness algorithm is split into two parts. First, an encoding part translates the input matrix A into addresses of the LUT. A decoding part then adds the corresponding LUT values together to calculate the approximate output of the matrix multiplication. MADDness is then integrated into deep neural networks. The most common layers in DNNs are convolutional and linear layers, and MADDness can replace both. Thoroughly tested drop-in PyTorch layers have already been developed and used. However, only a single-layer replacement analysis has been done rigorously so far: the layers have been replaced with the MADDness algorithm, but the network has not been retrained with the corresponding new layer outputs.

Energy estimates with the current implementation using GF 22nm FDX technology suggest an energy efficiency of up to 32 TMACs/W, compared to around 0.7 TMACs/W (FP16) for a state-of-the-art datacenter NVIDIA A100 (TSMC 7nm FinFET).

In this project, we would like to investigate whether we can improve our accelerator’s accuracy by implementing a retraining strategy and framework. The goal is to replace multiple layers of a DNN without a significant drop in accuracy. A new realm of possible inter-layer optimizations can then be analyzed, for example computing only the dimensions needed by the next MADDness layer, or including the activation layer in the MADDness algorithm.

More information can be found here:

  • Code: https://github.com/joennlae/halutmatmul
  • Reference Paper: https://arxiv.org/abs/2106.10860
  • HN discussion: https://news.ycombinator.com/item?id=28375096
  • Please do not hesitate to reach out to me: janniss@iis.ee.ethz.ch
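The encode/decode split described under Project Details can be illustrated with a minimal pure-Python sketch. It is not the project's implementation: the function names, the tree depth of 2 (four prototypes per codebook), and the hand-picked thresholds and prototypes are all assumptions made for illustration, whereas a real setup would learn them from training data. Each row of A is cut into sub-vectors, a small decision tree maps every sub-vector to a LUT address, and the output is the sum of the addressed LUT rows.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def encode(sub: Sequence[float], split_dims: Sequence[int],
           thresholds: Sequence[Sequence[float]]) -> int:
    """Encoding: walk a balanced binary tree and return the leaf index,
    which serves as the LUT address. One input dimension is read per level."""
    node = 0
    for level, dim in enumerate(split_dims):
        node = 2 * node + int(sub[dim] > thresholds[level][node])
    return node


def build_lut(prototypes: Matrix, b_rows: Matrix) -> Matrix:
    """Offline step: lut[k][n] = dot(prototype k, column n of the B slice)."""
    n_cols = len(b_rows[0])
    return [[sum(p[d] * b_rows[d][n] for d in range(len(p)))
             for n in range(n_cols)] for p in prototypes]


def approx_matmul(a: Matrix, b: Matrix, sub_len: int, split_dims, thresholds,
                  prototypes) -> Matrix:
    """Approximate A @ B using only comparisons, lookups, and additions
    at inference time (the multiplications live in the offline LUT build)."""
    n_codebooks = len(a[0]) // sub_len
    n_cols = len(b[0])
    luts = [build_lut(prototypes[c], b[c * sub_len:(c + 1) * sub_len])
            for c in range(n_codebooks)]
    out = []
    for row in a:
        acc = [0.0] * n_cols
        for c in range(n_codebooks):
            sub = row[c * sub_len:(c + 1) * sub_len]
            addr = encode(sub, split_dims[c], thresholds[c])
            for n in range(n_cols):          # decoding: accumulate LUT rows
                acc[n] += luts[c][addr][n]
        out.append(acc)
    return out


# Hypothetical toy configuration: 2 codebooks, sub-vectors of length 2.
SUB_LEN = 2
SPLIT_DIMS = [[0, 1], [0, 1]]
THRESHOLDS = [[[0.5], [0.5, 0.5]], [[0.5], [0.5, 0.5]]]
PROTOTYPES = [[[0, 0], [0, 1], [1, 0], [1, 1]],
              [[0, 0], [0, 1], [1, 0], [1, 1]]]

A = [[1.0, 0.0, 0.0, 1.0],   # matches prototypes exactly
     [0.9, 0.1, 0.2, 0.8]]   # is only approximated by them
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

approx = approx_matmul(A, B, SUB_LEN, SPLIT_DIMS, THRESHOLDS, PROTOTYPES)
```

In this toy setup both rows of A map to the same LUT addresses, so they produce the same output row. The first row is reproduced exactly, while the exact product for the second row would be [7.8, 9.8], which shows where the approximation error comes from.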

Project Plan

1. Acquire background knowledge & familiarize yourself with the project (3 weeks)

  • Read up on the MADDness algorithm and product quantization methods
  • Familiarize yourself with the current state of the project
  • Familiarize yourself with the IIS computing environment

2. Set up the project & rerun the single-layer analysis (2 weeks)

  • Set up the project and rerun a single-layer analysis (for example, for ResNet-50)
  • Extend the single-layer analysis to a LeViT model larger than the one used previously

3. Set up and evaluate a first retraining pipeline (8 weeks)

  • Use the simple method of replacing one layer with MADDness and then retraining the following layers. After that, freeze the replaced layer and proceed with the next one.
  • Evaluate and optimize the pipeline, including a detailed analysis of the accuracy development for the ResNet-50, LeViT, and DS-CNN networks
  • Integrate the retraining pipeline into the existing learning framework
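The layer-by-layer strategy in step 3 can be written down as a schedule. The sketch below only enumerates which layers are replaced, frozen, and retrained in each stage; it is an illustration under assumed names (`Stage`, `retraining_schedule`, and the layer names are hypothetical) and performs no training. In a PyTorch pipeline, freezing would typically mean setting `requires_grad = False` on the parameters of the layers listed as frozen.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    replaced: str          # layer swapped for its MADDness version in this stage
    frozen: List[str]      # earlier layers, already replaced and kept fixed
    trainable: List[str]   # following layers, retrained on the new outputs


def retraining_schedule(layers: List[str]) -> List[Stage]:
    """Replace layers front to back; after each replacement, only the
    layers behind the replaced one are retrained."""
    return [Stage(replaced=layers[i], frozen=layers[:i], trainable=layers[i + 1:])
            for i in range(len(layers))]


# Hypothetical three-layer network.
schedule = retraining_schedule(["conv1", "conv2", "fc"])
for stage in schedule:
    print(f"replace {stage.replaced}: frozen={stage.frozen}, retrain={stage.trainable}")
```

One consequence visible in the schedule is that the last layer has nothing left to retrain after it is replaced, so its approximation error cannot be compensated by this method alone.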

4. Extend the MADDness algorithm with intra-layer optimizations (10 weeks)

  • Include the activation function in the MADDness algorithm
  • Can we optimize memory bandwidth and/or compute by only computing the dimensions needed for the following MADDness layer?
  • Is the encoding function that we are using the most accurate? Can we improve it?
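The question about computing only the needed dimensions can be made concrete with a small sketch. It assumes that the encoder of the following MADDness layer is a decision tree that reads one input dimension per tree level, as in the reference paper; the helper name and the split indices below are hypothetical. Under that assumption, the preceding layer only has to produce the output columns that some tree actually inspects.

```python
from typing import List, Sequence


def needed_input_dims(split_dims: Sequence[Sequence[int]], sub_len: int) -> List[int]:
    """Return the global input columns read by the next layer's encoder.

    split_dims[c] holds the dimensions (local to codebook c) that the
    decision tree of codebook c compares against its thresholds.
    """
    needed = set()
    for c, dims in enumerate(split_dims):
        needed.update(c * sub_len + d for d in dims)
    return sorted(needed)


# Hypothetical next layer: 2 codebooks over 16 input columns, 4 tree levels each.
SUB_LEN = 8
SPLIT_DIMS = [[0, 3, 3, 5], [1, 2, 6, 7]]

dims = needed_input_dims(SPLIT_DIMS, SUB_LEN)
fraction = len(dims) / (SUB_LEN * len(SPLIT_DIMS))
```

Here only 7 of the 16 columns would have to be computed, which is the kind of saving in memory bandwidth and compute that this step asks about. Whether it carries over to real networks depends on the learned split dimensions.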

5. Project finalization (3 weeks)

  • Prepare a final report
  • Prepare project presentation
  • Clean up code


Character

  • 20% Literature/project review
  • 40% Retraining pipeline implementation in Python
  • 30% Algorithmic optimizations
  • 10% Detailed analysis and preparation of results

Prerequisites

  • Strong interest in Deep Learning and Hardware accelerators
  • Experience with Python and preferably with PyTorch or a similar machine learning framework (e.g. TensorFlow)


If you want to work on this project but think you do not match some of the required skills, please contact us, and we can provide preliminary exercises to help you fill in the gap.


Status: Reach out to me