http://iis-projects.ee.ethz.ch/api.php?action=feedcontributions&user=Fischeti&feedformat=atomiis-projects - User contributions [en]2024-03-29T11:17:27ZUser contributionsMediaWiki 1.28.0http://iis-projects.ee.ethz.ch/index.php?title=Modeling_FlooNoC_in_GVSoC_(S/M)&diff=9953Modeling FlooNoC in GVSoC (S/M)2023-12-04T13:15:52Z<p>Fischeti: Created page with "<!-- Network-on-Chip for coherent and non-coherent traffic (M) --> Category:Digital Category:Network-on-Chip Category:Interconnect Category:GVSoC Category:S..."</p>
<hr />
<div><!-- Network-on-Chip for coherent and non-coherent traffic (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Network-on-Chip]]<br />
[[Category:Interconnect]]<br />
[[Category:GVSoC]]<br />
[[Category:Simulation]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Prasadar]]<br />
[[Category:Available]]<br />
<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Prasadar | Arpan Prasad]]: [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
==GVSoC==<br />
<br />
GVSoC is a sophisticated, highly configurable, and timing-accurate event-driven simulator designed for simulating IoT processors and complex systems-on-chip (SoCs). It plays a crucial role in enabling agile design space exploration for low-power SoCs, and is particularly useful in scenarios demanding rapid and accurate simulation of heterogeneous systems that combine microcontroller units (MCUs) with application-specific accelerators. GVSoC achieves this by combining the efficiency of C++ models with the flexibility of Python configuration scripts, allowing for the simulation of full platforms including multiple cores, multi-level memory hierarchies, and various I/O peripherals. Remarkably, GVSoC delivers simulation speeds up to 2500 times faster than cycle-accurate simulators, while keeping errors typically below 10% for performance analysis. This makes it an invaluable tool for breaking the speed and design-effort bottlenecks of traditional simulators and FPGA prototypes, while still preserving functional and timing accuracy. [1][2]<br />
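The event-driven core of such a simulator can be illustrated with a minimal, self-contained sketch. This is a hypothetical illustration of the general principle, not the actual GVSoC API: components post timestamped callbacks to a global event queue, and the engine pops them in timestamp order, advancing simulated time as it goes.<br />

```python
import heapq

class EventQueue:
    """Minimal event-driven simulation engine (illustrative, not the GVSoC API)."""

    def __init__(self):
        self._heap = []   # (timestamp, sequence number, callback) tuples
        self._seq = 0     # tie-breaker so simultaneous events pop in post order
        self.now = 0      # current simulated time

    def post(self, delay, callback):
        """Schedule a callback `delay` time units in the future."""
        heapq.heappush(self._heap, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Pop events in timestamp order until the queue is empty."""
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()

# Two hypothetical components posting events out of order:
trace = []
queue = EventQueue()
queue.post(10, lambda: trace.append(("mem_resp", queue.now)))
queue.post(5, lambda: trace.append(("core_req", queue.now)))
queue.run()  # fires core_req at t=5, then mem_resp at t=10
```

A real GVSoC component model is written in C++ and wired up by Python generators; the sketch only shows why event-driven scheduling is so much faster than cycle-accurate simulation: time jumps directly from one event to the next instead of ticking every cycle.<br />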
<br />
==FlooNoC==<br />
<br />
FlooNoC is a modern, open-source Network-on-Chip (NoC) architecture designed in our group [3][4], distinguished by its low latency, full AXI4 compatibility, and wide physical channels. Built to address the high-bandwidth requirements of contemporary applications, FlooNoC stands out for its efficient handling of both high-bandwidth, burst-based traffic and latency-critical short messages. The architecture's key elements include scalable, low-complexity routers, wide channels for high-bandwidth throughput, and a decoupled link-level protocol for enhanced scalability. It is particularly notable for its high energy efficiency and modest area footprint, making it an ideal candidate for integration into advanced SoC designs.<br />
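To give a flavor of the kind of behavior a NoC model must capture, the sketch below implements dimension-ordered (XY) routing, a common deadlock-free routing scheme for 2D-mesh NoCs. The function and coordinate convention are illustrative assumptions and are not taken from the FlooNoC RTL:<br />

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing on a 2D mesh: move along the X axis
    first, then along Y. Returns the list of (x, y) hops, endpoints included."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:                 # resolve the X offset first
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:                 # then resolve the Y offset
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path

# A packet from router (0, 0) to router (2, 1) takes 3 hops:
# (0,0) -> (1,0) -> (2,0) -> (2,1)
```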
<br />
= Project =<br />
<br />
This project seeks to integrate the innovative FlooNoC architecture into the GVSoC simulation framework. The integration aims to leverage GVSoC’s fast, accurate, and highly configurable simulation capabilities to enable fast design space exploration (DSE) compared to slow and cumbersome RTL simulations. This will enhance GVSoC's utility in simulating advanced NoC architectures and contribute to the development of more efficient and scalable SoC designs.<br />
<br />
The goals of this project are the following:<br />
<br />
1. '''Research:''' Get familiar with the implementation of GVSoC and understand the architecture of FlooNoC in order to know how to model it in GVSoC. Identify key integration points and challenges.<br />
<br />
2. '''Modeling:''' Develop a detailed model of FlooNoC within GVSoC, with a good tradeoff between performance and accuracy.<br />
<br />
3. '''Generation:''' Extend GVSoC with the capability to quickly generate different network topologies to enable rapid design space exploration.<br />
<br />
4. '''Analysis:''' Evaluate GVSoC and your model of FlooNoC in terms of performance, accuracy, and usability.<br />
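For the generation goal, the core of a topology generator can be as simple as enumerating the routers and links of a parameterized mesh. The sketch below is a hypothetical illustration, not GVSoC's actual generation interface:<br />

```python
def mesh_topology(cols, rows):
    """Enumerate the routers and bidirectional links of a cols x rows 2D mesh.
    Routers are (x, y) coordinates; each link is a pair of adjacent routers."""
    routers = [(x, y) for y in range(rows) for x in range(cols)]
    links = []
    for x, y in routers:
        if x + 1 < cols:                       # link to the east neighbor
            links.append(((x, y), (x + 1, y)))
        if y + 1 < rows:                       # link to the north neighbor
            links.append(((x, y), (x, y + 1)))
    return routers, links

# A 3x2 mesh has 6 routers and 2*2 + 3*1 = 7 links.
```

Sweeping the parameters, or swapping in a torus or tree generator, is exactly the kind of fast design space exploration this goal targets.<br />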
<br />
== Character ==<br />
<br />
* 20% Literature Research<br />
* 50% Modelling<br />
* 20% Analysis<br />
* 10% Documentation & Report<br />
<br />
== Prerequisites ==<br />
<br />
* Knowledge of C++ and Python is recommended for developing GVSoC<br />
* Experience with SystemVerilog is recommended but not strictly necessary<br />
<br />
== References ==<br />
* [1] https://ieeexplore.ieee.org/abstract/document/9643828<br />
* [2] https://gvsoc.readthedocs.io/en/latest/<br />
* [3] https://ieeexplore.ieee.org/document/10225380<br />
* [4] https://github.com/pulp-platform/FlooNoC/tree/main</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Acceleration_and_Transprecision&diff=9949Acceleration and Transprecision2023-11-24T15:05:10Z<p>Fischeti: </p>
<hr />
<div>[[File:NVIDIA Tesla V100.jpg|thumb|right|A NVIDIA Tesla V100 GP-GPU. This cutting-edge accelerator provides huge computational power on a [https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/ massive 800 mm² die].]]<br />
[[File:Google Cloud TPU.png|thumb|right|Google's Cloud TPU (Tensor Processing Unit). This machine learning accelerator can do one thing extremely well: multiply-accumulate operations.]]<br />
<br />
Accelerators are the backbone of big data and scientific computing. While general-purpose processor architectures such as Intel's x86 provide good performance across a wide variety of applications, it is only since the advent of general-purpose GPUs that many computationally demanding tasks have become feasible. Since these GPUs support a much narrower set of operations, it is easier to optimize the architecture to make them more efficient. Such accelerators are not limited to the high-performance sector alone. In low power computing, they allow complex tasks such as computer vision or cryptography to be performed under a very tight power budget. Without a dedicated accelerator, these tasks would not be feasible.<br />
<br />
===Who We Are===<br />
====Francesco Conti====<br />
* [mailto:fconti@iis.ee.ethz.ch fconti@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====Luca Bertaccini====<br />
* [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====Matteo Perotti====<br />
* [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* ETZ J85<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
[[File:Selene.jpg|thumb|right|The Logarithmic Number Unit chip [http://asic.ethz.ch/2014/Selene.html Selene].]]<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Hardware_Acceleration&diff=9948Hardware Acceleration2023-11-24T15:04:33Z<p>Fischeti: </p>
<hr />
<div>[[File:NVIDIA Tesla V100.jpg|thumb|right|A NVIDIA Tesla V100 GP-GPU. This cutting-edge accelerator provides huge computational power on a [https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/ massive 800 mm² die].]]<br />
[[File:Google Cloud TPU.png|thumb|right|Google's Cloud TPU (Tensor Processing Unit). This machine learning accelerator can do one thing extremely well: multiply-accumulate operations.]]<br />
<br />
Accelerators are the backbone of big data and scientific computing. While general-purpose processor architectures such as Intel's x86 provide good performance across a wide variety of applications, it is only since the advent of general-purpose GPUs that many computationally demanding tasks have become feasible. Since these GPUs support a much narrower set of operations, it is easier to optimize the architecture to make them more efficient. Such accelerators are not limited to the high-performance sector alone. In low power computing, they allow complex tasks such as computer vision or cryptography to be performed under a very tight power budget. Without a dedicated accelerator, these tasks would not be feasible.<br />
<br />
==General-Purpose Computing==<br />
<br />
While monolithic accelerators dedicated to specific tasks are usually unbeatable in terms of throughput and energy efficiency, they also have drawbacks: they are commonly integrated onto Systems-on-Chip where they need to interact with, and are typically programmed by, regular general-purpose processors. This ''separation of compute acceleration and control'' limits the system's flexibility and real-world performance as communication and data exchange between the processor and accelerator become major bottlenecks.<br />
<br />
A common alternative is to accelerate problems ''inside general-purpose cores directly''. This can be done either by writing optimized software for specific problems, or by integrating dedicated acceleration hardware ''directly into the processor's ISA and pipeline''. The most prominent example of the latter is the often overlooked, yet ubiquitous ''Floating Point Unit'' (FPU). However, this same idea can be applied to a large variety of problems, and different architecture extensions may even work together in effective ways.<br />
<br />
If you are looking for a project and these ideas sound interesting to you, do not hesitate to contact us!<br />
<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
<br />
<br />
==Computational Units==<br />
The last decade has seen explosive growth in the quest for energy-efficient architectures and systems. An era of exponentially improving computing efficiency - driven mostly by CMOS technology scaling - is coming to an end as Moore’s law falters. The obstacle of the so-called thermal or power wall is fueling a push towards computing paradigms that hold energy efficiency as the ultimate figure of merit for any hardware design.<br />
<br />
The broad term "computational units" covers a wide range of hardware accelerators for a multitude of different systems, such as floating-point units (FPUs) for processors, or dedicated accelerators for cryptography, signal processing, etc. Such computational units are housed within full systems which usually command stringent requirements in terms of performance, size, and efficiency.<br />
<br />
Key topics of interest are energy-efficient accelerators at various extremes of the design space, covering high-performance, ultra-low-power, or minimum-area implementations, as well as the exploration of novel paradigms in computing, arithmetic, and processor architectures.<br />
<br />
<br />
====[[:User:Lbertaccini | Luca Bertaccini]]====<br />
* '''e-mail''': [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* '''office''': ETZ J78<br />
<br />
====[[:User:Mperotti | Matteo Perotti]]====<br />
* '''e-mail''': [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* '''office''': OAT U21<br />
<br />
==Hardware Acceleration of DNNs and QNNs==<br />
Deep Learning (DL) and Artificial Intelligence (AI) are quickly becoming dominant paradigms for all kinds of analytics, complementing or replacing traditional data science methods. Successful at-scale deployment of these algorithms requires deploying them directly at the data source, i.e. in the IoT end-nodes collecting data. However, due to the extreme constraints of these devices (in terms of power, memory footprint, area cost), performing full DL inference in-situ in low-power end-nodes requires a breakthrough in computational performance and efficiency.<br />
It is widely known that the numerical representation typically used when developing DL algorithms (single-precision floating-point) encodes a higher precision than what is actually required to achieve high quality-of-results in inference (Courbariaux et al. 2016); this fact can be exploited in the design of energy-efficient hardware for DL.<br />
For example, by using ternary weights, meaning all network weights are quantized to {-1, 0, 1}, we can design the fundamental compute units in hardware without an HW-expensive multiplication unit. Additionally, this allows us to store the weights much more compactly on-chip.<br />
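The multiplier-free ternary arithmetic described above can be sketched as follows; the threshold value and function names are illustrative assumptions:<br />

```python
def ternarize(weight, threshold=0.05):
    """Quantize a real-valued weight to {-1, 0, 1} by thresholding its magnitude."""
    if weight > threshold:
        return 1
    if weight < -threshold:
        return -1
    return 0

def ternary_dot(activations, weights):
    """Dot product with ternary weights: every term is an add, a subtract,
    or a skip -- no multiplier is needed."""
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        # w == 0 contributes nothing and is skipped
    return acc
```

In hardware, the same observation replaces each multiplier with an adder/subtractor and lets every weight be stored in just 2 bits, which is where the area, energy, and memory savings come from.<br />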
<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
{|<br />
| style="padding: 64px" |<br />
|<br />
===Georg Rutishauser===<br />
* '''e-mail''': [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 97<br />
* '''office''': ETZ J68.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Moritz_scherer.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Scheremo| Moritz Scherer]]===<br />
* '''e-mail''': [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 77 86<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Wiesep.jpg|frameless|left|96px]]<br />
|<br />
===[[User:Wiesep | Philip Wiese]]===<br />
* '''e-mail''': [mailto:wiesep@iis.ee.ethz.ch wiesep@iis.ee.ethz.ch]<br />
* '''phone''': +41 79 244 92 40<br />
* '''office''': OAT U25<br />
|}<br />
<br />
==Projects Overview==<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
[[File:Selene.jpg|thumb|right|The Logarithmic Number Unit chip [http://asic.ethz.ch/2014/Selene.html Selene].]]<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Hardware_Acceleration&diff=9947Hardware Acceleration2023-11-24T15:03:37Z<p>Fischeti: /* Matteo Perotti */</p>
<hr />
<div>[[File:NVIDIA Tesla V100.jpg|thumb|right|A NVIDIA Tesla V100 GP-GPU. This cutting-edge accelerator provides huge computational power on a [https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/ massive 800 mm² die].]]<br />
[[File:Google Cloud TPU.png|thumb|right|Google's Cloud TPU (Tensor Processing Unit). This machine learning accelerator can do one thing extremely well: multiply-accumulate operations.]]<br />
<br />
Accelerators are the backbone of big data and scientific computing. While general-purpose processor architectures such as Intel's x86 provide good performance across a wide variety of applications, it is only since the advent of general-purpose GPUs that many computationally demanding tasks have become feasible. Since these GPUs support a much narrower set of operations, it is easier to optimize the architecture to make them more efficient. Such accelerators are not limited to the high-performance sector alone. In low power computing, they allow complex tasks such as computer vision or cryptography to be performed under a very tight power budget. Without a dedicated accelerator, these tasks would not be feasible.<br />
<br />
==General-Purpose Computing==<br />
<br />
While monolithic accelerators dedicated to specific tasks are usually unbeatable in terms of throughput and energy efficiency, they also have drawbacks: they are commonly integrated onto Systems-on-Chip where they need to interact with, and are typically programmed by, regular general-purpose processors. This ''separation of compute acceleration and control'' limits the system's flexibility and real-world performance as communication and data exchange between the processor and accelerator become major bottlenecks.<br />
<br />
A common alternative is to accelerate problems ''inside general-purpose cores directly''. This can be done either by writing optimized software for specific problems, or by integrating dedicated acceleration hardware ''directly into the processor's ISA and pipeline''. The most prominent example of the latter is the often overlooked, yet ubiquitous ''Floating Point Unit'' (FPU). However, this same idea can be applied to a large variety of problems, and different architecture extensions may even work together in effective ways.<br />
<br />
If you are looking for a project and these ideas sound interesting to you, do not hesitate to contact us!<br />
<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
<br />
<br />
==Computational Units==<br />
The last decade has seen explosive growth in the quest for energy-efficient architectures and systems. An era of exponentially improving computing efficiency - driven mostly by CMOS technology scaling - is coming to an end as Moore’s law falters. The obstacle of the so-called thermal or power wall is fueling a push towards computing paradigms that hold energy efficiency as the ultimate figure of merit for any hardware design.<br />
<br />
The broad term "computational units" covers a wide range of hardware accelerators for a multitude of different systems, such as floating-point units (FPUs) for processors, or dedicated accelerators for cryptography, signal processing, etc. Such computational units are housed within full systems which usually command stringent requirements in terms of performance, size, and efficiency.<br />
<br />
Key topics of interest are energy-efficient accelerators at various extremes of the design space, covering high-performance, ultra-low-power, or minimum-area implementations, as well as the exploration of novel paradigms in computing, arithmetic, and processor architectures.<br />
<br />
<br />
====[[:User:Lbertaccini | Luca Bertaccini]]====<br />
* [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====[[:User:Mperotti | Matteo Perotti]]====<br />
* [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
==Hardware Acceleration of DNNs and QNNs==<br />
Deep Learning (DL) and Artificial Intelligence (AI) are quickly becoming dominant paradigms for all kinds of analytics, complementing or replacing traditional data science methods. Successful at-scale deployment of these algorithms requires deploying them directly at the data source, i.e. in the IoT end-nodes collecting data. However, due to the extreme constraints of these devices (in terms of power, memory footprint, area cost), performing full DL inference in-situ in low-power end-nodes requires a breakthrough in computational performance and efficiency.<br />
It is widely known that the numerical representation typically used when developing DL algorithms (single-precision floating-point) encodes a higher precision than what is actually required to achieve high quality-of-results in inference (Courbariaux et al. 2016); this fact can be exploited in the design of energy-efficient hardware for DL.<br />
For example, by using ternary weights, meaning all network weights are quantized to {-1, 0, 1}, we can design the fundamental compute units in hardware without an HW-expensive multiplication unit. Additionally, this allows us to store the weights much more compactly on-chip.<br />
<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
{|<br />
| style="padding: 64px" |<br />
|<br />
===Georg Rutishauser===<br />
* '''e-mail''': [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 97<br />
* '''office''': ETZ J68.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Moritz_scherer.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Scheremo| Moritz Scherer]]===<br />
* '''e-mail''': [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 77 86<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Wiesep.jpg|frameless|left|96px]]<br />
|<br />
===[[User:Wiesep | Philip Wiese]]===<br />
* '''e-mail''': [mailto:wiesep@iis.ee.ethz.ch wiesep@iis.ee.ethz.ch]<br />
* '''phone''': +41 79 244 92 40<br />
* '''office''': OAT U25<br />
|}<br />
<br />
==Projects Overview==<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
[[File:Selene.jpg|thumb|right|The Logarithmic Number Unit chip [http://asic.ethz.ch/2014/Selene.html Selene].]]<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Hardware_Acceleration&diff=9946Hardware Acceleration2023-11-24T15:03:06Z<p>Fischeti: /* Computational Units */</p>
<hr />
<div>[[File:NVIDIA Tesla V100.jpg|thumb|right|A NVIDIA Tesla V100 GP-GPU. This cutting-edge accelerator provides huge computational power on a [https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/ massive 800 mm² die].]]<br />
[[File:Google Cloud TPU.png|thumb|right|Google's Cloud TPU (Tensor Processing Unit). This machine learning accelerator can do one thing extremely well: multiply-accumulate operations.]]<br />
<br />
Accelerators are the backbone of big data and scientific computing. While general-purpose processor architectures such as Intel's x86 provide good performance across a wide variety of applications, it is only since the advent of general-purpose GPUs that many computationally demanding tasks have become feasible. Since these GPUs support a much narrower set of operations, it is easier to optimize the architecture to make them more efficient. Such accelerators are not limited to the high-performance sector alone. In low power computing, they allow complex tasks such as computer vision or cryptography to be performed under a very tight power budget. Without a dedicated accelerator, these tasks would not be feasible.<br />
<br />
==General-Purpose Computing==<br />
<br />
While monolithic accelerators dedicated to specific tasks are usually unbeatable in terms of throughput and energy efficiency, they also have drawbacks: they are commonly integrated onto Systems-on-Chip where they need to interact with, and are typically programmed by, regular general-purpose processors. This ''separation of compute acceleration and control'' limits the system's flexibility and real-world performance as communication and data exchange between the processor and accelerator become major bottlenecks.<br />
<br />
A common alternative is to accelerate problems ''inside general-purpose cores directly''. This can be done either by writing optimized software for specific problems, or by integrating dedicated acceleration hardware ''directly into the processor's ISA and pipeline''. The most prominent example of the latter is the often overlooked, yet ubiquitous ''Floating Point Unit'' (FPU). However, this same idea can be applied to a large variety of problems, and different architecture extensions may even work together in effective ways.<br />
<br />
If you are looking for a project and these ideas sound interesting to you, do not hesitate to contact us!<br />
<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
<br />
<br />
==Computational Units==<br />
The last decade has seen explosive growth in the quest for energy-efficient architectures and systems. An era of exponentially improving computing efficiency - driven mostly by CMOS technology scaling - is coming to an end as Moore’s law falters. The obstacle of the so-called thermal or power wall is fueling a push towards computing paradigms that hold energy efficiency as the ultimate figure of merit for any hardware design.<br />
<br />
The broad term "computational units" covers a wide range of hardware accelerators for a multitude of different systems, such as floating-point units (FPUs) for processors, or dedicated accelerators for cryptography, signal processing, etc. Such computational units are housed within full systems which usually command stringent requirements in terms of performance, size, and efficiency.<br />
<br />
Key topics of interest are energy-efficient accelerators at various extremes of the design space, covering high-performance, ultra-low-power, or minimum-area implementations, as well as the exploration of novel paradigms in computing, arithmetic, and processor architectures.<br />
<br />
<br />
====[[:User:Lbertaccini | Luca Bertaccini]]====<br />
* [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====Matteo Perotti====<br />
* [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
==Hardware Acceleration of DNNs and QNNs==<br />
Deep Learning (DL) and Artificial Intelligence (AI) are quickly becoming dominant paradigms for all kinds of analytics, complementing or replacing traditional data science methods. Successful at-scale deployment of these algorithms requires deploying them directly at the data source, i.e. in the IoT end-nodes collecting data. However, due to the extreme constraints of these devices (in terms of power, memory footprint, area cost), performing full DL inference in-situ in low-power end-nodes requires a breakthrough in computational performance and efficiency.<br />
It is widely known that the numerical representation typically used when developing DL algorithms (single-precision floating-point) encodes a higher precision than what is actually required to achieve high quality-of-results in inference (Courbariaux et al. 2016); this fact can be exploited in the design of energy-efficient hardware for DL.<br />
For example, by using ternary weights, meaning all network weights are quantized to {-1, 0, 1}, we can design the fundamental compute units in hardware without an HW-expensive multiplication unit. Additionally, this allows us to store the weights much more compactly on-chip.<br />
<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
{|<br />
| style="padding: 64px" |<br />
|<br />
===Georg Rutishauser===<br />
* '''e-mail''': [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 97<br />
* '''office''': ETZ J68.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Moritz_scherer.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Scheremo| Moritz Scherer]]===<br />
* '''e-mail''': [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 77 86<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Wiesep.jpg|frameless|left|96px]]<br />
|<br />
===[[User:Wiesep | Philip Wiese]]===<br />
* '''e-mail''': [mailto:wiesep@iis.ee.ethz.ch wiesep@iis.ee.ethz.ch]<br />
* '''phone''': +41 79 244 92 40<br />
* '''office''': OAT U25<br />
|}<br />
<br />
==Projects Overview==<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
[[File:Selene.jpg|thumb|right|The Logarithmic Number Unit chip [http://asic.ethz.ch/2014/Selene.html Selene].]]<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Hardware_Acceleration&diff=9945Hardware Acceleration2023-11-24T15:02:55Z<p>Fischeti: /* Computational Units */</p>
<hr />
<div>[[File:NVIDIA Tesla V100.jpg|thumb|right|A NVIDIA Tesla V100 GP-GPU. This cutting-edge accelerator provides huge computational power on a [https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/ massive 800 mm² die].]]<br />
[[File:Google Cloud TPU.png|thumb|right|Google's Cloud TPU (Tensor Processing Unit). This machine learning accelerator can do one thing extremely well: multiply-accumulate operations.]]<br />
<br />
Accelerators are the backbone of big data and scientific computing. While general-purpose processor architectures such as Intel's x86 provide good performance across a wide variety of applications, it is only since the advent of general-purpose GPUs that many computationally demanding tasks have become feasible. Since these GPUs support a much narrower set of operations, it is easier to optimize the architecture to make them more efficient. Such accelerators are not limited to the high-performance sector alone. In low power computing, they allow complex tasks such as computer vision or cryptography to be performed under a very tight power budget. Without a dedicated accelerator, these tasks would not be feasible.<br />
<br />
==General-Purpose Computing==<br />
<br />
While monolithic accelerators dedicated to specific tasks are usually unbeatable in terms of throughput and energy efficiency, they also have drawbacks: they are commonly integrated onto Systems-on-Chip where they need to interact with, and are typically programmed by, regular general-purpose processors. This ''separation of compute acceleration and control'' limits the system's flexibility and real-world performance as communication and data exchange between the processor and accelerator become major bottlenecks.<br />
<br />
A common alternative is to accelerate problems ''inside general-purpose cores directly''. This can be done either by writing optimized software for specific problems, or by integrating dedicated acceleration hardware ''directly into the processor's ISA and pipeline''. The most prominent example of the latter is the often overlooked, yet ubiquitous ''Floating Point Unit'' (FPU). However, this same idea can be applied to a large variety of problems, and different architecture extensions may even work together in effective ways.<br />
<br />
If you are looking for a project and these ideas sound interesting to you, do not hesitate to contact us!<br />
<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
<br />
<br />
==Computational Units==<br />
The last decade has seen explosive growth in the quest for energy-efficient architectures and systems. An era of exponentially improving computing efficiency - driven mostly by CMOS technology scaling - is coming to an end as Moore’s law falters. The obstacle of the so-called thermal or power wall is fueling a push towards computing paradigms that hold energy efficiency as the ultimate figure of merit for any hardware design.<br />
<br />
The broad term "computational units" covers a wide range of hardware accelerators for a multitude of different systems, such as floating-point units (FPUs) for processors, or dedicated accelerators for cryptography, signal processing, etc. Such computational units are housed within full systems, which usually impose stringent requirements in terms of performance, size, and efficiency.<br />
<br />
Key topics of interest are energy-efficient accelerators at various extremes of the design space, covering high-performance, ultra low-power, or minimum area implementations, as well as the exploration of novel paradigms in computing, arithmetics, and processor architectures.<br />
<br />
<br />
====[[:User:Lbertaccini | Luca Bertaccini]]====<br />
* [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====Matteo Perotti====<br />
* [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====Stefan Mach====<br />
* [mailto:smach@iis.ee.ethz.ch smach@iis.ee.ethz.ch]<br />
* ETZ J89<br />
<br />
==Hardware Acceleration of DNNs and QNNs==<br />
Deep Learning (DL) and Artificial Intelligence (AI) are quickly becoming dominant paradigms for all kinds of analytics, complementing or replacing traditional data science methods. Successful at-scale deployment of these algorithms requires deploying them directly at the data source, i.e. in the IoT end-nodes collecting data. However, due to the extreme constraints of these devices (in terms of power, memory footprint, area cost), performing full DL inference in-situ in low-power end-nodes requires a breakthrough in computational performance and efficiency.<br />
It is widely known that the numerical representation typically used when developing DL algorithms (single-precision floating-point) encodes a higher precision than what is actually required to achieve high quality-of-results in inference (Courbariaux et al. 2016); this fact can be exploited in the design of energy-efficient hardware for DL.<br />
For example, by using ternary weights, meaning all network weights are quantized to {-1,0,1}, we can design the fundamental compute units in hardware without an HW-expensive multiplication unit. Additionally, it allows us to store the weights much more compactly on-chip.<br />
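As a minimal sketch of why ternary weights remove the multiplier (illustrative software model only; real designs implement this as a hardware datapath):<br />

```python
# With weights in {-1, 0, +1}, a dot product needs no multiplier:
# each weight only selects add, skip, or subtract.

def ternary_dot(activations, weights):
    """Multiplier-free dot product for ternary weights."""
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a      # add
        elif w == -1:
            acc -= a      # subtract
        # w == 0: skip entirely, no operation needed
    return acc

# Each ternary weight also fits in 2 bits instead of 32,
# so on-chip weight storage shrinks by roughly 16x.
print(ternary_dot([3, 5, 2, 7], [1, 0, -1, 1]))  # 3 - 2 + 7 = 8
```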
<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
{|<br />
| style="padding: 64px" |<br />
|<br />
===Georg Rutishauser===<br />
* '''e-mail''': [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 97<br />
* '''office''': ETZ J68.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Moritz_scherer.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Scheremo| Moritz Scherer]]===<br />
* '''e-mail''': [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 77 86<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Wiesep.jpg|frameless|left|96px]]<br />
|<br />
===[[User:Wiesep | Philip Wiese]]===<br />
* '''e-mail''': [mailto:wiesep@iis.ee.ethz.ch wiesep@iis.ee.ethz.ch]<br />
* '''phone''': +41 79 244 92 40<br />
* '''office''': OAT U25<br />
|}<br />
<br />
==Projects Overview==<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
[[File:Selene.jpg|thumb|right|The Logarithmic Number Unit chip [http://asic.ethz.ch/2014/Selene.html Selene].]]<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=9875Deep Learning Projects2023-11-03T15:02:13Z<p>Fischeti: </p>
<hr />
<div>==What is Deep Learning?==<br />
Nowadays, machine learning systems are the go-to choice when the cost of analytically deriving closed-form expressions to solve a given problem is prohibitive (e.g., it is very time-consuming, or the knowledge about the problem is insufficient). Machine learning systems can be particularly effective when the amount of data is large, since the statistics are expected to get more and more stable as the amount of data increases.<br />
Amongst machine learning systems, deep neural networks (DNNs) have established a reputation for their effectiveness and simplicity. To understand this success as compared to that of other machine learning systems, it is important to consider not only the accuracy performance of DNNs, but also their computational properties. The training algorithm (an iterative application of backpropagation and stochastic gradient descent) is linear in the data set size, making it more appealing in big data contexts than, for instance, support vector machines (SVMs). DNNs do not use branching instructions, making them predictable programs and allowing efficient access patterns to be designed for the memory hierarchies of the computing devices (exploiting spatial and temporal locality). DNNs are parallelizable, both at the neuron level and at the layer level. These predictability and parallelizability properties make DNNs an ideal fit for modern SIMD architectures and distributed computing systems.<br />
<br />
<br />
The main drawback of these systems is their size: millions or even billions of parameters are a common feature of many top-performing DNNs, and a proportional amount of arithmetic operations must be performed to process each data sample. Hence, to reduce the pressure of DNNs on the underlying computing infrastructure, research in computational deep learning has focussed on two families of optimizations: topological and hardware-oriented.<br />
'''Topological optimizations''' are concerned with network topologies (AKA network architectures) which are more efficient in terms of accuracy-per-parameter or accuracy-per-MAC (multiply-accumulate operation). As a specific form of topological optimization, '''pruning''' strategies aim at maximizing the number of zero-valued operands (parameters and/or activations) in order to 1) take advantage of sparsity (for storing the model) and 2) minimize the number of effective arithmetic operations (i.e., the operations not involving zero-valued operands, which must actually be executed). '''Hardware-oriented optimizations''' are instead concerned with replacing time-consuming and energy-hungry operations, such as evaluations of transcendental functions or floating-point MAC operations, with more efficient counterparts, such as piecewise linear activation functions (e.g., the ReLU) and integer MAC operations (as in quantized neural networks, QNNs).<br />
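The notion of "effective" operations after pruning can be illustrated with a small sketch (values are arbitrary; real sparse accelerators skip these operands in hardware):<br />

```python
# With pruned (sparse) operands, only MACs whose operands are both
# non-zero are "effective" and must actually be executed.

def effective_macs(weights, activations):
    """Count the multiply-accumulates that cannot be skipped."""
    return sum(1 for w, a in zip(weights, activations) if w != 0 and a != 0)

weights     = [0.5, 0.0, -1.2, 0.0, 0.3, 0.0]  # 50% of weights pruned
activations = [1.0, 2.0,  0.0, 0.4, 1.1, 0.9]
total = len(weights)                           # 6 nominal MACs
print(effective_macs(weights, activations), "of", total)  # 2 of 6
```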
<br />
<br />
==Hardware-oriented neural architecture search (NAS)==<br />
The problems of topology selection and pruning can be considered instances of the classical statistics problems of model selection and feature selection, respectively. In the scope of deep learning, model selection is also called neural architecture search (NAS).<br />
When designing a DNN topology, you have a large number of degrees of freedom at your disposal: number of layers, number of neurons for each layer, connectivity of each neuron, and so on; moreover, the number of choices for each degree of freedom is huge. These properties imply that the design space for a DNN can grow exponentially, making exhaustive searches prohibitive. Therefore, to increase the efficiency of the exploration, stochastic optimization tools are the preferred choice: evolutionary algorithms, reinforcement learning, gradient-based techniques or even random graph generation.<br />
An interesting feature of model selection is that specific constraints can be enforced on the search space so that desired properties are always respected. For instance, given a storage budget describing a hard limitation of the chosen computing platform, the network generation algorithm can be limited to proposing topologies that do not exceed a given number of parameters. This capability of incorporating HW features as constraints on the search space makes NAS algorithms very interesting in the context of generating HW-friendly DNNs.<br />
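A toy sketch of such a constrained search (the widths, budget, and plain random-search strategy are illustrative assumptions, not any particular NAS tool):<br />

```python
# Random search over MLP topologies under a hard parameter budget,
# as a minimal stand-in for HW-constrained NAS.
import random

def param_count(layers):
    # Dense layers: weights plus biases between consecutive layer widths.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

def sample_topology(in_dim, out_dim, budget, max_layers=4, widths=(16, 32, 64, 128)):
    """Propose random topologies, keeping only those within the budget."""
    while True:
        hidden = [random.choice(widths) for _ in range(random.randint(1, max_layers))]
        layers = [in_dim] + hidden + [out_dim]
        if param_count(layers) <= budget:   # HW constraint enforced here
            return layers

random.seed(0)
topo = sample_topology(in_dim=64, out_dim=10, budget=20_000)
print(topo, param_count(topo))
```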
<br />
{|<br />
| style="padding: 10px" | [[File:Thorir.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Thoriri| Thorir Mar Ingolfsson]]===<br />
* '''e-mail''': [mailto:thoriri@iis.ee.ethz.ch thoriri@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 633 88 43<br />
* '''office''': ETZ J76.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Cioflanc.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Cioflanc| Cristian Cioflan]]===<br />
* '''e-mail''': [mailto:cioflanc@iis.ee.ethz.ch cioflanc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 67 89 <br />
* '''office''': ETZ J89<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:victor_jung.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Jungvi| Victor Jung]]===<br />
* '''e-mail''': [mailto:jungvi@iis.ee.ethz.ch jungvi@iis.ee.ethz.ch]<br />
* '''phone''': TBD<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
==Algorithms & Frameworks for Quantization and Deployment for Deep Neural Networks (DNNs)==<br />
The typical training algorithm for DNNs is an iterative application of the backpropagation algorithm (BP) and stochastic gradient descent (SGD).<br />
When the quantization is not “aggressive” (i.e., when the parameters and feature maps can be represented as integers with a precision of 8 bits or more), many solutions are available, either in specialized literature or in commercial software, that can convert models pre-trained with gradient descent to quantized counterparts (post-training quantization).<br />
But when the precision is extremely reduced (i.e., 1-bit or 2-bit operands), these solutions can no longer be applied, and quantization-aware training algorithms are needed. The naive application of gradient descent (which in theory is not even correct) to train these QNNs yields major accuracy drops. Hence, it is likely that suitable training algorithms for QNNs require replacing the standard BP+SGD scheme, which is suited to differentiable optimization, with search strategies that are more apt for discrete optimization.<br />
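A minimal illustration of post-training quantization, assuming a symmetric, per-tensor 8-bit scheme (one of several common variants; the weight values are arbitrary):<br />

```python
# Map float weights to 8-bit integers with a single per-tensor scale.

def quantize_int8(weights):
    """Symmetric per-tensor quantization of a float list to int8 codes."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]   # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.42, -1.27, 0.05, 0.89]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# At 8 bits the round-trip error stays below one quantization step;
# at 1-2 bits it would not, which is why post-training quantization
# breaks down and quantization-aware training becomes necessary.
print(q, max(abs(a - b) for a, b in zip(w, w_hat)))
```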
<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:victor_jung.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Jungvi| Victor Jung]]===<br />
* '''e-mail''': [mailto:jungvi@iis.ee.ethz.ch jungvi@iis.ee.ethz.ch]<br />
* '''phone''': TBD<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Cioflanc.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Cioflanc| Cristian Cioflan]]===<br />
* '''e-mail''': [mailto:cioflanc@iis.ee.ethz.ch cioflanc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 67 89 <br />
* '''office''': ETZ J89<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:jannis_schoenleber.jpg|frameless|left|96px]]<br />
|<br />
<br />
===[[:User:Janniss| Jannis Schönleber]]===<br />
* '''e-mail''': [mailto:janniss@iis.ee.ethz.ch janniss@iis.ee.ethz.ch]<br />
* '''phone''': TBD<br />
* '''office''': ETZ J76.2<br />
|}<br />
<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Georg.jpg|frameless|left|96px]]<br />
|<br />
===[[User:Georg | Georg Rutishauser]]===<br />
* '''e-mail''': [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 97<br />
* '''office''': ETZ J68.2<br />
|}<br />
<br />
==Hardware Acceleration of DNNs and QNNs==<br />
Deep Learning (DL) and Artificial Intelligence (AI) are quickly becoming dominant paradigms for all kinds of analytics, complementing or replacing traditional data science methods. Successful at-scale deployment of these algorithms requires deploying them directly at the data source, i.e. in the IoT end-nodes collecting data. However, due to the extreme constraints of these devices (in terms of power, memory footprint, area cost), performing full DL inference in-situ in low-power end-nodes requires a breakthrough in computational performance and efficiency.<br />
It is widely known that the numerical representation typically used when developing DL algorithms (single-precision floating-point) encodes a higher precision than what is actually required to achieve high quality-of-results in inference (Courbariaux et al. 2016); this fact can be exploited in the design of energy-efficient hardware for DL.<br />
For example, by using ternary weights, meaning all network weights are quantized to {-1,0,1}, we can design the fundamental compute units in hardware without an HW-expensive multiplication unit. Additionally, it allows us to store the weights much more compactly on-chip.<br />
<br />
<br />
{|<br />
| style="padding: 10px" | [[File:angelo_garofalo.png|frameless|left|96px]]<br />
|<br />
===[[:User:Agarofalo| Angelo Garofalo]]===<br />
* '''e-mail''': [mailto:agarofalo@iis.ee.ethz.ch agarofalo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 82 19<br />
* '''office''': ETZ J78<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Georg.jpg|frameless|left|96px]]<br />
|<br />
===[[User:Georg | Georg Rutishauser]]===<br />
* '''e-mail''': [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 97<br />
* '''office''': ETZ J68.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Moritz_scherer.jpg|frameless|left|96px]]<br />
|<br />
<br />
===[[:User:Scheremo| Moritz Scherer]]===<br />
* '''e-mail''': [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 77 86<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Arpan_Suravi_Prasad.jpeg|frameless|left|96px]]<br />
|<br />
<br />
===[[:User:Prasadar| Arpan Suravi Prasad]]===<br />
* '''e-mail''': [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 44 91<br />
* '''office''': ETZ J89<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:jannis_schoenleber.jpg|frameless|left|96px]]<br />
|<br />
<br />
===[[:User:Janniss| Jannis Schönleber]]===<br />
* '''e-mail''': [mailto:janniss@iis.ee.ethz.ch janniss@iis.ee.ethz.ch]<br />
* '''phone''': TBD<br />
* '''office''': ETZ J76.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:gislamoglu.jpg|frameless|left|96px]]<br />
|<br />
<br />
===[[:User:Gislamoglu| Gamze İslamoğlu]]===<br />
* '''e-mail''': [mailto:gislamoglu@iis.ee.ethz.ch gislamoglu@iis.ee.ethz.ch]<br />
* '''office''': ETZ J78<br />
|}<br />
<br />
==Event-Driven Computing==<br />
With the increasing demand for "smart" algorithms on mobile and wearable devices, the energy cost of computing is becoming the bottleneck for battery lifetime. One approach to alleviate this bottleneck is to reduce the compute activity on such devices: one of the most popular approaches uses sensor information to determine whether it is worth running expensive computations or whether there is too little activity in the environment. This approach is called event-driven computing.<br />
Event-driven architectures can be implemented for many applications, from pure sensing platforms to multi-core systems for machine learning on the edge.<br />
At IIS, we cover most of these applications. Besides working with novel, state-of-the-art sensors and sensing platforms to push the limits of battery lifetime in wearables and mobile devices, we also work with cutting-edge computing systems like Intel Loihi for Spiking Neural Networks to minimize the energy cost of machine intelligence.<br />
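The gating idea above can be illustrated with a toy sketch (the activity metric and threshold are made up for the example; a real system would use a hardware wake-up circuit or a dedicated low-power core):

```python
import numpy as np

ACTIVITY_THRESHOLD = 0.1   # illustrative value

def cheap_activity(samples):
    """Low-cost proxy for 'something is happening': mean absolute sample-to-sample change."""
    return np.mean(np.abs(np.diff(samples)))

def expensive_inference(samples):
    """Placeholder for a costly computation, e.g. a DNN invocation."""
    return samples.sum()   # stand-in result

def process(samples):
    """Run the expensive stage only when the sensor signal shows activity."""
    if cheap_activity(samples) < ACTIVITY_THRESHOLD:
        return None        # stay in the low-power state
    return expensive_inference(samples)

quiet = np.zeros(16)                          # flat signal: skip compute
busy = np.sin(np.linspace(0, 6.28, 16))       # active signal: run compute
assert process(quiet) is None
assert process(busy) is not None
```

The energy win comes from the cheap metric running always-on while the expensive stage runs only rarely.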
<br />
{|<br />
| style="padding: 10px" | [[File:Adimauro.png|frameless|left|96px]]<br />
|<br />
===[[:User:Adimauro| Alfio Di Mauro]]===<br />
* '''e-mail''': [mailto:adimauro@iis.ee.ethz.ch adimauro@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 82 19<br />
* '''office''': ETZ J78<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Moritz_scherer.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Scheremo| Moritz Scherer]]===<br />
* '''e-mail''': [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 77 86<br />
* '''office''': ETZ J69.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Arpan_Suravi_Prasad.jpeg|frameless|left|96px]]<br />
|<br />
<br />
===[[:User:Prasadar| Arpan Suravi Prasad]]===<br />
* '''e-mail''': [mailto:prasadar@iis.ee.ethz.ch prasadar@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 44 91<br />
* '''office''': ETZ J89<br />
|}<br />
<br />
<br />
==On-Device Training== <br />
<br />
The fast development of the Internet of Things (IoT) comes with a growing need for smart end-node devices able to execute Deep Learning networks locally. Processing the data on-device has many advantages: it not only drastically reduces latency and communication energy cost, but also takes a step towards autonomous IoT end-nodes. Most current research efforts focus on inference, under the "train-then-deploy" paradigm. However, this leaves the device unable to face real-life phenomena such as data distribution shifts or class increments. At IIS, we are actively researching new methods to tackle this significant challenge in the context of tightly memory-constrained devices such as Microcontrollers (MCUs). <br />
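One common strategy in this setting, sketched below in plain numpy with illustrative shapes and values (this is not the method of any specific IIS project), is to keep a pre-trained backbone frozen and update only a small classifier head on-device, so that gradient memory stays tiny:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen feature extractor: weights can stay in flash, no gradient memory needed.
W_backbone = rng.standard_normal((8, 4))
# Trainable classifier head: the only tensor that needs RAM for updates.
W_head = np.zeros((4, 2))

def forward(x):
    h = x @ W_backbone          # frozen features (kept linear for brevity)
    return h @ W_head           # trainable linear head

def sgd_step(x, y_onehot, lr=0.1):
    """One on-device SGD step on the head only (softmax cross-entropy)."""
    global W_head
    h = x @ W_backbone
    logits = h @ W_head
    p = np.exp(logits - logits.max())
    p /= p.sum()
    W_head -= lr * np.outer(h, p - y_onehot)   # dL/dW_head for cross-entropy

x = rng.standard_normal(8)
y = np.array([1.0, 0.0])               # label for a class observed in the field
score_before = forward(x)[0]           # 0.0: the head starts untrained
for _ in range(20):
    sgd_step(x, y)
assert forward(x)[0] > score_before    # target-class score has improved
```

The memory cost of training here is one gradient the size of `W_head`, independent of the backbone size, which is what makes such schemes feasible on MCUs.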
<br />
<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Cioflanc.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Cioflanc| Cristian Cioflan]]===<br />
* '''e-mail''': [mailto:cioflanc@iis.ee.ethz.ch cioflanc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 67 89 <br />
* '''office''': ETZ J89<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:viviane_potocnik.png|frameless|left|96px]]<br />
|<br />
===[[:User:Vivianep| Viviane Potocnik]]===<br />
* '''e-mail''': [mailto:vivianep@iis.ee.ethz.ch vivianep@iis.ee.ethz.ch]<br />
* '''phone''': TBD<br />
* '''office''': ETZ J78<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:victor_jung.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Jungvi| Victor Jung]]===<br />
* '''e-mail''': [mailto:jungvi@iis.ee.ethz.ch jungvi@iis.ee.ethz.ch]<br />
* '''phone''': TBD<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine a useful way to go -- after all, you are here not only to learn about project work but also to develop your technical skills.<br />
<br />
Only hard requirements:<br />
* '''Excitement for deep learning''' <br />
* For '''HW Design''' projects: '''VLSI 1, VLSI 2''' or equivalent<br />
<br />
==Tags==<br />
All our projects will be categorized into three categories. Therefore, look out for the following tags:<br />
* '''Algorithmic''' - you will mainly perform algorithmic evaluations using languages and frameworks such as Python, PyTorch, TensorFlow and our in-house frameworks like Quantlab, DORY, NEMO<br />
* '''Embedded Coding''' - you will implement e.g. C code for one of our microcontrollers<br />
* '''HW Design''' - you will design HW, including writing RTL and simulating, synthesizing, and laying out (backend) the design<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
New projects are constantly being added, check back often! If you have any questions or would like to propose your own ideas, do not hesitate to contact us!<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Deep Learning Projects<br />
suppresserrors=true<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList><br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Event-Driven Computing<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
==Projects in Progress==<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = Deep Learning Projects<br />
suppresserrors=true<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = Deep Learning Projects<br />
suppresserrors=true<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=User:Fischeti&diff=9874User:Fischeti2023-11-03T15:00:37Z<p>Fischeti: </p>
<hr />
<div>[[File:Tim_Fischer.jpeg|thumb|right|280px]]<br />
== Tim Fischer ==<br />
I received my Bachelor's degree in Information Technology and Electrical Engineering from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, in 2018 and my Master's degree in April 2020. After that, I started as a PhD student in the Digital Circuits and Systems group of Prof. Dr. L. Benini.<br />
<br />
==Interests==<br />
My research focuses on interconnects for on-chip and off-chip communication in HPC systems. Specifically, I am currently working on Network-on-Chips (NoCs) to enable scaling out to many-core systems. Furthermore, I have worked on a die-to-die link for chiplet-based systems. <br />
<br />
Previously, I also deployed Machine Learning workloads on High-Performance Computing systems and worked on ML hardware accelerators for edge applications.<br />
<br />
<br />
<br />
==Contact Information==<br />
* '''Office''': OAT U21<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjE3MzI0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In Progress<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Completed<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Efficient_collective_communications_in_FlooNoC_(1M)&diff=9873Efficient collective communications in FlooNoC (1M)2023-11-03T10:48:45Z<p>Fischeti: </p>
<hr />
<div><!-- Efficient collective communications in FlooNoC (1M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Colluca]]<br />
[[Category:Fischeti]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Colluca | Luca Colagrande]]: [mailto:colluca@iis.ee.ethz.ch colluca@iis.ee.ethz.ch]<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Floonoc_paper_fig4.png|thumb|600px|Figure 1: Physical implementation of FlooNoC connecting a mesh of compute tiles in GlobalFoundries’ 12 nm technology]]<br />
<br />
To realize the performance potential of many-core architectures, efficient and scalable on-chip communication is required [1]. Collective communication lies on the critical path for many applications; the criticality of such communication is evident in the dedicated collective and barrier networks employed in several supercomputers, such as Summit [2], the NYU Ultracomputer, the Cray T3D and Blue Gene/L. Likewise, many-core architectures would benefit from hardware support for collective communications, but may not be able to afford separate, dedicated networks due to routing and area costs. For this reason, several works in the literature have explored integrating collective communication support directly into the existing NoC [1, 3].<br />
<br />
Collective communication operations are said to be "rooted" (or "asymmetric") when a specific node (the root) is either the sole origin (or producer) of data to be redistributed or the sole destination (or consumer) of data or results contributed by the nodes involved in the communication. Conversely, in "non-rooted" (or "symmetric") operations all nodes contribute and receive data. [6]<br />
<br />
Our focus is on rooted operations where the root node exchanges a single datum, i.e. multicast (when the root is a producer) and reduction (when the root is a consumer) operations.<br />
<br />
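The two rooted operations above can be illustrated with a toy functional model (plain Python, node memories modelled as a dict; this captures only the data-movement semantics, not any NoC routing or timing):

```python
def multicast(root_value, nodes):
    """Root is the sole producer: every node receives the same datum."""
    return {n: root_value for n in nodes}

def reduction(contributions, op):
    """Root is the sole consumer: combine one datum contributed per node."""
    result = None
    for v in contributions.values():
        result = v if result is None else op(result, v)
    return result

nodes = range(4)
mem = multicast(42, nodes)                 # 1-to-many: one send, N deliveries
assert all(mem[n] == 42 for n in nodes)

total = reduction({n: n + 1 for n in nodes}, lambda a, b: a + b)
assert total == 10                         # many-to-1: N sends, one result
```

Hardware support aims to perform exactly these patterns in the network itself, instead of N independent unicast transactions.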
In two previous works, we explored the cost of integrating multicast and reduction support, respectively, directly into the interconnect of a shared-memory many-core system called Occamy [4]. In Occamy, 216+1 cores and their tightly-coupled data memories are interconnected by a hierarchy of AXI XBARs [5] in a partitioned global address space (PGAS). Each AXI XBAR interconnects a set of AXI masters with a set of AXI slaves, enabling unicast communication between any master and any slave. Our work involved extending the AXI XBAR to multicast transactions from an AXI master to multiple AXI slaves and to reduce data sent from multiple masters to a single slave.<br />
<br />
However, an interconnect built from a hierarchy of AXI XBARs severely limits scalability. AXI was not intended for large systems with a deep hierarchy, where master and slave communicate with each other over many "hops". Occamy is a good example of the limits of what can be achieved with an AXI interconnect (30% of the area was dedicated to the AXI interconnect). An actual Network-on-Chip solves these issues, which is why, after the lessons learnt in Occamy, we started working on FlooNoC [7]. In this thesis, we want to extend FlooNoC with the same multicast and reduction operations as the previous works on the AXI XBAR.<br />
<br />
= Project description =<br />
<br />
In this thesis you will implement hardware support for multicast and reduction operations in FlooNoC, a versatile, open-source, AXI-based network-on-chip architecture developed in our group.<br />
You will evaluate the PPA impact of your extension and benchmark it on some common computational workloads (e.g. GEMM).<br />
<br />
== Detailed task description ==<br />
<br />
To break it down in more detail, you will:<br />
<br />
* '''Review previous work''':<br />
** Literature review on collective communication support in NoCs<br />
** Familiarize with the multicast and reduction extensions for the AXI XBAR<br />
* '''Implement support for multicast and reduction operations in FlooNoC:'''<br />
** Extend the testbench and verification infrastructure to verify the design<br />
** Plan and carry out RTL modifications<br />
** Explore PPA overheads and correlate them with your RTL changes<br />
** Iterate and improve PPA of the design<br />
* '''System integration and evaluation:'''<br />
** Extend the FlooNoC generator to support your extensions<br />
** Extend our FlooNoC-based Occamy system-on-chip to support your extensions<br />
** Extend our implementations of several common parallel programming primitives (e.g. barrier) and computational workloads (e.g. GEMM) to use your extensions<br />
** Evaluate the performance gains from your extensions in the FlooNoC-based Occamy system-on-chip<br />
** Compare the scaling behaviour of the kernels (w/ and w/o your extensions) with the size of the system<br />
<br />
== Stretch goals ==<br />
<br />
Additional optional stretch goals may include:<br />
<br />
* Evaluate your extensions on synthetic traffic patterns<br />
* Extend our OpenMP runtime to make use of your extensions<br />
* Measure how your extension improves the overall runtime of real-world OpenMP workloads<br />
<br />
== Character ==<br />
<br />
* 10% Literature/architecture review<br />
* 50% RTL design and verification<br />
* 30% Physical design exploration<br />
* 10% Bare-metal software development<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow as taught in VLSI II<br />
* Preferred: Experience in bare-metal or embedded C programming<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6168953 Supporting Efficient Collective Communication in NoCs] <br /><br />
[2] [https://ieeexplore.ieee.org/document/8961159 The high-speed networks of the Summit and Sierra supercomputers] <br /><br />
[3] [https://dl.acm.org/doi/10.1145/2155620.2155630 Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication] <br /><br />
[4] [https://pulp-platform.org/occamy/ Occamy many-core chiplet system] <br /><br />
[5] [https://github.com/pulp-platform/axi/blob/master/doc/axi_xbar.md PULP platform's AXI XBAR IP documentation] <br /><br />
[6] [https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4_28 Encyclopedia of Parallel Computing: Collective Communication entry] <br /><br />
[7] [https://github.com/pulp-platform/FlooNoC FlooNoC: A Fast, Low-Overhead On-chip Network] <br /></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Efficient_collective_communications_in_FlooNoC_(1M)&diff=9872Efficient collective communications in FlooNoC (1M)2023-11-03T10:48:24Z<p>Fischeti: </p>
<hr />
<div><!-- Efficient collective communications in FlooNoC (1M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Colluca]]<br />
[[Category:Fischeti]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Colluca | Luca Colagrande]]: [mailto:colluca@iis.ee.ethz.ch colluca@iis.ee.ethz.ch]<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Floonoc_paper_fig4.png|thumb|600px|Figure 1: Physical implementation of FlooNoC connecting a mesh of compute tiles in GlobalFoundries’ 12 nm technology]]<br />
<br />
To realize the performance potential of many-core architectures, efficient and scalable on-chip communication is required [1]. Collective communication lies on the critical path for many applications; the criticality of such communication is evident in the dedicated collective and barrier networks employed in several supercomputers, such as Summit [2], the NYU Ultracomputer, the Cray T3D and Blue Gene/L. Likewise, many-core architectures would benefit from hardware support for collective communications, but may not be able to afford separate, dedicated networks due to routing and area costs. For this reason, several works in the literature have explored integrating collective communication support directly into the existing NoC [1, 3].<br />
<br />
Collective communication operations are said to be "rooted" (or "asymmetric") when a specific node (the root) is either the sole origin (or producer) of data to be redistributed or the sole destination (or consumer) of data or results contributed by the nodes involved in the communication. Conversely, in "non-rooted" (or "symmetric") operations all nodes contribute and receive data. [6]<br />
<br />
Our focus is on rooted operations where the root node exchanges a single datum, i.e. multicast (when the root is a producer) and reduction (when the root is a consumer) operations.<br />
<br />
In two previous works, we explored the cost of integrating multicast and reduction support, respectively, directly into the interconnect of a shared-memory many-core system called Occamy [4]. In Occamy, 216+1 cores and their tightly-coupled data memories are interconnected by a hierarchy of AXI XBARs [5] in a partitioned global address space (PGAS). Each AXI XBAR interconnects a set of AXI masters with a set of AXI slaves, enabling unicast communication between any master and any slave. Our work involved extending the AXI XBAR to multicast transactions from an AXI master to multiple AXI slaves and to reduce data sent from multiple masters to a single slave.<br />
<br />
However, an interconnect built from a hierarchy of AXI XBARs severely limits scalability. AXI was not intended for large systems with a deep hierarchy, where master and slave communicate with each other over many "hops". Occamy is a good example of the limits of what can be achieved with an AXI interconnect (30% of the area was dedicated to the AXI interconnect). An actual Network-on-Chip solves these issues, which is why, after the lessons learnt in Occamy, we started working on FlooNoC [7]. In this thesis, we want to extend FlooNoC with the same multicast and reduction operations as the previous works on the AXI XBAR.<br />
<br />
= Project description =<br />
<br />
In this thesis you will implement hardware support for multicast and reduction operations in FlooNoC, a versatile, open-source, AXI-based network-on-chip architecture developed in our group.<br />
You will evaluate the PPA impact of your extension and benchmark it on some common computational workloads (e.g. GEMM).<br />
<br />
== Detailed task description ==<br />
<br />
To break it down in more detail, you will:<br />
<br />
* '''Review previous work''':<br />
** Literature review on collective communication support in NoCs<br />
** Familiarize with the multicast and reduction extensions for the AXI XBAR<br />
* '''Implement support for multicast and reduction operations in FlooNoC:'''<br />
** Extend the testbench and verification infrastructure to verify the design<br />
** Plan and carry out RTL modifications<br />
** Explore PPA overheads and correlate them with your RTL changes<br />
** Iterate and improve PPA of the design<br />
* '''System integration and evaluation:'''<br />
** Extend the FlooNoC generator to support your extensions<br />
** Extend our FlooNoC-based Occamy system-on-chip to support your extensions<br />
** Extend our implementations of several common parallel programming primitives (e.g. barrier) and computational workloads (e.g. GEMM) to use your extensions<br />
** Evaluate the performance gains from your extensions in the FlooNoC-based Occamy system-on-chip<br />
** Compare the scaling behaviour of the kernels (w/ and w/o your extensions) with the size of the system<br />
<br />
== Stretch goals ==<br />
<br />
Additional optional stretch goals may include:<br />
<br />
* Evaluate your extensions on synthetic traffic patterns<br />
* Extend our OpenMP runtime to make use of your extensions<br />
* Measure how your extension improves the overall runtime of real-world OpenMP workloads<br />
<br />
== Character ==<br />
<br />
* 10% Literature/architecture review<br />
* 50% RTL design and verification<br />
* 30% Physical design exploration<br />
* 10% Bare-metal software development<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow as taught in VLSI II<br />
* Preferred: Experience in bare-metal or embedded C programming<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6168953 Supporting Efficient Collective Communication in NoCs] <br /><br />
[2] [https://ieeexplore.ieee.org/document/8961159 The high-speed networks of the Summit and Sierra supercomputers] <br /><br />
[3] [https://dl.acm.org/doi/10.1145/2155620.2155630 Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication] <br /><br />
[4] [https://pulp-platform.org/occamy/ Occamy many-core chiplet system] <br /><br />
[5] [https://github.com/pulp-platform/axi/blob/master/doc/axi_xbar.md PULP platform's AXI XBAR IP documentation] <br /><br />
[6] [https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4_28 Encyclopedia of Parallel Computing: Collective Communication entry] <br /><br />
[7] [https://github.com/pulp-platform/FlooNoC FlooNoC: A Fast, Low-Overhead On-chip Network] <br /></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Efficient_collective_communications_in_FlooNoC_(1M)&diff=9871Efficient collective communications in FlooNoC (1M)2023-11-03T10:46:37Z<p>Fischeti: /* Introduction */</p>
<hr />
<div><!-- Efficient collective communications in FlooNoC (1M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Colluca]]<br />
[[Category:Fischeti]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Colluca | Luca Colagrande]]: [mailto:colluca@iis.ee.ethz.ch colluca@iis.ee.ethz.ch]<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
[[File:Floonoc_paper_fig4.png|thumb|600px|Figure 1: Physical implementation of FlooNoC connecting a mesh of compute tiles in GlobalFoundries’ 12 nm technology]]<br />
<br />
To realize the performance potential of many-core architectures, efficient and scalable on-chip communication is required [1]. Collective communication lies on the critical path for many applications; the criticality of such communication is evident in the dedicated collective and barrier networks employed in several supercomputers, such as Summit [2], the NYU Ultracomputer, the Cray T3D and Blue Gene/L. Likewise, many-core architectures would benefit from hardware support for collective communications, but may not be able to afford separate, dedicated networks due to routing and area costs. For this reason, several works in the literature have explored integrating collective communication support directly into the existing NoC [1, 3].<br />
<br />
Collective communication operations are said to be "rooted" (or "asymmetric") when a specific node (the root) is either the sole origin (or producer) of data to be redistributed or the sole destination (or consumer) of data or results contributed by the nodes involved in the communication. Conversely, in "non-rooted" (or "symmetric") operations all nodes contribute and receive data. [6]<br />
<br />
Our focus is on rooted operations where the root node exchanges a single datum, i.e. multicast (when the root is a producer) and reduction (when the root is a consumer) operations.<br />
<br />
In two previous works, we explored the cost of integrating multicast and reduction support, respectively, directly into the interconnect of a shared-memory many-core system called Occamy [4]. In Occamy, 216+1 cores and their tightly-coupled data memories are interconnected by a hierarchy of AXI XBARs [5] in a partitioned global address space (PGAS). Each AXI XBAR interconnects a set of AXI masters with a set of AXI slaves, enabling unicast communication between any master and any slave. Our work involved extending the AXI XBAR to multicast transactions from an AXI master to multiple AXI slaves and to reduce data sent from multiple masters to a single slave.<br />
<br />
However, an interconnect built from a hierarchy of AXI XBARs severely limits scalability. AXI was not intended for large systems with a deep hierarchy, where master and slave communicate with each other over many "hops". Occamy is a good example of the limits of what can be achieved with an AXI interconnect (30% of the area was dedicated to the AXI interconnect). An actual Network-on-Chip solves these issues, which is why, after the lessons learnt in Occamy, we started working on FlooNoC. In this thesis, we want to extend FlooNoC with the same multicast and reduction operations as the previous works on the AXI XBAR.<br />
<br />
= Project description =<br />
<br />
In this thesis you will implement hardware support for multicast and reduction operations in FlooNoC, a versatile, open-source, AXI-based network-on-chip architecture developed in our group.<br />
You will evaluate the PPA impact of your extension and benchmark it on some common computational workloads (e.g. GEMM).<br />
<br />
== Detailed task description ==<br />
<br />
To break it down in more detail, you will:<br />
<br />
* '''Review previous work''':<br />
** Literature review on collective communication support in NoCs<br />
** Familiarize with the multicast and reduction extensions for the AXI XBAR<br />
* '''Implement support for multicast and reduction operations in FlooNoC:'''<br />
** Extend the testbench and verification infrastructure to verify the design<br />
** Plan and carry out RTL modifications<br />
** Explore PPA overheads and correlate them with your RTL changes<br />
** Iterate and improve PPA of the design<br />
* '''System integration and evaluation:'''<br />
** Extend the FlooNoC generator to support your extensions<br />
** Extend our FlooNoC-based Occamy system-on-chip to support your extensions<br />
** Extend our implementations of several common parallel programming primitives (e.g. barrier) and computational workloads (e.g. GEMM) to use your extensions<br />
** Evaluate the performance gains from your extensions in the FlooNoC-based Occamy system-on-chip<br />
** Compare the scaling behaviour of the kernels (w/ and w/o your extensions) with the size of the system<br />
<br />
== Stretch goals ==<br />
<br />
Additional optional stretch goals may include:<br />
<br />
* Evaluate your extensions on synthetic traffic patterns<br />
* Extend our OpenMP runtime to make use of your extensions<br />
* Measure how your extension improves the overall runtime of real-world OpenMP workloads<br />
<br />
== Character ==<br />
<br />
* 10% Literature/architecture review<br />
* 50% RTL design and verification<br />
* 30% Physical design exploration<br />
* 10% Bare-metal software development<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow as taught in VLSI II<br />
* Preferred: Experience in bare-metal or embedded C programming<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6168953 Supporting Efficient Collective Communication in NoCs] <br /><br />
[2] [https://ieeexplore.ieee.org/document/8961159 The high-speed networks of the Summit and Sierra supercomputers] <br /><br />
[3] [https://dl.acm.org/doi/10.1145/2155620.2155630 Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication] <br /><br />
[4] [https://pulp-platform.org/occamy/ Occamy many-core chiplet system] <br /><br />
[5] [https://github.com/pulp-platform/axi/blob/master/doc/axi_xbar.md PULP platform's AXI XBAR IP documentation] <br /><br />
[6] [https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4_28 Encyclopedia of Parallel Computing: Collective Communication entry] <br /></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Routing_1000s_of_wires_in_Network-on-Chips_(1-2S/M)&diff=9747Routing 1000s of wires in Network-on-Chips (1-2S/M)2023-10-24T13:46:46Z<p>Fischeti: /* Project */</p>
<hr />
<div><!-- Backend explorations for Network-on-Chips (1-2S/M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Network-on-Chip]]<br />
[[Category:Interconnect]]<br />
[[Category:Backend]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Yiczhang]]<br />
[[Category:Available]]<br />
<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User: Yiczhang | Yichao Zhang]]: [mailto:yiczhang@iis.ee.ethz.ch yiczhang@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The bandwidth requirements of modern HPC systems pose a serious challenge for the design of the next generation of Network-on-Chips (NoCs). On the other hand, modern technologies offer many options for implementing NoCs. For instance, modern nodes provide >10 metal layers, which offer abundant routing resources (>50k wires/mm) to route the links of a NoC. Another example is the placement of SRAM macros, which usually consume a large fraction of the area; if handled smartly, however, the wires of a Network-on-Chip can be routed over the SRAM macros.<br />
<br />
In our research group, we implemented our own Network-on-Chip called ''FlooNoC'' [1][2], which was designed with awareness of these physical-implementation effects and hence uses very wide links and multiple separate physical channels. While we have done some initial backend explorations, there are many open questions that can be explored in this thesis.<br />
<br />
= Project =<br />
<br />
This thesis aims to go deeper into the backend design and optimization of Network-on-Chips. The specific areas of exploration can include:<br />
<br />
'''Router Symbiosis''': Should the routers be hardened as macros or flattened into the top-level/tile? There are many arguments for and against each option. For instance, flattening the routers is easier for the backend designer, but you lose control over how the routing is done by the EDA tools, and the result is possibly not optimal.<br />
<br />
'''Routing channel length and density''': How long can the wires be at most while still meeting timing? And how closely can wires be routed to each other without sacrificing signal integrity?<br />
<br />
'''Gas Stations''': If the wires are too long, the paths are usually buffered with so-called gas stations to bridge the larger distances. This can be done either manually by the designer or automatically by the EDA tool. What is the best strategy, i.e., how many buffers should be placed for a given distance, and which driving strengths are needed for the best timing and power consumption?<br />
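To build intuition for this trade-off, the classical first-order RC (Elmore-delay) result for repeater insertion can be sketched in a few lines. The numbers below are illustrative placeholders, not values from any real PDK, and this square-root formula ignores many effects (slew, crosstalk, via resistance) that the actual EDA flow accounts for.<br />

```python
import math

def optimal_repeater_spacing(r_wire, c_wire, r_buf, c_buf):
    """Optimal distance between repeaters on a distributed RC wire,
    from the classical Elmore-delay result:
        L_opt = sqrt(2 * R_buf * C_buf / (r_wire * c_wire))
    r_wire/c_wire: wire resistance/capacitance per mm,
    r_buf/c_buf: buffer output resistance and input capacitance."""
    return math.sqrt(2.0 * r_buf * c_buf / (r_wire * c_wire))

def repeaters_needed(length_mm, spacing_mm):
    # One buffer at each internal segment boundary along the wire.
    return max(0, math.ceil(length_mm / spacing_mm) - 1)

# Placeholder numbers, NOT taken from any real technology:
spacing = optimal_repeater_spacing(r_wire=1000.0, c_wire=0.2e-12,
                                   r_buf=5000.0, c_buf=2e-15)
print(f"optimal spacing ~ {spacing:.2f} mm, "
      f"buffers for a 5 mm link: {repeaters_needed(5.0, spacing)}")
```

The thesis would replace this toy model with measurements from actual placed-and-routed designs.<br />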
<br />
'''Performance and Energy''': How can we maximize the frequency of the NoC while keeping the power consumption in check? Is it better to increase the width of the channels or the frequency of the links?<br />
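A back-of-the-envelope model illustrates the width-versus-frequency question: bandwidth is the product of width and frequency, while dynamic power scales with C·Vdd²·f, so a wider, slower link that permits a lower supply voltage can deliver the same bandwidth for less dynamic power. All constants below are made-up illustrative numbers, and the model ignores leakage, clocking, and area costs.<br />

```python
def link_bandwidth_gbps(width_bits, freq_ghz):
    # Aggregate link bandwidth in Gbit/s: one bit per wire per cycle.
    return width_bits * freq_ghz

def dynamic_power_mw(width_bits, freq_ghz, vdd,
                     c_per_wire_ff=50.0, activity=0.5):
    # Very coarse first-order CMOS model: P = alpha * C * Vdd^2 * f,
    # summed over all wires of the channel.
    c_total_f = width_bits * c_per_wire_ff * 1e-15
    return activity * c_total_f * vdd ** 2 * (freq_ghz * 1e9) * 1e3

# Two design points hitting the same 512 Gbit/s (illustrative numbers only;
# the lower Vdd assumes the slower link still closes timing at that voltage):
narrow_fast = dynamic_power_mw(width_bits=256, freq_ghz=2.0, vdd=0.8)
wide_slow = dynamic_power_mw(width_bits=512, freq_ghz=1.0, vdd=0.6)
print(f"narrow/fast: {narrow_fast:.2f} mW, wide/slow: {wide_slow:.2f} mW")
```
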
<br />
== Character ==<br />
<br />
* 10% Literature Research<br />
* 60% Architecture Design and Exploration<br />
* 20% Simulation and Evaluation<br />
* 10% Documentation & Report<br />
<br />
== Prerequisites ==<br />
<br />
* The student should have attended the VLSI 2 course.<br />
* Previous backend experience (e.g., a chip tapeout) is advantageous but not required<br />
* Some familiarity with the concept of Network-on-Chips is recommended <br />
<br />
== References ==<br />
* [1] https://ieeexplore.ieee.org/document/10225380<br />
* [2] https://github.com/pulp-platform/FlooNoC/tree/main</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Routing_1000s_of_wires_in_Network-on-Chips_(1-2S/M)&diff=9746Routing 1000s of wires in Network-on-Chips (1-2S/M)2023-10-24T13:45:57Z<p>Fischeti: Created page with "<!-- Backend explorations for Network-on-Chips (1-2S/M) --> Category:Digital Category:Network-on-Chip Category:Interconnect Category:Backend Category:2023..."</p>
<hr />
<div><!-- Backend explorations for Network-on-Chips (1-2S/M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Network-on-Chip]]<br />
[[Category:Interconnect]]<br />
[[Category:Backend]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Yiczhang]]<br />
[[Category:Available]]<br />
<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User: Yiczhang | Yichao Zhang]]: [mailto:yiczhang@iis.ee.ethz.ch yiczhang@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The bandwidth requirements of modern HPC systems pose a serious challenge for the design of the next generation of Network-on-Chips (NoCs). At the same time, modern technologies offer many options for implementing NoCs. For instance, modern processes provide >10 metal layers, which offer abundant routing resources (>50k wires/mm) for routing the links of a NoC. Another example is the placement of SRAM macros, which usually consume a large fraction of the area; if handled smartly, however, the wires of a Network-on-Chip can be routed over the SRAM macros.<br />
<br />
In our research group, we implemented our own Network-on-Chip called ''FlooNoC'' [1][2], which was designed with these physical implementation effects in mind and hence uses multiple very wide, separate physical channels. While we have done some initial backend explorations, many open questions remain that can be explored in this thesis.<br />
<br />
= Project =<br />
<br />
This thesis aims to go deeper into the backend design and optimization of Network-on-Chips. The specific areas of exploration can include:<br />
<br />
'''Router Symbiosis''': Should the routers be hardened as macros or flattened into the top-level/tile? There are many arguments for and against each option. For instance, flattening the routers is easier for the backend designer, but you lose control over how the routing is done by the EDA tools, and the result is possibly not optimal.<br />
<br />
'''Routing channel length and density''': How long can the wires be at most while still meeting timing? And how closely can wires be routed to each other without sacrificing signal integrity?<br />
<br />
'''Gas Stations''': If the wires are too long, the paths are usually buffered with so-called gas stations to bridge the larger distances. This can be done either manually by the designer or automatically by the EDA tool. What is the best strategy, i.e., how many buffers should be placed for a given distance, and which driving strengths are needed for the best timing and power consumption?<br />
<br />
'''Performance and Energy''': How can we maximize the frequency of the NoC while keeping the power consumption in check? Is it better to increase the width of the channels or the frequency of the links?<br />
<br />
== Character ==<br />
<br />
* 10% Literature Research<br />
* 60% Architecture Design and Exploration<br />
* 20% Simulation and Evaluation<br />
* 10% Documentation & Report<br />
<br />
== Prerequisites ==<br />
<br />
* The student should have attended the VLSI 2 course.<br />
* Previous backend experience (e.g., a chip tapeout) is advantageous but not required<br />
* Some familiarity with the concept of Network-on-Chips is recommended <br />
<br />
== References ==<br />
* [1] https://ieeexplore.ieee.org/document/10225380<br />
* [2] https://github.com/pulp-platform/FlooNoC/tree/main</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Network-on-Chip_for_coherent_and_non-coherent_traffic_(M)&diff=9745Network-on-Chip for coherent and non-coherent traffic (M)2023-10-24T12:29:33Z<p>Fischeti: </p>
<hr />
<div><!-- Network-on-Chip for coherent and non-coherent traffic (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Network-on-Chip]]<br />
[[Category:Interconnect]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Available]]<br />
<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
With the continuous growth in the number of cores in many-core architectures, effective communication between the cores has become very important. Network-on-Chip (NoC) designs, functioning as an interconnection backbone, are pivotal in ensuring efficient communication. While we have already done a lot of work on a NoC called ''FlooNoC'' [1][2] for non-coherent traffic using the AXI protocol [3][4], we are now also looking into supporting coherent traffic, particularly in shared-memory systems.<br />
<br />
ARM's Coherent Hub Interface (CHI) standard [5] provides a protocol for such coherent interconnects. Combining coherent traffic (as per the CHI standard) with non-coherent traffic in a single NoC system is a valuable yet challenging endeavor. This thesis aims to investigate and develop an integrated NoC design that accommodates both traffic types.<br />
<br />
= Project =<br />
<br />
The goals of this thesis are as follows:<br />
<br />
1. To understand the specific requirements and characteristics of coherent traffic as defined by the CHI standard.<br />
<br />
2. To analyze the existing NoC designs for non-coherent traffic using the AXI protocol and identify the primary bottlenecks when introducing coherent traffic.<br />
<br />
3. To develop an architectural design for an integrated NoC that can seamlessly support both coherent and non-coherent traffic.<br />
<br />
4. To implement a prototype of the proposed design and benchmark its performance in various scenarios.<br />
<br />
== Character ==<br />
<br />
* 20% Literature Research<br />
* 50% Architecture Design and Exploration<br />
* 20% Performance<br />
* 10% Documentation & Report<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Knowledge about coherency protocols (e.g., MESI) is recommended<br />
* Knowledge about non-coherent on-chip protocols (e.g. AXI4) is recommended<br />
<br />
== References ==<br />
* [1] https://ieeexplore.ieee.org/document/10225380<br />
* [2] https://github.com/pulp-platform/FlooNoC/tree/main<br />
* [3] https://developer.arm.com/documentation/102202/0300/AXI-protocol-overview<br />
* [4] https://github.com/pulp-platform/axi<br />
* [5] https://developer.arm.com/documentation/102407/0100/CHI-protocol-fundamentals</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Network-off-Chip_(M)&diff=9744Network-off-Chip (M)2023-10-24T12:27:25Z<p>Fischeti: </p>
<hr />
<div><!-- Network-off-Chip (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Network-on-Chip]]<br />
[[Category:Interconnect]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Completed]]<br />
<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester Thesis or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Chizhang | Chi Zhang]]: [mailto:chizhang@iis.ee.ethz.ch chizhang@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
As the demand for High-Performance Computing (HPC) systems continues to increase, traditional on-chip communication networks, such as bus-based and point-to-point interconnects, are becoming inefficient and limiting the scalability of these systems. Network-on-Chip (NoC) architectures have emerged as a promising solution to this problem, providing a scalable and flexible communication infrastructure for HPC systems. However, off-chip communication remains a bottleneck for many NoC architectures. This thesis proposes a Network-off-chip architecture that combines a NoC with an off-chip serial link to overcome the limitations of traditional NoC architectures.<br />
<br />
= Project =<br />
<br />
In this project, you will combine FlooNoC [1][2] with the Serial Link [3] to end up with an interconnect that can bridge both on-chip and off-chip communication. This will require rethinking and redesigning our existing IPs for FlooNoC and the Serial Link.<br />
<br />
''Protocol conversion''<br />
<br />
First, the protocol interfaces of the Serial Link and ''FlooNoC'' are not yet compatible. The Serial Link has an AXI4 interface [4], while FlooNoC has a generic protocol that wraps AXI4 requests and responses. Theoretically, unwrapping the generic NoC protocol to AXI with a network interface (NI) would solve the problem, but at an undesirable overhead. Instead, a module that converts from the generic NoC protocol to AXI-Stream should replace the current protocol layer module.<br />
<br />
''Physical vs. Virtual Channels''<br />
<br />
The design of FlooNoC is based on wide physical channels, since on-chip routing resources are plentiful: wire pitches for on-chip routing are on the order of nanometers, compared to μBump pitches of tens of micrometers. Hence, going off-chip requires some form of serialization. Serializer/Deserializer (SERDES) schemes are very common in traditional NoCs, in the form of wide messages that are serialized into multiple narrow messages. Further, virtual channels also serialize multiple channels onto one physical channel to save wires. While physical channels are replacing virtual channels in modern technologies, virtual channels are still needed to go off-chip. One part of this thesis will be to define and implement a bridge from multiple wide physical channels to narrow virtual channels that can be sent off-chip, while achieving low latency, high throughput, and high energy efficiency.<br />
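The wide-to-narrow serialization can be illustrated with a small behavioral sketch: one wide flit is split into narrow sub-flits, each tagged with a virtual-channel ID so that sub-flits from different channels can share one physical off-chip link. This is a toy Python model, not the actual FlooNoC or Serial Link implementation; field widths and the LSB-first ordering are assumptions.<br />

```python
def serialize(flit, wide_bits, narrow_bits, vc):
    """Split one wide flit into narrow sub-flits, LSB-first, each tagged
    with its virtual-channel id so that sub-flits of different channels
    can be interleaved on a single physical off-chip link."""
    n_sub = -(-wide_bits // narrow_bits)  # ceiling division
    mask = (1 << narrow_bits) - 1
    return [(vc, (flit >> (i * narrow_bits)) & mask) for i in range(n_sub)]

def deserialize(subflits, narrow_bits):
    """Reassemble the sub-flits of one virtual channel into the wide flit."""
    flit = 0
    for i, (_vc, chunk) in enumerate(subflits):
        flit |= chunk << (i * narrow_bits)
    return flit

wide_flit = 0x123456789ABCDEF0
parts = serialize(wide_flit, wide_bits=64, narrow_bits=16, vc=1)
assert deserialize(parts, narrow_bits=16) == wide_flit
print(f"{len(parts)} sub-flits for a 64-bit flit on a 16-bit link")
```

In hardware, the same split corresponds to a shift register on the transmit side and an accumulation register per virtual channel on the receive side.<br />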
<br />
''Narrow-Wide communication''<br />
<br />
Communication in modern SoCs can be quite diverse with regard to interconnect requirements. Direct Memory Access engines (DMAs), for instance, require an interconnect that can sustain high bandwidth. They are also more latency-tolerant: they can issue multiple outstanding transactions (see the AXI protocol), and data can be transferred in bursts and double-buffered such that processing elements (PEs) are kept busy during data transfers. On the other hand, messages issued by PEs are usually more latency-sensitive (e.g., synchronization). This is why FlooNoC implements multiple physical channels of different widths for wide, high-bandwidth and narrow, latency-sensitive traffic. The physical-to-virtual channel bridge must also arbitrate between narrow and wide traffic while preventing starvation and deadlocks and guaranteeing sustained high-bandwidth and low-latency transfers.<br />
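One simple policy that avoids starvation between the two traffic classes is alternating-priority round-robin arbitration, sketched behaviorally below. The real bridge would of course operate cycle by cycle in hardware, respect flow control, and likely use a weighted policy; this is only an illustration of the fairness idea, with all names invented here.<br />

```python
from collections import deque

def round_robin_arbiter(narrow, wide):
    """Drain two request queues, alternating which one gets priority on each
    grant, so neither the narrow (latency-sensitive) nor the wide
    (high-bandwidth) channel can starve the other."""
    prefer_narrow = True
    while narrow or wide:
        candidates = [("narrow", narrow), ("wide", wide)]
        if not prefer_narrow:
            candidates.reverse()
        for name, queue in candidates:
            if queue:  # grant the preferred non-empty queue
                yield name, queue.popleft()
                break
        prefer_narrow = not prefer_narrow

grants = [ch for ch, _ in round_robin_arbiter(deque([1, 2, 3]), deque(["a", "b"]))]
print(grants)
```
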
<br />
== Character ==<br />
<br />
* 20% Literature Research<br />
* 50% Architecture Design and Exploration<br />
* 20% Performance evaluation<br />
* 10% Documentation & Report<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Knowledge about non-coherent on-chip protocols (e.g. AXI4) is recommended<br />
<br />
== References ==<br />
* [1] https://ieeexplore.ieee.org/document/10225380<br />
* [2] https://github.com/pulp-platform/FlooNoC/tree/main<br />
* [3] https://github.com/pulp-platform/serial_link<br />
* [4] https://github.com/pulp-platform/axi</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Network-off-Chip_(M)&diff=9743Network-off-Chip (M)2023-10-24T12:27:04Z<p>Fischeti: Created page with "<!-- Network-off-Chip (M) --> Category:Digital Category:Network-on-Chip Category:Interconnect Category:2023 Category:Master Thesis Category:Fischeti [..."</p>
<hr />
<div><!-- Network-off-Chip (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Network-on-Chip]]<br />
[[Category:Interconnect]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Completed]]<br />
<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester Thesis or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Chizhang | Chi Zhang]]: [mailto:chizhang@iis.ee.ethz.ch chizhang@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
As the demand for High-Performance Computing (HPC) systems continues to increase, traditional on-chip communication networks, such as bus-based and point-to-point interconnects, are becoming inefficient and limiting the scalability of these systems. Network-on-Chip (NoC) architectures have emerged as a promising solution to this problem, providing a scalable and flexible communication infrastructure for HPC systems. However, off-chip communication remains a bottleneck for many NoC architectures. This thesis proposes a Network-off-chip architecture that combines a NoC with an off-chip serial link to overcome the limitations of traditional NoC architectures.<br />
<br />
= Project =<br />
<br />
In this project, you will combine FlooNoC [1][2] with the Serial Link [3] to end up with an interconnect that can bridge both on-chip and off-chip communication. This will require rethinking and redesigning our existing IPs for FlooNoC and the Serial Link.<br />
<br />
''Protocol conversion''<br />
First, the protocol interfaces of the Serial Link and ''FlooNoC'' are not yet compatible. The Serial Link has an AXI4 interface [4], while FlooNoC has a generic protocol that wraps AXI4 requests and responses. Theoretically, unwrapping the generic NoC protocol to AXI with a network interface (NI) would solve the problem, but at an undesirable overhead. Instead, a module that converts from the generic NoC protocol to AXI-Stream should replace the current protocol layer module.<br />
<br />
''Physical vs. Virtual Channels''<br />
The design of FlooNoC is based on wide physical channels, since on-chip routing resources are plentiful: wire pitches for on-chip routing are on the order of nanometers, compared to μBump pitches of tens of micrometers. Hence, going off-chip requires some form of serialization. Serializer/Deserializer (SERDES) schemes are very common in traditional NoCs, in the form of wide messages that are serialized into multiple narrow messages. Further, virtual channels also serialize multiple channels onto one physical channel to save wires. While physical channels are replacing virtual channels in modern technologies, virtual channels are still needed to go off-chip. One part of this thesis will be to define and implement a bridge from multiple wide physical channels to narrow virtual channels that can be sent off-chip, while achieving low latency, high throughput, and high energy efficiency.<br />
<br />
''Narrow-Wide communication''<br />
Communication in modern SoCs can be quite diverse with regard to interconnect requirements. Direct Memory Access engines (DMAs), for instance, require an interconnect that can sustain high bandwidth. They are also more latency-tolerant: they can issue multiple outstanding transactions (see the AXI protocol), and data can be transferred in bursts and double-buffered such that processing elements (PEs) are kept busy during data transfers. On the other hand, messages issued by PEs are usually more latency-sensitive (e.g., synchronization). This is why FlooNoC implements multiple physical channels of different widths for wide, high-bandwidth and narrow, latency-sensitive traffic. The physical-to-virtual channel bridge must also arbitrate between narrow and wide traffic while preventing starvation and deadlocks and guaranteeing sustained high-bandwidth and low-latency transfers.<br />
<br />
== Character ==<br />
<br />
* 20% Literature Research<br />
* 50% Architecture Design and Exploration<br />
* 20% Performance evaluation<br />
* 10% Documentation & Report<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Knowledge about non-coherent on-chip protocols (e.g. AXI4) is recommended<br />
<br />
== References ==<br />
* [1] https://ieeexplore.ieee.org/document/10225380<br />
* [2] https://github.com/pulp-platform/FlooNoC/tree/main<br />
* [3] https://github.com/pulp-platform/serial_link<br />
* [4] https://github.com/pulp-platform/axi</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Network-on-Chip_for_coherent_and_non-coherent_traffic_(M)&diff=9739Network-on-Chip for coherent and non-coherent traffic (M)2023-10-24T12:18:45Z<p>Fischeti: Created page with "<!-- Network-on-Chip for coherent and non-coherent traffic (M) --> Category:Digital Category:Network-on-Chip Category:Interconnect Category:2023 Category:Ma..."</p>
<hr />
<div><!-- Network-on-Chip for coherent and non-coherent traffic (M) --><br />
<br />
[[Category:Digital]]<br />
[[Category:Network-on-Chip]]<br />
[[Category:Interconnect]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Available]]<br />
<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
With the continuous growth in the number of cores in many-core architectures, effective communication between the cores has become very important. Network-on-Chip (NoC) designs, functioning as an interconnection backbone, are pivotal in ensuring efficient communication. While we have already done a lot of work on a NoC called ''FlooNoC'' [1][2] for non-coherent traffic using the AXI protocol [3][4], we are now also looking into supporting coherent traffic, particularly in shared-memory systems.<br />
<br />
ARM's Coherent Hub Interface (CHI) standard [5] provides a protocol for such coherent interconnects. Combining coherent traffic (as per the CHI standard) with non-coherent traffic in a single NoC system is a valuable yet challenging endeavor. This thesis aims to investigate and develop an integrated NoC design that accommodates both traffic types.<br />
<br />
= Project =<br />
<br />
The goals of this thesis are as follows:<br />
<br />
1. To understand the specific requirements and characteristics of coherent traffic as defined by the CHI standard.<br />
2. To analyze the existing NoC designs for non-coherent traffic using the AXI protocol and identify the primary bottlenecks when introducing coherent traffic.<br />
<br />
3. To develop an architectural design for an integrated NoC that can seamlessly support both coherent and non-coherent traffic.<br />
<br />
4. To implement a prototype of the proposed design and benchmark its performance in various scenarios.<br />
<br />
== Character ==<br />
<br />
* 20% Literature Research<br />
* 50% Architecture Design and Exploration<br />
* 20% Performance<br />
* 10% Documentation & Report<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Knowledge about coherency protocols (e.g., MESI) is recommended<br />
* Knowledge about non-coherent on-chip protocols (e.g. AXI4) is recommended<br />
<br />
== References ==<br />
* [1] https://ieeexplore.ieee.org/document/10225380<br />
* [2] https://github.com/pulp-platform/FlooNoC/tree/main<br />
* [3] https://developer.arm.com/documentation/102202/0300/AXI-protocol-overview<br />
* [4] https://github.com/pulp-platform/axi<br />
* [5] https://developer.arm.com/documentation/102407/0100/CHI-protocol-fundamentals</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=User:Fischeti&diff=9724User:Fischeti2023-10-24T11:44:10Z<p>Fischeti: </p>
<hr />
<div>[[File:Tim_Fischer.jpeg|thumb|right|280px]]<br />
== Tim Fischer ==<br />
I received my Bachelor's degree in Information Technology and Electrical Engineering from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, in 2018, and my Master's degree in April 2020. After that, I started as a PhD student in the Digital Circuits and Systems Group of Prof. Dr. L. Benini.<br />
<br />
==Interests==<br />
My research focuses on interconnects for on-chip and off-chip communication in HPC systems. Specifically, I am currently working on Network-on-Chips (NoCs) to enable scaling out to many-core systems. Further, I have also worked on a die-to-die link for chiplet-based systems.<br />
<br />
Previously, I also deployed machine learning workloads on high-performance computing systems and worked on ML hardware accelerators for edge applications.<br />
<br />
<br />
<br />
==Contact Information==<br />
* '''Office''': OAT U21<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjE3MzI0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In Progress<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Completed<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=MCU_Bus_and_Memory_Generator:_implementation_of_a_Highly_Configurable_bus_and_memory_subsystem_for_a_RISC-V_based_microcontroller.&diff=9685MCU Bus and Memory Generator: implementation of a Highly Configurable bus and memory subsystem for a RISC-V based microcontroller.2023-10-23T12:44:06Z<p>Fischeti: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:ASIC]]<br />
[[Category:Computer and System Architecture]]<br />
[[Category:Hot]]<br />
[[Category:High Performance Computing]]<br />
[[Category:FPGA & Digital ASIC Design]]<br />
[[Category:Energy Efficient SoCs]]<br />
[[Category:2022]]<br />
[[Category:Master Thesis]]<br />
[[Category:Pschiavo]]<br />
[[Category:Meggiman]]<br />
[[Category:Fischeti]]<br />
[[Category:Michaero]]<br />
[[Category:Completed]]<br />
<br />
<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Pschiavo | Pasquale Davide Schiavone]]<br />
** [[:User:Meggiman | Manuel Eggimann]] : [mailto:meggimann@iis.ee.ethz.ch meggimann@iis.ee.ethz.ch]<br />
** [[:User:Fischeti | Tim Fischer]] : [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Michaero | Michael Rogenmoser]] : [mailto:michaero@iis.ee.ethz.ch michaero@iis.ee.ethz.ch]<br />
<br />
<br />
==Introduction==<br />
<br />
<br />
Microcontrollers (MCUs) are used in a wide range of applications, from sensor monitoring all the way to robotics. Despite their typically lower performance, they are usually preferred over custom circuits and FPGAs thanks to their versatility and easy programmability via software routines, typically written in the C language.<br />
STMicroelectronics, SiLabs, NXP, Raspberry Pi, and Arduino are probably the most used vendors. They provide different flavors of microcontrollers that optimize for different angles, such as power, cost, performance, functionality, and software stack, which is typically based on a real-time operating system. To customize the computer architecture of those microcontrollers, if that is possible at all, users must buy expensive licenses from the providers for the rights to customize and sell their own IPs.<br />
In contrast, the Linux revolution made the software business broader and more competitive, as much of the software is open source and free to customize.<br />
Linux, Android, FreeRTOS, GCC, and LLVM are just a few examples of the most used software in the microcontroller market.<br />
However, in recent years, open-source hardware has become a key trend that is changing how the chip industry works.<br />
It all started with the open-source instruction set architecture RISC-V, which was followed by many IPs described in Verilog or VHDL on GitHub and other servers, all the way to the founding of worldwide organizations that maintain, develop, and verify industrial-grade open-source processors and microcontrollers, such as the OpenHW Group and lowRISC, whose IPs are used in industrial products freely, without the need to pay any fee or license.<br />
Switzerland, thanks to ETH Zurich, plays a big role in this picture as one of the main providers of those IPs, which come from the PULP project and eventually end up first being verified by the aforementioned foundations and then being implemented in products.<br />
The Embedded Systems Laboratory (ESL) at the École Polytechnique Fédérale de Lausanne (EPFL) is also developing an open-source RISC-V-based microcontroller called X-HEEP, which shares most of its IPs with ETH's PULP system.<br />
<br />
==Project description==<br />
<br />
Our idea is to provide high-level customization of our RISC-V MCUs (PULP and X-HEEP) by providing tools to configure the MCU to the users' needs in a friendly way, making it de facto a microcontroller generator. In particular, we want to make the bus and the memory subsystem of our systems configurable at design time.<br />
To build such a generator, the student will use Python (e.g., Mako templates) and SystemVerilog to generate the memory and bus subsystems starting from user-specified configurations (e.g., in JSON format).<br />
The project focuses on extending the choice of bus topology as well as the memory size and topology.<br />
The user should be able to specify the on-chip RAM size, the number of banks the RAM is made of, and the cut size.<br />
The user should also be able to specify the bus topology, the number of masters that can access the bus concurrently, the number of slaves that can be accessed concurrently, and whether specific master-to-slave paths are broken with FIFOs to improve timing and area.<br />
In addition, the user should choose how many ports each slave exposes to the masters (e.g., 1 port for the SPI, 4 ports for the memory, etc.), and whether access is given in contiguous or interleaved addressing mode.<br />
For example, if a memory exposes 4 access ports to the masters, the ports are used contiguously if addresses from base to base+offset go to port 0, addresses from base+offset to base+2*offset to port 1, etc. The ports are used interleaved if the base address goes to port 0, base+1 to port 1, ..., and base+4 to port 0 again, etc.<br />
The user will specify the configuration in a JSON file that tells the Python script how to generate the HDL files.<br />
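The two addressing modes can be captured in a few lines. The function below is an illustrative behavioral model of the address-to-port mapping, not the actual generator code; the function name and the `word_bytes` granularity are assumptions made for the sketch.<br />

```python
def select_port(addr, base, num_ports, offset, word_bytes=4, interleaved=False):
    """Map an address to one of a slave's access ports.
    Contiguous: [base, base+offset) -> port 0, [base+offset, base+2*offset)
    -> port 1, and so on. Interleaved: consecutive words rotate across the
    ports, wrapping around after num_ports words."""
    rel = addr - base
    if interleaved:
        return (rel // word_bytes) % num_ports
    # Caller must ensure addr < base + num_ports * offset.
    return rel // offset

# 4 memory ports, 1 KiB per port in contiguous mode, 32-bit words:
assert select_port(0x0000, 0, 4, offset=1024) == 0
assert select_port(0x0400, 0, 4, offset=1024) == 1
assert select_port(0x0004, 0, 4, offset=1024, interleaved=True) == 1
assert select_port(0x0010, 0, 4, offset=1024, interleaved=True) == 0
```

Interleaving spreads consecutive accesses across banks (higher concurrent bandwidth), while contiguous mapping keeps each master's working set in one bank (fewer conflicts between masters).<br />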
<br />
Throughout the project, the student will learn:<br />
<br />
* 1. How to design and implement different memory subsystems in terms of the number of ports/banks (bandwidth) and the number of cuts (the size of the memory macros a bank is made of; e.g., 1 bank of 32kB can be built with 1 cut of 32kB, 2 cuts of 16kB, 4 cuts of 8kB, etc.).<br />
<br />
* 2. How to design and implement different bus topologies in terms of the number of masters that can concurrently access the bus, the number of slaves directly connected to the bus (slave latency), and, for multi-port memory slaves, the addressing mode (contiguous or interleaved).<br />
<br />
* 3. How to integrate such new IPs and their generator in the x-heep and pulp open-source RISC-V microcontrollers.<br />
<br />
* 4. How to analyze how the selected topologies and memory architectures impact the performance of the microcontroller on a set of applications (e.g. matrix multiplication, convolution, ADC sampling).<br />
<br />
* 5. How to work with git repositories and in a team of people all contributing to the same project.<br />
<br />
The project will be carried out at the IIS laboratory at ETH, one of the world's top-class universities, under the supervision of Prof. Luca Benini and Dr. Davide Schiavone.<br />
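To give a flavor of the template-based generation described above, the sketch below turns a JSON configuration into SystemVerilog memory-cut instances. It uses plain f-strings instead of Mako to stay dependency-free, and the `sram_cut` module name and its parameter are hypothetical placeholders, not IPs from the PULP or X-HEEP repositories.<br />

```python
import json

def generate_banks(cfg):
    """Emit one SystemVerilog instance per memory cut, derived from the
    user's JSON configuration (RAM size, number of banks, cut size)."""
    cuts_per_bank = (cfg["ram_size_kb"] // cfg["num_banks"]) // cfg["cut_size_kb"]
    lines = []
    for bank in range(cfg["num_banks"]):
        for cut in range(cuts_per_bank):
            lines.append(f"  sram_cut #(.SizeKb({cfg['cut_size_kb']})) "
                         f"i_bank{bank}_cut{cut} ( /* ports elided */ );")
    return "\n".join(lines)

config = json.loads('{"ram_size_kb": 32, "num_banks": 2, "cut_size_kb": 8}')
print(generate_banks(config))  # 4 instances: 2 banks x 2 cuts of 8 kB each
```

A Mako-based version would move the loops into the template file, keeping the Python side limited to loading and validating the JSON configuration.<br />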
<br />
==Project objectives==<br />
<br />
* Understanding the X-HEEP and PULP microcontrollers, how they work, and learning how IPs are integrated. Understanding how the configuration script of the bus and CPU is implemented.<br />
* Understanding the memory building blocks of MCUs and how to connect them.<br />
* Understanding the bus building blocks of MCUs and how to connect them.<br />
* Building python scripts to generate the SystemVerilog files that implement the user’s memory and bus selected topology.<br />
* Validation of the built MCU generator with C tests.<br />
* Analysis of the performance of the different bus and memory architectures over a set of applications.<br />
<br />
Do not worry, the work is not done twice; most of the IPs are actually the same.<br />
<br />
==Required Skills==<br />
<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended<br />
* to have prior knowledge of hardware design and python<br />
* to have scientific curiosity<br />
* to have good communication skills and to know English <br />
<br />
Other skills that you might find useful include:<br />
* familiarity with a scripting language<br />
* to be strongly motivated for a super-cool project<br />
<br />
If you want to work on this project, but you think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill the gap.<br />
<br />
==Contact Information==<br />
<br />
Contact Davide Schiavone davide.schiavone@epfl.ch or pschiavo@iis.ee.ethz.ch<br />
<br />
===Professor===<br />
: [https://ee.ethz.ch/the-department/people-a-z/person-detail.luca-benini.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
===Meetings & Presentations===<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues. <br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details confer to [http://eda.ee.ethz.ch/index.php/Design_review].</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=AXI-based_Network_on_Chip_(NoC)_system&diff=9682AXI-based Network on Chip (NoC) system2023-10-23T12:43:05Z<p>Fischeti: </p>
<hr />
<div><!-- AXI-based Network on Chip (NoC) system --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Interconnect]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Expired]]<br />
[[Category:Fischeti]]<br />
<br />
= Overview =<br />
<br />
== Status: Expired ==<br />
<br />
* Type: 2 Semester Thesis or 1 Master Thesis<br />
* Professor: Prof. Dr. Luca Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Michaero | Michael Rogenmoser]]: [mailto:michaero@iis.ee.ethz.ch michaero@iis.ee.ethz.ch]<br />
<br />
<br />
== Introduction ==<br />
<br />
As the number of computing cores and accelerators on a single chip is rapidly growing, there is a rising need for scalable, high-bandwidth, and low-latency on-chip communication fabrics. This need is often addressed by deploying networks-on-chip (NoCs) through which the compute cores can communicate, similar to how computers can communicate through the Internet. The nodes that are connected to the NoC usually communicate with on-chip protocols (e.g., AMBA AXI, TCDM) that cannot be used directly on the network layer, hence requiring protocol translation at the border.<br />
<br />
== Project ==<br />
In our group, we are currently developing an NoC that can interface with all the AXI IPs we have developed so far. The goal of this project is to build a system with a mesh NoC and a couple of cores, and to do the system integration for a potential tapeout. For the verification, low-level software and drivers should be written and tested.<br />
<br />
== Character ==<br />
<br />
* 30% System Integration<br />
* 20% Verification<br />
* 20% Low-level software and drivers<br />
* 30% Backend implementation<br />
<br />
== Prerequisites ==<br />
* Experience with System Verilog, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
* C programming language experience</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=MCU_Bus_and_Memory_Generator:_implementation_of_a_Highly_Configurable_bus_and_memory_subsystem_for_a_RISC-V_based_microcontroller.&diff=9078MCU Bus and Memory Generator: implementation of a Highly Configurable bus and memory subsystem for a RISC-V based microcontroller.2023-03-08T12:36:43Z<p>Fischeti: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:ASIC]]<br />
[[Category:Computer and System Architecture]]<br />
[[Category:Hot]]<br />
[[Category:High Performance Computing]]<br />
[[Category:FPGA & Digital ASIC Design]]<br />
[[Category:Energy Efficient SoCs]]<br />
[[Category:2022]]<br />
[[Category:Master Thesis]]<br />
[[Category:Pschiavo]]<br />
[[Category:Meggiman]]<br />
[[Category:Fischeti]]<br />
[[Category:Michaero]]<br />
[[Category:In Progress]]<br />
<br />
<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Pschiavo | Pasquale Davide Schiavone]]<br />
** [[:User:Meggiman | Manuel Eggimann]] : [mailto:meggimann@iis.ee.ethz.ch meggimann@iis.ee.ethz.ch]<br />
** [[:User:Fischeti | Tim Fischer]] : [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Michaero | Michael Rogenmoser]] : [mailto:michaero@iis.ee.ethz.ch michaero@iis.ee.ethz.ch]<br />
<br />
<br />
==Introduction==<br />
<br />
<br />
Microcontrollers (MCUs) are used in a wide range of applications, from sensor monitoring all the way to robotics. Despite being typically lower in performance, they are usually preferred over custom circuits and FPGAs thanks to their versatility and easy programmability via software routines, typically written in the C language. <br />
STMicroelectronics, SiLabs, NXP, Raspberry, and Arduino are probably the most used ones. These vendors provide different flavors of microcontrollers that optimize for different aspects, such as power, cost, performance, functionality, and software stack, which is typically based on real-time operating systems. To customize the computer architecture of those microcontrollers, if possible at all, users must buy expensive licenses from the providers for the right to customize and sell their own IPs. <br />
In contrast, the Linux revolution made the software business broader and more competitive, as much of the software is open source and free to be customized.<br />
Linux, Android, FreeRTOS, GCC, and LLVM are just a few examples of the most used software in the microcontroller market.<br />
In recent years, open-source hardware has likewise become a key trend that is changing how the chip industry works. <br />
It all started with the open-source instruction set architecture RISC-V, which was followed by many IPs described in Verilog or VHDL on GitHub or other servers, all the way to the foundation of worldwide organizations that maintain, develop, and verify industrial-grade open-source processors and microcontrollers, such as the OpenHW Group and lowRISC, whose IPs are used in industrial products freely, without the need to pay any fee or license. <br />
Switzerland, thanks to ETH Zurich, plays a big role in this picture as one of the main providers of such IPs through the PULP project; these IPs are first verified by the aforementioned organizations and then implemented in products. <br />
Also, the Embedded Systems Laboratory (ESL) at the Ecole Polytechnique Fédérale de Lausanne (EPFL) is developing an open-source RISC-V-based microcontroller called x-heep, which shares most of its IPs with ETH's PULP system.<br />
<br />
==Project description==<br />
<br />
Our idea is to offer high-level customization of our RISC-V MCUs (pulp and x-heep) by providing tools that configure the MCU to the users' needs in a friendly way, making it de facto a microcontroller generator. In particular, we want to make the bus and the memory subsystem of our systems configurable at design time.<br />
To build such a generator, the student will use python (e.g., Mako templates) and SystemVerilog for generating the memory and the bus subsystem starting from user-specified configurations (e.g., in JSON format).<br />
The project focuses on extending the bus topology choice and the memory size and topology. <br />
The user should be able to specify the on-chip RAM size, the number of banks the RAM is made of, and the cut size.<br />
The user should also be able to specify the bus topology, the number of masters that can access the bus concurrently, the number of slaves that can be accessed concurrently, and whether master-to-slave specific paths are broken with FIFOs to improve timing and area.<br />
In addition, the user should choose how many ports each slave gives access to the masters (e.g., 1 port from the SPI, 4 ports from memory, etc.), and whether the access is given in continuous addressing mode, or interleaved addressing mode. <br />
For example, if a memory exposes 4 access ports to the masters, the ports are used in a contiguous way if addresses from base to base+offset go to port 0, from base+offset to base+2*offset to port 1, and so on; whereas the ports are used in an interleaved way if address base goes to port 0, base+1 to port 1, ..., and base+4 to port 0 again, etc.<br />
The user will specify the configuration in a JSON file to tell the python script how to generate the HDL files.<br />
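The two addressing modes described above can be sketched in Python (a hypothetical illustration; the function and parameter names are not taken from the actual generator):

```python
def port_for_address(addr, base, offset, num_ports, mode):
    """Map a byte address to one of `num_ports` memory ports.

    In contiguous mode, each port serves a block of `offset` consecutive
    addresses; in interleaved mode, consecutive addresses rotate across ports.
    """
    rel = addr - base
    if mode == "contiguous":
        return (rel // offset) % num_ports
    else:  # interleaved
        return rel % num_ports

# With 4 ports and offset 0x100: addresses base..base+0xFF map to port 0
# in contiguous mode, while base, base+1, base+2, base+3 map to
# ports 0, 1, 2, 3 in interleaved mode.
print(port_for_address(0x1005, 0x1000, 0x100, 4, "contiguous"))   # 0
print(port_for_address(0x1005, 0x1000, 0x100, 4, "interleaved"))  # 1
```

Interleaving spreads consecutive words over all ports, so several masters can stream through adjacent addresses concurrently, while contiguous mode gives each port one large consecutive region.<br />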
<br />
Throughout the project, the student will learn:<br />
<br />
* 1. How to design and implement different memory subsystems in terms of the number of ports/banks (bandwidth) and the number of cuts (the size of the memories the bank is made of; e.g., 1 bank of 32 kB can be built with 1 cut of 32 kB, 2 cuts of 16 kB, or 4 cuts of 8 kB, etc.).<br />
<br />
* 2. How to design and implement different bus topologies in terms of the number of masters that can concurrently access the bus, the number of slaves directly connected to the bus (slave latency), and, for multi-port memory slaves, the addressing mode (contiguous or interleaved).<br />
<br />
* 3. How to integrate such new IPs and their generator in the x-heep and pulp open-source RISC-V microcontrollers.<br />
<br />
* 4. How to analyze how the selected topologies and memory architectures impact the performance of the microcontroller on a set of applications (e.g. matrix multiplication, convolution, ADC sampling).<br />
<br />
* 5. How to work with git repositories and in a team of people all contributing to the same project.<br />
<br />
The project will be carried out at the IIS laboratory at ETH, one of the world's top-class universities, under the supervision of Prof. Luca Benini and Dr. Davide Schiavone.<br />
<br />
==Project objectives==<br />
<br />
* Understanding the X-HEEP and PULP microcontrollers, how they work, and learning how IPs are integrated. Understanding how the configuration script of the bus and CPU is implemented.<br />
* Understanding the memory building-block modules of MCUs and how to connect them.<br />
* Understanding the bus building-block modules of MCUs and how to connect them.<br />
* Building python scripts to generate the SystemVerilog files that implement the user’s selected memory and bus topology.<br />
* Validation of the built MCU generator with C tests.<br />
* Analysis of the performance of the different bus and memory architectures over a set of applications.<br />
<br />
Do not worry, the work is not done twice; most of the IPs are actually the same.<br />
<br />
==Required Skills==<br />
<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended<br />
* to have prior knowledge of hardware design and python<br />
* to have scientific curiosity<br />
* to have good communication skills and to know English <br />
<br />
Other skills that you might find useful include:<br />
* familiarity with a scripting language<br />
* to be strongly motivated for a super-cool project<br />
<br />
If you want to work on this project, but you think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill the gap.<br />
<br />
==Contact Information==<br />
<br />
Contact Davide Schiavone davide.schiavone@epfl.ch or pschiavo@iis.ee.ethz.ch<br />
<br />
===Professor===<br />
: [https://ee.ethz.ch/the-department/people-a-z/person-detail.luca-benini.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
===Meetings & Presentations===<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues. <br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details confer to [http://eda.ee.ethz.ch/index.php/Design_review].</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=MCU_Bus_and_Memory_Generator:_implementation_of_a_Highly_Configurable_bus_and_memory_subsystem_for_a_RISC-V_based_microcontroller.&diff=9077MCU Bus and Memory Generator: implementation of a Highly Configurable bus and memory subsystem for a RISC-V based microcontroller.2023-03-08T12:35:27Z<p>Fischeti: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:ASIC]]<br />
[[Category:Computer and System Architecture]]<br />
[[Category:Hot]]<br />
[[Category:High Performance Computing]]<br />
[[Category:FPGA & Digital ASIC Design]]<br />
[[Category:Energy Efficient SoCs]]<br />
[[Category:2022]]<br />
[[Category:Master Thesis]]<br />
[[Category:Pschiavo]]<br />
[[Category:Meggiman]]<br />
[[Category:In Progress]]<br />
<br />
<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Pschiavo | Pasquale Davide Schiavone]]<br />
** [[:User:Meggiman | Manuel Eggimann]] : [mailto:meggimann@iis.ee.ethz.ch meggimann@iis.ee.ethz.ch]<br />
** [[:User:Fischeti | Tim Fischer]] : [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Michaero | Michael Rogenmoser]] : [mailto:michaero@iis.ee.ethz.ch michaero@iis.ee.ethz.ch]<br />
<br />
<br />
==Introduction==<br />
<br />
<br />
Microcontrollers (MCUs) are used in a wide range of applications, from sensor monitoring all the way to robotics. Despite being typically lower in performance, they are usually preferred over custom circuits and FPGAs thanks to their versatility and easy programmability via software routines, typically written in the C language. <br />
STMicroelectronics, SiLabs, NXP, Raspberry, and Arduino are probably the most used ones. These vendors provide different flavors of microcontrollers that optimize for different aspects, such as power, cost, performance, functionality, and software stack, which is typically based on real-time operating systems. To customize the computer architecture of those microcontrollers, if possible at all, users must buy expensive licenses from the providers for the right to customize and sell their own IPs. <br />
In contrast, the Linux revolution made the software business broader and more competitive, as much of the software is open source and free to be customized.<br />
Linux, Android, FreeRTOS, GCC, and LLVM are just a few examples of the most used software in the microcontroller market.<br />
In recent years, open-source hardware has likewise become a key trend that is changing how the chip industry works. <br />
It all started with the open-source instruction set architecture RISC-V, which was followed by many IPs described in Verilog or VHDL on GitHub or other servers, all the way to the foundation of worldwide organizations that maintain, develop, and verify industrial-grade open-source processors and microcontrollers, such as the OpenHW Group and lowRISC, whose IPs are used in industrial products freely, without the need to pay any fee or license. <br />
Switzerland, thanks to ETH Zurich, plays a big role in this picture as one of the main providers of such IPs through the PULP project; these IPs are first verified by the aforementioned organizations and then implemented in products. <br />
Also, the Embedded Systems Laboratory (ESL) at the Ecole Polytechnique Fédérale de Lausanne (EPFL) is developing an open-source RISC-V-based microcontroller called x-heep, which shares most of its IPs with ETH's PULP system.<br />
<br />
==Project description==<br />
<br />
Our idea is to offer high-level customization of our RISC-V MCUs (pulp and x-heep) by providing tools that configure the MCU to the users' needs in a friendly way, making it de facto a microcontroller generator. In particular, we want to make the bus and the memory subsystem of our systems configurable at design time.<br />
To build such a generator, the student will use python (e.g., Mako templates) and SystemVerilog for generating the memory and the bus subsystem starting from user-specified configurations (e.g., in JSON format).<br />
The project focuses on extending the bus topology choice and the memory size and topology. <br />
The user should be able to specify the on-chip RAM size, the number of banks the RAM is made of, and the cut size.<br />
The user should also be able to specify the bus topology, the number of masters that can access the bus concurrently, the number of slaves that can be accessed concurrently, and whether master-to-slave specific paths are broken with FIFOs to improve timing and area.<br />
In addition, the user should choose how many ports each slave gives access to the masters (e.g., 1 port from the SPI, 4 ports from memory, etc.), and whether the access is given in continuous addressing mode, or interleaved addressing mode. <br />
For example, if a memory exposes 4 access ports to the masters, the ports are used in a contiguous way if addresses from base to base+offset go to port 0, from base+offset to base+2*offset to port 1, and so on; whereas the ports are used in an interleaved way if address base goes to port 0, base+1 to port 1, ..., and base+4 to port 0 again, etc.<br />
The user will specify the configuration in a JSON file to tell the python script how to generate the HDL files.<br />
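The two addressing modes described above can be sketched in Python (a hypothetical illustration; the function and parameter names are not taken from the actual generator):

```python
def port_for_address(addr, base, offset, num_ports, mode):
    """Map a byte address to one of `num_ports` memory ports.

    In contiguous mode, each port serves a block of `offset` consecutive
    addresses; in interleaved mode, consecutive addresses rotate across ports.
    """
    rel = addr - base
    if mode == "contiguous":
        return (rel // offset) % num_ports
    else:  # interleaved
        return rel % num_ports

# With 4 ports and offset 0x100: addresses base..base+0xFF map to port 0
# in contiguous mode, while base, base+1, base+2, base+3 map to
# ports 0, 1, 2, 3 in interleaved mode.
print(port_for_address(0x1005, 0x1000, 0x100, 4, "contiguous"))   # 0
print(port_for_address(0x1005, 0x1000, 0x100, 4, "interleaved"))  # 1
```

Interleaving spreads consecutive words over all ports, so several masters can stream through adjacent addresses concurrently, while contiguous mode gives each port one large consecutive region.<br />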
<br />
Throughout the project, the student will learn:<br />
<br />
* 1. How to design and implement different memory subsystems in terms of the number of ports/banks (bandwidth) and the number of cuts (the size of the memories the bank is made of; e.g., 1 bank of 32 kB can be built with 1 cut of 32 kB, 2 cuts of 16 kB, or 4 cuts of 8 kB, etc.).<br />
<br />
* 2. How to design and implement different bus topologies in terms of the number of masters that can concurrently access the bus, the number of slaves directly connected to the bus (slave latency), and, for multi-port memory slaves, the addressing mode (contiguous or interleaved).<br />
<br />
* 3. How to integrate such new IPs and their generator in the x-heep and pulp open-source RISC-V microcontrollers.<br />
<br />
* 4. How to analyze how the selected topologies and memory architectures impact the performance of the microcontroller on a set of applications (e.g. matrix multiplication, convolution, ADC sampling).<br />
<br />
* 5. How to work with git repositories and in a team of people all contributing to the same project.<br />
<br />
The project will be carried out at the IIS laboratory at ETH, one of the world's top-class universities, under the supervision of Prof. Luca Benini and Dr. Davide Schiavone.<br />
<br />
==Project objectives==<br />
<br />
* Understanding the X-HEEP and PULP microcontrollers, how they work, and learning how IPs are integrated. Understanding how the configuration script of the bus and CPU is implemented.<br />
* Understanding the memory building-block modules of MCUs and how to connect them.<br />
* Understanding the bus building-block modules of MCUs and how to connect them.<br />
* Building python scripts to generate the SystemVerilog files that implement the user’s selected memory and bus topology.<br />
* Validation of the built MCU generator with C tests.<br />
* Analysis of the performance of the different bus and memory architectures over a set of applications.<br />
<br />
Do not worry, the work is not done twice; most of the IPs are actually the same.<br />
<br />
==Required Skills==<br />
<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended<br />
* to have prior knowledge of hardware design and python<br />
* to have scientific curiosity<br />
* to have good communication skills and to know English <br />
<br />
Other skills that you might find useful include:<br />
* familiarity with a scripting language<br />
* to be strongly motivated for a super-cool project<br />
<br />
If you want to work on this project, but you think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill the gap.<br />
<br />
==Contact Information==<br />
<br />
Contact Davide Schiavone davide.schiavone@epfl.ch or pschiavo@iis.ee.ethz.ch<br />
<br />
===Professor===<br />
: [https://ee.ethz.ch/the-department/people-a-z/person-detail.luca-benini.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
===Meetings & Presentations===<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues. <br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details confer to [http://eda.ee.ethz.ch/index.php/Design_review].</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Design_of_a_Prototype_Chip_with_Interleaved_Memory_and_Network-on-Chip&diff=9076Design of a Prototype Chip with Interleaved Memory and Network-on-Chip2023-03-08T12:30:22Z<p>Fischeti: </p>
<hr />
<div><!-- Design of a Prototype Chip with Interleaved Memory and Network-on-Chip (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Completed]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. Thorsten Hoefler<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** Patrick Iff: [mailto:patrick.iff@inf.ethz.ch patrick.iff@inf.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
As the number of compute cores and accelerators on a single chip is rapidly growing, there is a rising need for scalable, high-bandwidth and low-latency on-chip communication fabrics. This need is often addressed by deploying networks-on-chip (NoCs) through which the compute cores can communicate, similar to how computers can communicate through the Internet.<br />
These NoCs can occupy a significant percentage of the total chip area and as the number of cores on a single chip increases, this percentage keeps increasing. In contrast to compute cores that reach a high logic cell utilization, the part of the chip where the NoC sits usually attains a rather low cell utilization since NoCs are dominated by routing.<br />
We could take advantage of the otherwise unused regions of the chip where the NoC sits by instantiating a latch-based standard-cell memory (SCM) as a scratchpad memory (SPM) directly addressable by the NoC.<br />
<br />
= Project =<br />
<br />
The goal of this thesis is to develop a prototype that interleaves SCM (which is dominated by logic cells) and the NoC (which is dominated by routing) in order to maximize the utilization of both logic cells and routing. The thesis should explore the benefit of memory-in-NoC by showing how much additional memory we can add without significantly increasing the total chip area.<br />
<br />
== Character ==<br />
<br />
* 30% Architecture Specification<br />
* 40% Implementation of the architecture<br />
* 30% SCM in NoC exploration<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with the System Verilog language, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
= References =</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Design_of_a_Prototype_Chip_with_Interleaved_Memory_and_Network-on-Chip&diff=9075Design of a Prototype Chip with Interleaved Memory and Network-on-Chip2023-03-08T12:30:04Z<p>Fischeti: </p>
<hr />
<div><!-- Design of a Prototype Chip with Interleaved Memory and Network-on-Chip (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Completed]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. Thorsten Hoefler<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** Patrick Iff: [mailto:patrick.iff@inf.ethz.ch patrick.iff@inf.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
As the number of compute cores and accelerators on a single chip is rapidly growing, there is a rising need for scalable, high-bandwidth and low-latency on-chip communication fabrics. This need is often addressed by deploying networks-on-chip (NoCs) through which the compute cores can communicate, similar to how computers can communicate through the Internet.<br />
These NoCs can occupy a significant percentage of the total chip area and as the number of cores on a single chip increases, this percentage keeps increasing. In contrast to compute cores that reach a high logic cell utilization, the part of the chip where the NoC sits usually attains a rather low cell utilization since NoCs are dominated by routing.<br />
We could take advantage of the otherwise unused regions of the chip where the NoC sits by instantiating a latch-based standard-cell memory (SCM) as a scratchpad memory (SPM) directly addressable by the NoC.<br />
<br />
= Project =<br />
<br />
The goal of this thesis is to develop a prototype that interleaves SCM (which is dominated by logic cells) and the NoC (which is dominated by routing) in order to maximize the utilization of both logic cells and routing. The thesis should explore the benefit of memory-in-NoC by showing how much additional memory we can add without significantly increasing the total chip area.<br />
<br />
== Character ==<br />
<br />
* 30% Architecture Specification<br />
* 40% Implementation of the architecture<br />
* 30% SCM in NoC exploration<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with the System Verilog language, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
= References =</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Energy_Efficient_AXI_Interface_to_Serial_Link_Physical_Layer&diff=9054Energy Efficient AXI Interface to Serial Link Physical Layer2023-02-20T16:21:39Z<p>Fischeti: Fischeti moved page Energy Efficient AXI Inteface to Serial Link Physical Layer to Energy Efficient AXI Interface to Serial Link Physical Layer: Typo</p>
<hr />
<div>[[Category:AnalogInterface]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Digital]]<br />
[[Category:sarjmandpour]]<br />
[[Category:Fischeti]]<br />
<br />
= Overview =<br />
<br />
===Status: Available===<br />
:Looking for semester project<br />
:Supervisors: <br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
**[[:User:sarjmandpour | Sina Arjmandpour]]: [mailto:sarjmandpour@iis.ee.ethz.ch sarjmandpour@iis.ee.ethz.ch]<br />
<br />
== Project ==<br />
Designing and implementing an energy-efficient AXI (Advanced eXtensible Interface) interface with a serial-link physical layer involves developing a high-speed communication system that provides a standardized and efficient interface between different IP blocks with minimal power consumption. The project requires expertise in digital circuit design, communication protocols, and power management techniques. The goal is to integrate the AXI interface with a serial-link physical layer to enable high-speed communication between IP blocks with minimal power consumption. This involves selecting an appropriate protocol, developing a suitable circuit architecture that incorporates the AXI interface and the serial-link physical layer, and implementing power-saving techniques such as voltage scaling, clock gating, and data compression. The project will require simulation and testing of the system to verify its performance, power consumption, and compatibility with different IP blocks. The final deliverable is an energy-efficient AXI interface with a serial-link physical layer that meets the specified data-rate, power-consumption, and compatibility requirements.<br />
<br />
<br />
===Prerequisites===<br />
* Experience with System Verilog or Verilog, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
===Character===<br />
* 20% System Integration<br />
* 20% Verification<br />
* 30% Low-level software and drivers<br />
* 30% Backend implementation<br />
<br />
===Professor===<br />
*Prof. Dr. Luca Benini<br />
*Prof. Dr. Taekwang Jang<br />
<br />
=== Reference===</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Energy_Efficient_AXI_Inteface_to_Serial_Link_Physical_Layer&diff=9055Energy Efficient AXI Inteface to Serial Link Physical Layer2023-02-20T16:21:39Z<p>Fischeti: Fischeti moved page Energy Efficient AXI Inteface to Serial Link Physical Layer to Energy Efficient AXI Interface to Serial Link Physical Layer: Typo</p>
<hr />
<div>#REDIRECT [[Energy Efficient AXI Interface to Serial Link Physical Layer]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Energy_Efficient_AXI_Interface_to_Serial_Link_Physical_Layer&diff=9052Energy Efficient AXI Interface to Serial Link Physical Layer2023-02-20T16:20:18Z<p>Fischeti: </p>
<hr />
<div>[[Category:AnalogInterface]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:Digital]]<br />
[[Category:sarjmandpour]]<br />
[[Category:Fischeti]]<br />
<br />
= Overview =<br />
<br />
===Status: Available===<br />
:Looking for semester project<br />
:Supervisors: <br />
**[[:User:sarjmandpour | Sina Arjmandpour]]: [mailto:sarjmandpour@iis.ee.ethz.ch sarjmandpour@iis.ee.ethz.ch]<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
<br />
== Project ==<br />
Designing and implementing an energy-efficient AXI (Advanced eXtensible Interface) interface with a serial link physical layer involves developing a high-speed communication system that provides a standardized, efficient interface between different IP blocks at minimal power consumption. The project requires expertise in digital circuit design, communication protocols, and power management techniques. Its goal is to integrate the AXI interface with a serial link physical layer to enable high-speed, low-power communication between IP blocks. This involves selecting an appropriate protocol, developing a suitable circuit architecture that incorporates the AXI interface and the serial link physical layer, and implementing power-saving techniques such as voltage scaling, clock gating, and data compression. The system will be simulated and tested to verify its performance, power consumption, and compatibility with different IP blocks. The final deliverable is an energy-efficient AXI interface with a serial link physical layer that meets the specified data rate, power consumption, and compatibility requirements.<br />
<br />
<br />
===Prerequisites===<br />
* Experience with System Verilog or Verilog, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
===Character===<br />
* 20% System Integration<br />
* 20% Verification<br />
* 30% Low-level software and drivers<br />
* 30% Backend implementation<br />
<br />
===Professor===<br />
*Prof. Dr. Luca Benini<br />
*Prof. Dr. Taekwang Jang<br />
<br />
=== Reference===</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Energy_Efficient_AXI_Interface_to_Serial_Link_Physical_Layer&diff=9051Energy Efficient AXI Interface to Serial Link Physical Layer2023-02-20T16:18:29Z<p>Fischeti: </p>
<hr />
<div>[[Category:AnalogInterface]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:sarjmandpour]]<br />
[[Category:Fischeti]]<br />
<br />
= Overview =<br />
<br />
===Status: Available===<br />
:Looking for semester project<br />
:Supervisor: <br />
* [[:User:sarjmandpour | Sina Arjmandpour]]: [mailto:sarjmandpour@iis.ee.ethz.ch sarjmandpour@iis.ee.ethz.ch]<br />
* [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
<br />
== Project ==<br />
Designing and implementing an energy-efficient AXI (Advanced eXtensible Interface) interface with a serial link physical layer involves developing a high-speed communication system that provides a standardized, efficient interface between different IP blocks at minimal power consumption. The project requires expertise in digital circuit design, communication protocols, and power management techniques. Its goal is to integrate the AXI interface with a serial link physical layer to enable high-speed, low-power communication between IP blocks. This involves selecting an appropriate protocol, developing a suitable circuit architecture that incorporates the AXI interface and the serial link physical layer, and implementing power-saving techniques such as voltage scaling, clock gating, and data compression. The system will be simulated and tested to verify its performance, power consumption, and compatibility with different IP blocks. The final deliverable is an energy-efficient AXI interface with a serial link physical layer that meets the specified data rate, power consumption, and compatibility requirements.<br />
<br />
<br />
===Prerequisites===<br />
* Experience with System Verilog or Verilog, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
===Character===<br />
* 20% System Integration<br />
* 20% Verification<br />
* 30% Low-level software and drivers<br />
* 30% Backend implementation<br />
<br />
===Professor===<br />
*Prof. Dr. Luca Benini<br />
*Prof. Dr. Taekwang Jang<br />
<br />
=== Reference===</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=AXI-based_Network_on_Chip_(NoC)_system&diff=8249AXI-based Network on Chip (NoC) system2022-11-02T09:31:43Z<p>Fischeti: Created page with "<!-- AXI-based Network on Chip (NoC) system --> Category:Digital Category:High Performance SoCs Category:Computer Architecture Category:Interconnect Categor..."</p>
<hr />
<div><!-- AXI-based Network on Chip (NoC) system --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Interconnect]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Fischeti]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: 2 Semester Theses or 1 Master Thesis<br />
* Professor: Prof. Dr. Luca Benini<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Michaero | Michael Rogenmoser]]: [mailto:michaero@iis.ee.ethz.ch michaero@iis.ee.ethz.ch]<br />
<br />
<br />
== Introduction ==<br />
<br />
As the number of computing cores and accelerators on a single chip is rapidly growing, there is a rising need for scalable, high-bandwidth, and low-latency on-chip communication fabrics. This need is often addressed by deploying networks-on-chip (NoCs) through which the compute cores can communicate, similar to how computers communicate over the Internet. The nodes connected to the NoC usually communicate with on-chip protocols (e.g., AMBA AXI, TCDM) that cannot be used on the network layer, hence requiring protocol translation at the network boundary.<br />
<br />
== Project ==<br />
In our group, we are currently developing a NoC that can interface with all the AXI IPs we have developed so far. The goal of this project is to build a system with a mesh NoC and a couple of cores, and to do the system integration for a potential tapeout. For verification, low-level software and drivers should be written and tested.<br />
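To give a flavor of what the routers in such a mesh NoC do, the sketch below implements dimension-ordered (XY) routing in plain Python. XY routing is a generic textbook policy used here for illustration; it is not necessarily the routing algorithm of the group's NoC IP.<br />

```python
# Minimal sketch of dimension-ordered (XY) routing in a 2D mesh NoC:
# a packet is routed fully in the X dimension first, then in Y.
# This deadlock-free textbook policy is for illustration only.

def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Return the sequence of router coordinates visited from src to dst."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # resolve the X offset first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then resolve the Y offset
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```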
<br />
== Character ==<br />
<br />
* 30% System Integration<br />
* 20% Verification<br />
* 20% Low-level software and drivers<br />
* 30% Backend implementation<br />
<br />
== Prerequisites ==<br />
* Experience with System Verilog, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
* C programming language experience</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=8248Flexfloat DL Training Framework2022-11-02T08:13:21Z<p>Fischeti: </p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] comprising 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. By now, many state-of-the-art training hardware platforms support not only 64-bit and 32-bit floating-point formats, but also 16-bit ones (IEEE binary16 and brainfloat). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit or 16-bit FP formats. However, the FPU of our Occamy project supports two different 16-bit FP formats and two different 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [6]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
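The core idea behind such emulation is to round every intermediate value to the target format's precision while still computing in ordinary floats. The sketch below rounds a binary32 value to bfloat16 by truncating the mantissa to 7 bits with round-to-nearest-even; it mimics what libraries like flexfloat do but is a plain-Python illustration, not the flexfloat API (and it ignores NaN handling).<br />

```python
# Emulate a reduced-precision FP format on top of ordinary floats:
# round a binary32 value to bfloat16 (8-bit exponent, 7-bit mantissa)
# using round-to-nearest-even. Illustration only, not the flexfloat API.
import struct

def to_bfloat16(x: float) -> float:
    """Round x to the nearest-even bfloat16 value, returned as a float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # bias for round-to-nearest-even on the 16 truncated mantissa bits
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000  # keep only the bfloat16-representable upper bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bfloat16(3.14159))  # 3.140625
```

Wrapping a framework's tensor ops with such a rounding step after every operation is, in essence, how low-precision training behavior can be emulated before the hardware exists.<br />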
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: Completed ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:Fischeti | Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up any questions from either side, and for ensuring the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif, drawio, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested members of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:Completed]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=8247Flexfloat DL Training Framework2022-11-02T08:04:29Z<p>Fischeti: </p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
<br />
[[Category: Deep Learning]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
[[Category: Fischeti]]<br />
[[Category: Completed]]<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] comprising 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. By now, many state-of-the-art training hardware platforms support not only 64-bit and 32-bit floating-point formats, but also 16-bit ones (IEEE binary16 and brainfloat). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit or 16-bit FP formats. However, the FPU of our Occamy project supports two different 16-bit FP formats and two different 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [6]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: Completed ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:Fischeti | Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up any questions from either side, and for ensuring the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif, drawio, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested members of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:In progress]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=8246Flexfloat DL Training Framework2022-11-02T08:03:15Z<p>Fischeti: </p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
<br />
[[Category: Deep Learning]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
[[Category: Fischeti]]<br />
[[Category: Completed]]<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] comprising 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. By now, many state-of-the-art training hardware platforms support not only 64-bit and 32-bit floating-point formats, but also 16-bit ones (IEEE binary16 and brainfloat). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit or 16-bit FP formats. However, the FPU of our Occamy project supports two different 16-bit FP formats and two different 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [6]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: Completed ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:fischeti|Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up any questions from either side, and for ensuring the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif, drawio, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested members of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:In progress]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=8245Flexfloat DL Training Framework2022-11-02T07:59:37Z<p>Fischeti: </p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
<br />
[[Category: Deep Learning]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
[[Category: Fischeti]]<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] comprising 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. By now, many state-of-the-art training hardware platforms support not only 64-bit and 32-bit floating-point formats, but also 16-bit ones (IEEE binary16 and brainfloat). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit or 16-bit FP formats. However, the FPU of our Occamy project supports two different 16-bit FP formats and two different 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [6]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: Completed ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:fischeti|Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up any questions from either side, and for ensuring the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif, drawio, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested members of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:In progress]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=8244Flexfloat DL Training Framework2022-11-02T07:57:23Z<p>Fischeti: </p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
<br />
[[Category:Deep Learning]]<br />
[[Category:Digital]]<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Hardware Acceleration]]<br />
[[Category:Fischeti]]<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] comprising 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. By now, many state-of-the-art training hardware platforms support not only 64-bit and 32-bit floating-point formats, but also 16-bit ones (IEEE binary16 and brainfloat). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit or 16-bit FP formats. However, the FPU of our Occamy project supports two different 16-bit FP formats and two different 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [6]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: Completed ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:fischeti|Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up any questions from either side, and for ensuring the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Tgif, drawio, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested members of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:In progress]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=8243Flexfloat DL Training Framework2022-11-02T07:55:42Z<p>Fischeti: </p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] with its 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. By now, many state-of-the-art training hardware platforms support not only 64-bit and 32-bit floating-point formats, but also 16-bit floating-point formats (IEEE binary16 and brainfloat). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit or 16-bit FP formats. However, the FPU of our Occamy project supports two different types of 16-bit FP formats and two different types of 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [6]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
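To make the emulation idea concrete, the following pure-Python sketch rounds a double to the nearest value representable in a small, configurable FP format. It only mirrors the core idea of a library like flexfloat (which does this natively in C and also models subnormals, NaNs and configurable rounding modes); the format parameters here are illustrative, and subnormals are simply flushed to zero.<br />

```python
import math

def quantize_fp(x: float, exp_bits: int = 5, man_bits: int = 2) -> float:
    """Round x to the nearest value of a small FP format (sketch only).

    Normal numbers with round-to-nearest; saturates on overflow and
    flushes subnormals to zero. Format parameters are hypothetical.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    m, e = math.frexp(x)                    # x = m * 2**e, 0.5 <= |m| < 1
    # Round the mantissa to 1 + man_bits significant bits.
    scale = 2.0 ** (man_bits + 1)
    m = round(m * scale) / scale
    bias = 2 ** (exp_bits - 1) - 1
    e_min, e_max = 1 - bias, bias           # exponent range of the format
    if e - 1 > e_max:                       # overflow: saturate
        largest = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
        return math.copysign(largest, x)
    if e - 1 < e_min:                       # underflow: flush to zero
        return math.copysign(0.0, x)
    return math.ldexp(m, e)
```

A framework integration would apply such a quantization to weights, activations and gradients after each operation, so that training numerically behaves as if it ran on the reduced-precision FPU.<br />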
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: Completed ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:fischeti|Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, clear out any questions from either side, and ensure the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Tgif, drawio or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:In progress]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=User:Fischeti&diff=8242User:Fischeti2022-11-02T07:54:52Z<p>Fischeti: /* Projects In Progress */</p>
<hr />
<div>[[File:Tim_Fischer.jpeg|thumb|right|280px]]<br />
== Tim Fischer ==<br />
I received my Bachelor's degree in Information Technology and Electrical Engineering from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, in 2018 and my Master's degree in April 2020. After that, I started as a PhD student in the Digital Circuits and Systems group of Prof. Dr. L. Benini.<br />
<br />
==Interests==<br />
My research focus is on deploying Machine Learning workloads on High-Performance Computing systems. Specifically, I am interested in HW/SW co-design of DNN training algorithms as well as low-precision floating-point DNN training. I have also previously worked on ML hardware accelerators for edge applications.<br />
<br />
<br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 76.2<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 59 12<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjE3MzI0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = In Progress<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Completed<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=User:Fischeti&diff=8241User:Fischeti2022-11-02T07:54:15Z<p>Fischeti: </p>
<hr />
<div>[[File:Tim_Fischer.jpeg|thumb|right|280px]]<br />
== Tim Fischer ==<br />
I received my Bachelor's degree in Information Technology and Electrical Engineering from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, in 2018 and my Master's degree in April 2020. After that, I started as a PhD student in the Digital Circuits and Systems group of Prof. Dr. L. Benini.<br />
<br />
==Interests==<br />
My research focus is on deploying Machine Learning workloads on High-Performance Computing systems. Specifically, I am interested in HW/SW co-design of DNN training algorithms as well as low-precision floating-point DNN training. I have also previously worked on ML hardware accelerators for edge applications.<br />
<br />
<br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 76.2<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 59 12<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjE3MzI0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Completed<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Design_of_a_Prototype_Chip_with_Interleaved_Memory_and_Network-on-Chip&diff=8240Design of a Prototype Chip with Interleaved Memory and Network-on-Chip2022-11-02T07:53:27Z<p>Fischeti: </p>
<hr />
<div><!-- Design of a Prototype Chip with Interleaved Memory and Network-on-Chip (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:In Progress]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. Thorsten Hoefler<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** Patrick Iff: [mailto:patrick.iff@inf.ethz.ch patrick.iff@inf.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
As the number of compute cores and accelerators on a single chip is rapidly growing, there is a rising need for scalable, high-bandwidth and low-latency on-chip communication fabrics. This need is often addressed by deploying networks-on-chip (NoCs) through which the compute cores can communicate similar to how computers can communicate through the Internet.<br />
These NoCs can occupy a significant percentage of the total chip area and as the number of cores on a single chip increases, this percentage keeps increasing. In contrast to compute cores that reach a high logic cell utilization, the part of the chip where the NoC sits usually attains a rather low cell utilization since NoCs are dominated by routing.<br />
We could take advantage of the otherwise unused regions of the chip where the NoC sits by instantiating a latch-based standard-cell memory (SCM) as a scratchpad memory (SPM) directly addressable by the NoC.<br />
<br />
= Project =<br />
<br />
The goal of this thesis is to develop a prototype that interleaves SCM (which is dominated by logic cells) and the NoC (which is dominated by routing) in order to maximize the utilization of both logic cells and routing. The thesis should explore the benefit of memory-in-NoC by showing how much additional memory we can add without significantly increasing the total chip area.<br />
<br />
== Character ==<br />
<br />
* 30% Architecture Specification<br />
* 40% Implementation of the architecture<br />
* 30% SCM in NoC exploration<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with the System Verilog language, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
= References =</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Design_of_a_Prototype_Chip_with_Interleaved_Memory_and_Network-on-Chip&diff=8239Design of a Prototype Chip with Interleaved Memory and Network-on-Chip2022-11-02T07:51:57Z<p>Fischeti: </p>
<hr />
<div><!-- Design of a Prototype Chip with Interleaved Memory and Network-on-Chip (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. Thorsten Hoefler<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** Patrick Iff: [mailto:patrick.iff@inf.ethz.ch patrick.iff@inf.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
As the number of compute cores and accelerators on a single chip is rapidly growing, there is a rising need for scalable, high-bandwidth and low-latency on-chip communication fabrics. This need is often addressed by deploying networks-on-chip (NoCs) through which the compute cores can communicate similar to how computers can communicate through the Internet.<br />
These NoCs can occupy a significant percentage of the total chip area and as the number of cores on a single chip increases, this percentage keeps increasing. In contrast to compute cores that reach a high logic cell utilization, the part of the chip where the NoC sits usually attains a rather low cell utilization since NoCs are dominated by routing.<br />
We could take advantage of the otherwise unused regions of the chip where the NoC sits by instantiating a latch-based standard-cell memory (SCM) as a scratchpad memory (SPM) directly addressable by the NoC.<br />
<br />
= Project =<br />
<br />
The goal of this thesis is to develop a prototype that interleaves SCM (which is dominated by logic cells) and the NoC (which is dominated by routing) in order to maximize the utilization of both logic cells and routing. The thesis should explore the benefit of memory-in-NoC by showing how much additional memory we can add without significantly increasing the total chip area.<br />
<br />
== Character ==<br />
<br />
* 30% Architecture Specification<br />
* 40% Implementation of the architecture<br />
* 30% SCM in NoC exploration<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with the System Verilog language, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
= References =</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=A_Unified_Compute_Kernel_Library_for_Snitch_(1-2S)&diff=7987A Unified Compute Kernel Library for Snitch (1-2S)2022-08-17T07:49:30Z<p>Fischeti: </p>
<hr />
<div><!-- Universal Stream Semantic Registers for Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Fischeti]]<br />
[[Category:Aottaviano]]<br />
[[Category:Completed]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Currently, Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together enable almost continuous FPU utilization in many data-oblivious problems.<br />
<br />
Over time, we have written many simple demonstrator programs for Snitch systems to measure their performance. Most of these involved ''computational kernels'', small computational functions like linear algebra operations, convolutions, or FFTs; these are frequently called in larger compute-intensive applications like machine learning layers or mathematical problem solvers. <br />
<br />
Since a lot of compute time is spent in these kernels, optimizing them for the target hardware is a highly effective way to accelerate computation. Thus, most existing Snitch kernels are hand-tuned, partially or completely written in assembly, and use Snitch's extensions for maximum performance and efficiency.<br />
<br />
Unfortunately, we do not have many compute kernels for Snitch yet, and much of the existing code was written for old versions of Snitch and is no longer maintained; it uses various code conventions, targets outdated versions of our extensions, and/or no longer performs optimally on our hardware. It is also scattered across different projects and repositories.<br />
<br />
= Project =<br />
<br />
In this project, you will create a unified library of high-performance computational kernels tailored to Snitch and its extensions for use in compute-intensive applications. To this end, you will:<br />
<br />
* '''Review and get familiar with existing efforts''' on <br />
** Snitch compute kernels and runtime<br />
** Compute libraries targeting PULP (PULP-NN [3], PULP DSP [4])<br />
* '''Define the structure and requirements''' for a compute kernel library<br />
* '''Write new compute kernels''', which may include any of:<br />
** Linear algebra (matrices/vectors/scalar sums, products, transpositions, inversions...)<br />
** Machine learning (pooling, batch normalization, backpropagation, ...)<br />
** Filter functions (convolution, FFT, ...)<br />
** Complex numbers (addition, multiplication, magnitude and argument, ...)<br />
* '''Verify your new kernels''' using results generated by common compute libraries<br />
* '''Evaluate the performance of your kernels''' in RTL simulations of a Snitch system<br />
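The verification step above can be sketched as follows: a plain-Python golden model produces reference results that the optimized kernel's output is compared against element-wise. The names and signatures are illustrative, not part of the actual Snitch runtime; the real library entry would be C or assembly using SSRs and FREP, with this golden model living on the host side.<br />

```python
def matmul_kernel(a, b, n, m, k):
    """Naive golden-model matmul on row-major flat lists: C = A @ B.

    A is n x m, B is m x k, C is n x k. Serves only as the reference
    against which a hand-tuned kernel would be verified.
    """
    c = [0.0] * (n * k)
    for i in range(n):
        for j in range(k):
            acc = 0.0
            for l in range(m):
                acc += a[i * m + l] * b[l * k + j]
            c[i * k + j] = acc
    return c

def verify(result, golden, tol=1e-9):
    """Element-wise comparison within a tolerance, as floating-point
    reordering in the optimized kernel may change results slightly."""
    return len(result) == len(golden) and \
        all(abs(r - g) <= tol for r, g in zip(result, golden))
```

In practice the golden results would come from a common compute library, and the tolerance would be chosen to match the FP format the kernel computes in.<br />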
<br />
Depending on your preferences and prior experience, you may choose which class(es) of kernels you want to tackle or focus on. The proposal can also be split into multiple individual projects if necessary.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/document/9216552<br />
<br />
[2] https://ieeexplore.ieee.org/document/9068465<br />
<br />
[3] https://github.com/pulp-platform/pulp-nn<br />
<br />
[4] https://github.com/pulp-platform/pulp-dsp</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=A_Unified_Compute_Kernel_Library_for_Snitch_(1-2S)&diff=7986A Unified Compute Kernel Library for Snitch (1-2S)2022-08-17T07:47:21Z<p>Fischeti: </p>
<hr />
<div><!-- Universal Stream Semantic Registers for Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Fischeti]]<br />
[[Category:Aottaviano]]<br />
[[Category:In progress]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Currently, Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop, which together enable almost continuous FPU utilization in many data-oblivious problems.<br />
<br />
Over time, we have written many simple demonstrator programs for Snitch systems to measure their performance. Most of these involved ''computational kernels'', small computational functions like linear algebra operations, convolutions, or FFTs; these are frequently called in larger compute-intensive applications like machine learning layers or mathematical problem solvers. <br />
<br />
Since a lot of compute time is spent in these kernels, optimizing them for the target hardware is a highly effective way to accelerate computation. Thus, most existing Snitch kernels are hand-tuned, partially or completely written in assembly, and use Snitch's extensions for maximum performance and efficiency.<br />
<br />
Unfortunately, we do not have many compute kernels for Snitch yet, and much of the existing code was written for old versions of Snitch and is no longer maintained; it uses various code conventions, targets outdated versions of our extensions, and/or no longer performs optimally on our hardware. It is also scattered across different projects and repositories.<br />
<br />
= Project =<br />
<br />
In this project, you will create a unified library of high-performance computational kernels tailored to Snitch and its extensions for use in compute-intensive applications. To this end, you will:<br />
<br />
* '''Review and get familiar with existing efforts''' on <br />
** Snitch compute kernels and runtime<br />
** Compute libraries targeting PULP (PULP-NN [3], PULP DSP [4])<br />
* '''Define the structure and requirements''' for a compute kernel library<br />
* '''Write new compute kernels''', which may include any of:<br />
** Linear algebra (matrices/vectors/scalar sums, products, transpositions, inversions...)<br />
** Machine learning (pooling, batch normalization, backpropagation, ...)<br />
** Filter functions (convolution, FFT, ...)<br />
** Complex numbers (addition, multiplication, magnitude and argument, ...)<br />
* '''Verify your new kernels''' using results generated by common compute libraries<br />
* '''Evaluate the performance of your kernels''' in RTL simulations of a Snitch system<br />
<br />
Depending on your preferences and prior experience, you may choose which class(es) of kernels you want to tackle or focus on. The proposal can also be split into multiple individual projects if necessary.<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/document/9216552<br />
<br />
[2] https://ieeexplore.ieee.org/document/9068465<br />
<br />
[3] https://github.com/pulp-platform/pulp-nn<br />
<br />
[4] https://github.com/pulp-platform/pulp-dsp</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=User:Fischeti&diff=7985User:Fischeti2022-08-17T07:46:43Z<p>Fischeti: </p>
<hr />
<div>[[File:Tim_Fischer.jpeg|thumb|right|280px]]<br />
== Tim Fischer ==<br />
I received my Bachelor's degree in Information Technology and Electrical Engineering from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, in 2018 and my Master's degree in April 2020. After that, I started as a PhD student in the Digital Circuits and Systems group of Prof. Dr. L. Benini.<br />
<br />
==Interests==<br />
My research focus is on deploying Machine Learning workloads on High-Performance Computing systems. Specifically, I am interested in HW/SW co-design of DNN training algorithms as well as low-precision floating-point DNN training. I have also previously worked on ML hardware accelerators for edge applications.<br />
<br />
<br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 76.2<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 59 12<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjE3MzI0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Completed<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=User:Fischeti&diff=7984User:Fischeti2022-08-17T07:45:19Z<p>Fischeti: /* Available Projects */</p>
<hr />
<div>[[File:Tim_Fischer.jpeg|thumb|right|280px]]<br />
== Tim Fischer ==<br />
I received my Bachelor's degree in Information Technology and Electrical Engineering from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, in 2018 and my Master's degree in April 2020. After that, I started as a PhD student in the Digital Circuits and Systems group of Prof. Dr. L. Benini.<br />
<br />
==Interests==<br />
My research focus is on deploying Machine Learning workloads on High-Performance Computing systems. Specifically, I am interested in HW/SW co-design of DNN training algorithms as well as low-precision floating-point DNN training. I have also previously worked on ML hardware accelerators for edge applications.<br />
<br />
<br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 76.2<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 59 12<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjE3MzI0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]<br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=7983Flexfloat DL Training Framework2022-08-17T07:44:23Z<p>Fischeti: </p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] with its 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. By now, many state-of-the-art training hardware platforms support not only 64-bit and 32-bit floating-point formats, but also 16-bit floating-point formats (IEEE binary16 and brainfloat). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit or 16-bit FP formats. However, the FPU of our Occamy project supports two different types of 16-bit FP formats and two different types of 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [6]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: In Progress ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:fischeti|Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, clear out any questions from either side, and ensure the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the IIS staff strongly encourages the use of LaTeX together with Tgif, drawio, or any other vector drawing software for block diagrams.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:In progress]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Flexfloat_DL_Training_Framework&diff=7982Flexfloat DL Training Framework2022-08-17T07:43:43Z<p>Fischeti: /* Status: Available */</p>
<hr />
<div>[[File:Manticore concept.png|thumb]]<br />
==Project Overview==<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems, like the Manticore concept [2] with 4096 Snitch cores. In the near future, we plan to tape out Occamy, a slightly smaller, two-chiplet version of Manticore. Snitch-based architectures are built around the minimal RISC-V Snitch integer core, only about 15 thousand gates in size, which is tightly coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Recently, industry and academia have started exploring the computational precision required for training. Many state-of-the-art training hardware platforms by now support not only 64-bit and 32-bit floating-point formats, but also 16-bit floating-point formats (IEEE binary16 and bfloat16). Recent work proposes various training formats such as 8-bit floats [3,4,5].<br />
<br />
Most available DL frameworks allow training networks with 64-bit, 32-bit, or 16-bit FP formats. However, the FPU of our Occamy project supports two different 16-bit FP formats and two different 8-bit FP formats. Therefore, we would like to extend an available DL training framework (e.g., PyTorch) with a library (e.g., flexfloat [5]) capable of emulating various FP formats. Depending on your skills and the project type (SA or MA), this work can be extended by training various networks with various FP formats.<br />
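To illustrate the kind of emulation meant here, below is a minimal pure-Python sketch that rounds a double to the nearest value representable in a custom IEEE-like format with configurable exponent and mantissa widths — this is the core operation a library like flexfloat performs, but the function name, the flush-to-zero of subnormals, and the saturation on overflow are simplifications of this sketch, not flexfloat's actual API or semantics.

```python
import math

def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a hypothetical IEEE-like format
    with 1 sign bit, exp_bits exponent bits (bias 2**(exp_bits-1) - 1),
    and man_bits mantissa bits. Subnormals flush to zero and overflow
    saturates to the largest normal, to keep the sketch short."""
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))           # abs(x) = m * 2**e, m in [0.5, 1)
    exp = e - 1                         # unbiased exponent for m in [1, 2)
    if exp < 1 - bias:                  # below smallest normal: flush to zero
        return sign * 0.0
    max_exp = 2 ** exp_bits - 2 - bias  # largest normal exponent
    if exp > max_exp:                   # overflow: saturate to largest normal
        return sign * (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    ulp = 2.0 ** (exp - man_bits)       # value of one mantissa step
    return sign * round(abs(x) / ulp) * ulp

# Emulating bfloat16 (8-bit exp, 7-bit mantissa) and an FP8 e4m3 variant:
print(quantize(0.1, 8, 7))   # → 0.10009765625
print(quantize(0.1, 4, 3))   # → 0.1015625
```

Wrapping such a function (or flexfloat's C implementation, for speed) around the tensors of a training framework is one way to study accuracy under the Occamy FPU's reduced-precision formats.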
<br />
<br />
===Literature===<br />
* [https://github.com/pulp-platform/snitch] Snitch Github<br />
* [https://ieeexplore.ieee.org/abstract/document/9296802] Manticore<br />
* [https://openreview.net/forum?id=HkxIKNSeIH] Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks<br />
* [https://proceedings.neurips.cc/paper/2018/file/335d3d1cd7ef05ec77714a215134914c-Paper.pdf] Training deep neural networks with 8-bit floating point numbers<br />
* [https://arxiv.org/abs/1905.12334] Mixed precision training with 8-bit floating point<br />
* [https://github.com/oprecomp/flexfloat] Flexfloat Github<br />
<br />
<br />
===Status: In Progress ===<br />
* Looking for 1 Semester or 1 Master student<br />
* Contact: [[:User:Paulin | Gianna Paulin]], [[:User:fischeti|Tim Fischer]]<br />
<br />
===Prerequisites===<br />
* Deep Learning<br />
* Python<br />
* C<br />
<br />
<!-- <br />
===Status: Completed ===<br />
: Fall Semester 2014 (sem13h2)<br />
: Matthias Baer, Renzo Andri<br />
---><br />
<!-- <br />
===Status: In Progress ===<br />
: Student A, StudentB<br />
: Supervision: [[:User:Mluisier | Mathieu Luisier]]<br />
---><br />
<br />
===Character===<br />
* 25% Theory<br />
* 75% Implementation<br />
<br />
===Professor===<br />
* [http://www.iis.ee.ethz.ch/people/person-detail.html?persid=194234 Luca Benini]<br />
<br />
<br />
== Project Organization ==<br />
<br />
==== Weekly Meetings ====<br />
<br />
The student shall meet with the advisor(s) every week to discuss any issues or problems that arose during the previous week and to agree on the next steps. These meetings provide a guaranteed time slot for a mutual exchange of information on how to proceed, for clearing up questions from either side, and for ensuring the student’s progress.<br />
<br />
==== Report / Presentation ====<br />
<br />
Documentation is an important and often overlooked aspect of engineering. One final report has to be completed within this project. Any form of word-processing software is allowed for writing the report; nevertheless, the IIS staff strongly encourages the use of LaTeX together with Tgif, drawio, or any other vector drawing software for block diagrams.<br />
<br />
====== Final Report ======<br />
<br />
A digital copy of the report, the presentation, the developed software, build script/project files, drawings/illustrations, acquired data, etc. needs to be handed in at the end of the project. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
====== Presentation ======<br />
<br />
At the end of the project, the outcome of the thesis will be presented in a 15-minute (SA) or 20-minute (MA) talk, followed by 5 minutes of discussion, in front of interested people of the Integrated Systems Laboratory. The presentation is open to the public, so you are welcome to invite interested friends. The exact date will be determined towards the end of the work.<br />
<br />
<br />
[[Category:Deep Learning Acceleration]]<br />
[[Category:Deep Learning Projects]]<br />
[[Category:Available]]<br />
[[Category:Software]]<br />
[[Category:Digital]]<br />
[[Category:Paulin]]<br />
[[Category:Fischeti]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
<br />
[[#top|↑ top]]</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=Design_of_a_Prototype_Chip_with_Interleaved_Memory_and_Network-on-Chip&diff=7981Design of a Prototype Chip with Interleaved Memory and Network-on-Chip2022-08-17T07:35:57Z<p>Fischeti: Created page with "<!-- Design of a Prototype Chip with Interleaved Memory and Network-on-Chip (1S) --> Category:Digital Category:High Performance SoCs [[Category:Computer Architecture]..."</p>
<hr />
<div><!-- Design of a Prototype Chip with Interleaved Memory and Network-on-Chip (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Fischeti]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. Thorsten Hoefler<br />
* Supervisors:<br />
** [[:User:Fischeti | Tim Fischer]]: [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
** Patrick Iff: [mailto:patrick.iff@inf.ethz.ch patrick.iff@inf.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
As the number of compute cores and accelerators on a single chip is rapidly growing, there is a rising need for scalable, high-bandwidth, and low-latency on-chip communication fabrics. This need is often addressed by deploying networks-on-chip (NoCs), through which the compute cores can communicate similar to how computers communicate over the Internet.<br />
These NoCs can occupy a significant percentage of the total chip area, and this percentage keeps increasing as the number of cores on a single chip grows. In contrast to compute cores, which reach a high logic-cell utilization, the part of the chip where the NoC sits usually attains a rather low cell utilization, since NoCs are dominated by routing.<br />
We could take advantage of the otherwise unused regions of the chip where the NoC sits by instantiating a latch-based standard-cell memory (SCM) as a scratchpad memory (SPM) directly addressable by the NoC.<br />
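To make the opportunity concrete, here is a back-of-envelope estimate of how much SCM could fit into the NoC region's unused cell area. Every number in this sketch is a hypothetical placeholder chosen purely for illustration, not measured data from any design:

```python
def extra_scm_kbit(noc_area_mm2: float, noc_cell_util: float,
                   target_util: float, scm_density_kbit_mm2: float) -> float:
    """Toy model: storage that fits in the NoC region's spare cell area.

    The NoC region currently fills only noc_cell_util of its area with
    cells; filling it up to target_util with latch-based SCM of the given
    storage density yields the extra on-chip memory, in kbit.
    """
    free_area = noc_area_mm2 * (target_util - noc_cell_util)
    return free_area * scm_density_kbit_mm2

# Hypothetical example: a 2 mm^2 NoC region at 30% cell utilization,
# filled up to 70% with SCM at an assumed 150 kbit/mm^2 density.
print(extra_scm_kbit(2.0, 0.30, 0.70, 150.0))  # roughly 120 kbit
```

Under these assumed numbers, the NoC region would absorb on the order of a hundred kilobits of scratchpad memory without growing the chip — quantifying this trade-off on a real floorplan is precisely the goal of the thesis.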
<br />
= Project =<br />
<br />
The goal of this thesis is to develop a prototype that interleaves SCM (which is dominated by logic cells) and the NoC (which is dominated by routing) in order to maximize the utilization of both logic cells and routing. The thesis should explore the benefit of memory-in-NoC by showing how much additional memory we can add without significantly increasing the total chip area.<br />
<br />
== Character ==<br />
<br />
* 30% Architecture Specification<br />
* 40% Implementation of the architecture<br />
* 30% SCM in NoC exploration<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with the System Verilog language, VLSI 1<br />
* Experience with physical implementation, VLSI 2<br />
<br />
= References =</div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=High_Performance_SoCs&diff=7956High Performance SoCs2022-08-12T09:03:04Z<p>Fischeti: /* Who are we */</p>
<hr />
<div>==High-Performance Systems-on-Chip==<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
[[File:Floorplan_baikonur.png|thumb|350px|''Baikonur'', a 22 nm chip integrating two application-grade RISC-V Ariane cores and 3 Snitch clusters with 8 cores each.]]<br />
[[File:Manticore_concept.png|thumb|350px|Concept art for ''Manticore'', a Snitch-based 22 nm system with 4096 cores on multiple chiplets and with HBM2 memory.]]<br />
<br />
Today, a multitude of data-driven applications such as machine learning, scientific computing, and big data demand an ever-increasing amount of '''parallel floating-point performance''' from computing systems. Increasingly, such applications must scale across a wide range of platforms and energy budgets, from supercomputers simulating next week's weather to your smartphone camera correcting for low-light conditions.<br />
<br />
This brings challenges on multiple fronts:<br />
<br />
* '''Energy Efficiency''' becomes a major concern: As logic density increases, supplying these systems with energy and managing their heat dissipation requires increasingly complex solutions.<br />
<br />
* '''Memory bandwidth and latency''' become a major bottleneck as the amount of processed data increases. Despite continuous advances, memory lags behind computing in scaling, and many data-driven problems today are memory-bound.<br />
<br />
* '''Parallelization and scaling''' bring challenges of their own: on-chip interconnects may introduce significant area and performance overheads as they grow, and both the data and instruction streams of cores may compete for valuable memory bandwidth and interfere in a destructive way.<br />
<br />
While all state-of-the-art high-performance computing systems are constrained by the above issues, they are also subject to a fundamental trade-off between efficiency and flexibility. This forms a design space which includes the following paradigms:<br />
<br />
* '''Accelerators''' are designed to do one thing very well: they are very energy efficient and performant and usually offer predetermined data movement. However, they are not or barely programmable, inflexible, and monolithic in their design.<br />
<br />
* '''Superscalar Out-of-Order CPUs''', on the other end, provide extreme flexibility, full programmability, and reasonable performance across various workloads. However, they require large area and energy overheads for a given performance, use memory inefficiently, and are often hard to scale well to manycore systems.<br />
<br />
* '''GPUs''' are parallel and data-oriented by design, yet still meaningfully programmable, aiming for a sweet spot between scalability, efficiency, and programmability. However, they are still subject to memory access challenges and often require manual memory management for decent performance.<br />
<br />
'''How can we further improve on these existing paradigms?''' Can we design decently efficient and performant, yet freely programmable systems with scalable, performant memory systems?<br />
<br />
If these questions sound intriguing to you, consider joining us for a project or thesis! You can find currently available projects and our contact information below.<br />
<br />
==Our Activities==<br />
<br />
We are primarily interested in '''architecture design and hardware implementation''' for high-performance systems. However, ensuring high performance requires us to consider the '''entire hardware-software stack''':<br />
<br />
* '''HPC Software''': Design and porting of high-performance applications, benchmarks, compiler tools, and operating systems (Linux) to our hardware.<br />
* '''Hardware-software codesign''': Design of performance-aware algorithms and kernels and hardware that can be efficiently programmed for use in processor-based systems.<br />
* '''Architecture''': RTL implementation of energy-efficient designs with an emphasis on high utilization and throughput, as well as on efficient interoperability with existing IPs.<br />
* '''SoC design and Implementation''': Design of full high-performance systems-on-chips; implementation and tapeout on modern silicon technologies such as TSMC's 65 nm and GlobalFoundries' 22 nm nodes.<br />
* '''IC testing and Board-Level design''': Testing of the returning chips with industry-grade automated test equipment (ATE) and design of system-level demonstrator boards.<br />
<br />
Our current interests include systems with '''low control-to-compute ratios''', high-performance '''on-chip interconnects''', and '''scalable many-core systems'''. However, we are always happy to explore new domains; if you have an interesting idea, contact us and we can discuss it in detail!<br />
<br />
==Who are we==<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Smazzola_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Smazzola | Sergio Mazzola]]===<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Paulsc_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Tbenz_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Tbenz | Thomas Benz]]===<br />
* '''e-mail''': [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 05 18<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Nwistoff_face_pulp_team.JPG|frameless|left|96px]]<br />
|<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:lbertaccini_photo.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Lbertaccini | Luca Bertaccini]]===<br />
* '''e-mail''': [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 55 58<br />
* '''office''': ETZ J78<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Mperotti_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Mperotti | Matteo Perotti]]===<br />
* '''e-mail''': [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 05 25<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Sriedel_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Sriedel | Samuel Riedel]]===<br />
* '''e-mail''': [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 69<br />
* '''office''': ETZ J71.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Matheusd_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Matheusd | Matheus Cavalcante]]===<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 96<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Tim_Fischer.jpeg|frameless|left|96px]]<br />
|<br />
<br />
===[[:User:Fischeti| Tim Fischer]]===<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 59 12<br />
* '''office''': ETZ J76.2<br />
|}<br />
<br />
<!--Retired members<br />
{|<br />
| style="padding: 10px" | [[File:Akurth_face_pulp_team.jpeg|frameless|left|96px]]<br />
|<br />
===[[:User:Akurth | Andreas Kurth]]===<br />
* '''e-mail''': [mailto:akurth@iis.ee.ethz.ch akurth@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 04 87<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Zarubaf_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Zarubaf | Florian Zaruba]]===<br />
* '''e-mail''': [mailto:zarubaf@iis.ee.ethz.ch zarubaf@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 56<br />
* '''office''': ETZ J89<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Fschuiki_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Fschuiki | Fabian Schuiki]]===<br />
* '''e-mail''': [mailto:fschuiki@iis.ee.ethz.ch fschuiki@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 67 89<br />
* '''office''': ETZ J89<br />
|}<br />
--><br />
<br />
<!--<br />
Who are we<br />
What do we do<br />
Where to find us<br />
--><br />
<br />
==Projects==<br />
<br />
All projects are annotated with one or more possible ''project types'' (M/S/B/G) and a ''number of students'' (1 to 3). <br />
<br />
* '''M''': Master's thesis: ''26 weeks'' full-time (6 months) for ''one student only''<br />
* '''S''': Semester project: ''14 weeks'' half-time (1 semester lecture period) or ''7 weeks'' full-time for ''1-3 students''<br />
* '''B''': Bachelor's thesis: ''14 weeks'' half-time (1 semester lecture period) for ''one student only''<br />
* '''G''': Group project: ''14 weeks'' part-time (1 semester lecture period) for ''2-3 students''<br />
<br />
Usually, these are merely suggestions from our side; proposals can often be reformulated to fit students' needs.<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList><br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=false<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList><br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList><br />
===Reserved Projects===<br />
<DynamicPageList><br />
category = Reserved<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList></div>Fischetihttp://iis-projects.ee.ethz.ch/index.php?title=User:Fischeti&diff=7235User:Fischeti2021-11-19T16:18:49Z<p>Fischeti: Created page with "280px == Tim Fischer == I received my Bachelor's degree in Information Technology and Electrical Engineering from Swiss Federal Institut..."</p>
<hr />
<div>[[File:Tim_Fischer.jpeg|thumb|right|280px]]<br />
== Tim Fischer ==<br />
I received my Bachelor's degree in Information Technology and Electrical Engineering from Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland in 2018 and my Master's degree in April 2020. After that, I started as a PhD Student in the digital circuits and systems group of Prof. Dr. L. Benini<br />
<br />
==Interests==<br />
My research focus is on deploying machine-learning workloads on high-performance computing systems. Specifically, I am interested in HW/SW co-design of DNN training algorithms as well as low-precision floating-point DNN training. I have also previously worked on ML hardware accelerators for edge applications.<br />
<br />
<br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 76.2<br />
* '''e-mail''': [mailto:fischeti@iis.ee.ethz.ch fischeti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 59 12<br />
* '''www''': [https://iis.ee.ethz.ch/people/person-detail.MjE3MzI0.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html IIS Homepage]<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Fischeti<br />
</DynamicPageList><br />
<br />
[[Category: Supervisors]]<br />
[[Category: Digital]]<br />
[[Category: Deep Learning Acceleration]]<br />
[[Category: Hardware Acceleration]]</div>Fischeti