Spatz grows wings: Physical Implementation of a Vector-Powered Manycore System (2S)
2022-07-08, Matheusd
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
Striving for high image quality, even on mobile devices, has led to an increase in pixel count in smartphone cameras over the last decade. These image sensors, boasting tens of millions of pixels, create a massive amount of data to be processed on a tight power envelope as quickly as possible. While this processing is highly parallelizable, it requires specialized image signal processors (ISPs) that can exploit this high degree of parallelism to meet the timing and power constraints. One modern example of such an ISP is Google's Pixel Visual Core, which contains eight image processing units, each consisting of 256 specialized processing elements, to achieve a combined performance of 3.28 TOPS.<br />
<br />
At ETH, we developed our own many-core system called MemPool. It boasts 256 lightweight 32-bit Snitch cores. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads.<br />
<br />
In the quest to improve MemPool's peak performance, last semester, we proposed Spatz, an embedded vector processing unit based on the integer subset of the RISC-V Vector Extension. Spatz substantially improved MemPool's performance and energy efficiency: for an area increase of 26%, we increased MemPool's performance by 70% and its energy efficiency by 116%. This shows the relevance of small embedded vector processing units as the processing elements (PEs) of large-scale clusters with tightly coupled L1 memory.<br />
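As a quick sanity check on these figures, the compound gains can be worked out directly (a back-of-the-envelope sketch using only the percentages quoted above):<br />

```python
# Back-of-the-envelope check of the Spatz gains quoted above.
area_factor = 1.26        # +26% area
perf_factor = 1.70        # +70% performance
eff_factor = 1.0 + 1.16   # +116% energy efficiency, i.e., 2.16x

perf_per_area = perf_factor / area_factor
print(f"performance/area gain: {perf_per_area:.2f}x")  # performance/area gain: 1.35x
print(f"energy-efficiency gain: {eff_factor:.2f}x")    # energy-efficiency gain: 2.16x
```

So even normalized by its area cost, the vector unit still comes out ahead on throughput.<br />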
<br />
= Project description =<br />
<br />
The goal of this project is to explore how well Spatz performs on MemPool's small-scale configuration, MinPool.<br />
We want to tape-out a Spatz-powered MinPool configuration that allows us to characterize the power, performance, and efficiency of the standalone Spatz unit and of the MinPool cluster. <br />
<br />
Ultimately, the design resulting from this project shall lead to a scientific publication, be used to generate results for scientific publications, and/or be a basis for the work of other people.<br />
Hence, the design must be properly tested, the generated source code appropriately commented, and the project well documented.<br />
<br />
The whole project can be subdivided into four main phases, which are described in detail in the following:<br />
<br />
=== Familiarizing with the design ===<br />
<br />
MemPool has been developed by two of this thesis's advisors over the past few years.<br />
As such, there is some familiarizing with the architecture and the code base that needs to be done at the start of this project.<br />
<br />
The student(s) should also familiarize themselves with Spatz, the work of Domenic Wüthrich, which led to a conference publication that serves as the main reference for the design.<br />
<br />
=== Implementing MemPool's tile ===<br />
<br />
MemPool's tile is the basis of its architecture, containing a few Snitch cores and a number of L1 memory banks accessible within one cycle.<br />
This module is the basis for building the MemPool cluster.<br />
It is, however, a ''normal'' module, which can be placed and routed as a rather dense macro.<br />
<br />
We need to determine how fast the Spatz-powered MemPool tiles can be clocked in the target technology node, also considering that they should be rather dense.<br />
The student can play with parameters such as the number of Snitch cores at each tile, the size of the instruction cache, and the size of the L1 memory per tile.<br />
This should also give the student a feel for the next step. <br />
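This exploration is essentially a small grid search over the tile parameters. A sketch of how the configuration space could be enumerated before launching synthesis runs (the parameter values below are hypothetical placeholders, not MemPool's actual settings):<br />

```python
from itertools import product

# Hypothetical design-space axes for one tile (placeholder values only).
cores_per_tile = [2, 4, 8]   # number of Snitch cores per tile
icache_kib = [1, 2, 4]       # instruction-cache size in KiB
l1_kib = [8, 16, 32]         # L1 memory per tile in KiB

configs = [
    {"cores": c, "icache_kib": i, "l1_kib": l}
    for c, i, l in product(cores_per_tile, icache_kib, l1_kib)
]
# Each entry would drive one synthesis run, recording achieved clock
# frequency and placement density for that configuration.
print(f"{len(configs)} configurations to evaluate")  # 27 configurations to evaluate
```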
<br />
The student(s) might also try out a datapath-driven flow to better implement Spatz's latch-based VRF, which is a considerable part of the design.<br />
<br />
=== Implementing MinPool ===<br />
<br />
By instantiating several of these tiles and interconnecting them cleverly, we can build the many-core cluster. We do not know how many such tiles we can instantiate before degrading timing or latency too much.<br />
<br />
After this design exploration phase, the student and the advisors will decide on a final set of parameters for MinPool, finally deciding how large it should be.<br />
<br />
=== Making MinPool standalone ===<br />
<br />
During this phase, some RTL is needed to make MinPool a stand-alone design. The student will need to integrate an FLL, a boot ROM, and a JTAG interface to access and initialize the system. While there is plenty of IP and know-how at IIS for that, this is also highly dependent on the final design parameters for MinPool.<br />
<br />
In this phase, the student should also check whether the placed-and-routed design passes the sanity checks.<br />
<br />
=== Run sanity tests on the place-and-routed design ===<br />
<br />
Run DRC checks, simulate the place-and-routed design, and reason about the testing strategy.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to the advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
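As an illustration of why a machine-checkable convention helps, a toy checker for a few common PULP-style suffixes (for example, _i/_o for ports and _d/_q for register next/current values) might look as follows; the authoritative rules are those in the linked style guide:<br />

```python
import re

# Toy checker for a few PULP-style naming conventions (illustrative only;
# the linked style guide is the authoritative reference).
ROLE_SUFFIXES = ("_i", "_o", "_d", "_q", "_n")  # port/register/active-low roles

def follows_convention(signal: str) -> bool:
    """Accept lower_snake_case names carrying one of the role suffixes."""
    is_snake = re.fullmatch(r"[a-z][a-z0-9_]*", signal) is not None
    return is_snake and signal.endswith(ROLE_SUFFIXES)

print(follows_convention("clk_i"))    # True
print(follows_convention("DataOut"))  # False
```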
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any form of word-processing software is allowed for writing the report. Nevertheless, the use of LaTeX with Inkscape, Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:ASIC]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Mperotti]]<br />
[[Category:Available]]</div>

Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M)
2022-06-30, Matheusd
<hr />
<div><!-- Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Master's Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
== Prerequisites ==<br />
<br />
* VLSI I<br />
* SoCDAML (recommended)<br />
* Experience with SystemVerilog<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Heterogeneous Acceleration Systems]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Sriedel]]<br />
[[Category:Completed]]<br />
<br />
= Introduction =<br />
<br />
Striving for high image quality, even on mobile devices, has led to an increase in pixel count in smartphone cameras over the last decade. <br />
These image sensors, boasting tens of millions of pixels, create a massive amount of data to be processed on a tight power envelope as quickly as possible. <br />
While this processing is highly parallelizable, it requires specialized Image Signal Processors (ISPs) which can exploit this high degree of parallelism to meet the timing and power constraints.<br />
One modern example of such an ISP is Google's Pixel Visual Core, which contains eight image processing units, each consisting of 256 specialized processing elements to achieve a combined performance of 3.28 TOPS. <br />
<br />
At ETH, we are developing our own many-core system called MemPool. It boasts 256 lightweight 32-bit Snitch cores. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads.<br />
<br />
Programming MemPool presents some challenges.<br />
Even though the scratchpad memory can be accessed within at most five cycles, memory banks close to the cores can be accessed with lower latency.<br />
It is therefore beneficial to keep the cores' accesses local, reducing the latency and the load on the global interconnect.<br />
We address this with a hybrid memory addressing scheme, which allocates each core's stack on a memory bank close to it, accessible within one cycle of latency.<br />
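A toy model of such a hybrid map is sketched below; the bank counts, region sizes, and address layout are made up for illustration and differ from MemPool's actual scheme:<br />

```python
# Toy hybrid address map: a sequential region keeps each core's stack in its
# own (local) bank, while the remaining address space is word-interleaved
# across all banks. All sizes are illustrative, not MemPool's real parameters.
NUM_BANKS = 256    # one bank per core (assumed)
STACK_WORDS = 64   # words reserved per core stack (assumed)

def bank_of(addr_word: int) -> int:
    if addr_word < NUM_BANKS * STACK_WORDS:  # sequential stack region
        return addr_word // STACK_WORDS      # core k's stack maps to bank k
    return addr_word % NUM_BANKS             # interleaved shared region

print(bank_of(0))    # 0: core 0's stack lands in its local bank
print(bank_of(65))   # 1: core 1's stack lands in bank 1
```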
<br />
We also explored programming MemPool with a systolic array, transforming it into a Coarse-Grained Reconfigurable Architecture (CGRA).<br />
This approach instantiates queues between cores, favoring communication between neighboring cores.<br />
Through the addition of special push and pop instructions, similar to SSRs, we can also elide some memory loads and stores, alleviating the Von Neumann bottleneck. <br />
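The queue-based model can be sketched as bounded FIFOs between neighboring cores; push and pop replace the store/load pair that would otherwise go through memory (a minimal sketch, with the blocking behavior left out):<br />

```python
from collections import deque

# Minimal sketch of an inter-core systolic queue. A real implementation
# would stall the producer on a full queue and the consumer on an empty one.
class SystolicQueue:
    def __init__(self, depth: int = 4):
        self.fifo = deque()
        self.depth = depth

    def push(self, value: int) -> None:  # producer core's push instruction
        assert len(self.fifo) < self.depth, "queue full: producer would stall"
        self.fifo.append(value)

    def pop(self) -> int:                # consumer core's pop instruction
        assert self.fifo, "queue empty: consumer would stall"
        return self.fifo.popleft()

link = SystolicQueue()
link.push(42)      # no store to shared L1 needed
print(link.pop())  # 42, and no load from shared L1 either
```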
<br />
A vector programming model can also be used to program MemPool.<br />
We can exploit the fact that each vector instruction can be translated into a long series of scalar micro-operations.<br />
By replicating such micro-operations, we can alleviate the pressure on the instruction issue of the scalar core, leaving it free to execute other instructions.<br />
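A sketch of the issue-pressure argument: one vector instruction occupies a single issue slot on the scalar core, while the equivalent scalar loop occupies one slot per element (the vector length and micro-operation width below are assumed values):<br />

```python
# Issue-pressure sketch: scalar loop vs. one vector instruction whose
# micro-operations are generated inside the vector unit (assumed numbers).
VL = 64            # vector length in elements (assumed)
ELEMS_PER_UOP = 4  # elements handled per micro-operation (assumed)

scalar_issue_slots = VL                 # one scalar instruction per element
vector_issue_slots = 1                  # one instruction from the scalar core
vector_unit_uops = VL // ELEMS_PER_UOP  # replicated locally in the vector unit

print(scalar_issue_slots, vector_issue_slots, vector_unit_uops)  # 64 1 16
```

The scalar core's issue stage is thus freed for the remaining slots, which it can spend on other instructions.<br />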
<br />
==== Goal ====<br />
<br />
This thesis' goal is to develop a small and energy-efficient vector accelerator unit and to integrate it with MemPool.<br />
This unit should achieve high performance on key computational photography kernels, while keeping the energy efficiency of the design under control.<br />
This manycore system with vector support is to be analyzed in terms of the performance improvements, power requirements, and area impacts of the hardware needed to implement the vector accelerator.<br />
<br />
= Project Description =<br />
<br />
The project has different aspects to be explored. <br />
First, we need to determine a small subset of RISC-V's Vector Extension to be implemented. <br />
If the Vector Extension has been ratified and proposes an instruction subset for small embedded subsystems, we can use it.<br />
Otherwise, we will derive the subset from the instructions needed to execute the software kernels of interest.<br />
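In the kernel-driven case, the subset can be derived mechanically by collecting the vector mnemonics each kernel needs. The kernel-to-instruction lists below are hypothetical placeholders, not profiling results:<br />

```python
from collections import Counter

# Hypothetical kernel -> vector-mnemonic map (placeholder data, not profiled).
kernels = {
    "conv2d": ["vle32.v", "vmacc.vv", "vse32.v"],
    "matmul": ["vle32.v", "vmacc.vv", "vadd.vv", "vse32.v"],
    "dotp":   ["vle32.v", "vmul.vv", "vredsum.vs", "vse32.v"],
}

usage = Counter(m for mnemonics in kernels.values() for m in mnemonics)
subset = sorted(usage)  # candidate instruction subset to implement
print(subset)
```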
<br />
Then, we will investigate how to implement this small and energy-efficient vector accelerator.<br />
We can take inspiration from Ara, a RISC-V-based vector processor developed by our group.<br />
Keep in mind that Ara targets much higher operating frequencies and is overall a complex vector machine: each lane of Ara is about as large as one tile of MemPool.<br />
We want to implement a simple, small, and energy-efficient vector unit instead.<br />
<br />
Regarding the vector register file, we might either use a small vector register file per vector unit or stream the operands from the local L1 memory.<br />
During the thesis, the student will be asked to evaluate both approaches, and implement the chosen one.<br />
Implementing the other approach, and comparing the two, is a stretch goal.<br />
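To see what is at stake in this choice, the storage cost of a dedicated VRF can be estimated from the 32 architectural vector registers the RISC-V Vector Extension defines; the VLEN value below is an assumed implementation choice:<br />

```python
# Rough VRF storage estimate (VLEN is an assumed implementation parameter).
NUM_VREGS = 32   # architectural vector registers in the RISC-V V extension
VLEN_BITS = 256  # assumed vector register length

vrf_bytes = NUM_VREGS * VLEN_BITS // 8
print(f"per-unit VRF storage: {vrf_bytes} bytes")  # per-unit VRF storage: 1024 bytes
# Streaming operands from the local L1 avoids this dedicated storage, at the
# cost of extra L1 bandwidth and interconnect traffic.
```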
<br />
= Milestones =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with MemPool and with the RISC-V Vector Extension<br />
* Choose a subset of interest of the RISC-V Vector Extension<br />
* Implement a vector unit and integrate it with the Snitch cores in the MemPool tile<br />
* Benchmark the performance of the design with vector kernels<br />
* Analyze the impacts of the vector support on the area and on power consumption<br />
* Compare your solution with MemPool as a systolic array<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedule. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to the advisors. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps in order to make your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other's code. At IIS, we use lowRISC's style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering. Therefore, the final report should preferably be written in English.<br />
<br />
Any form of word-processing software is allowed for writing the report. Nevertheless, the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (20 min presentation and 5 min Q&A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
== References ==<br />
<br />
[1] O. Shacham and M. Reynders, "Pixel Visual Core: image processing and machine learning on Pixel 2," Oct. 2017. [Online]. Available: https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/<br />
<br />
[2] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, "MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect," Dec. 2020. [Online]. Available: http://arxiv.org/abs/2012.02973<br />
<br />
[3] S. Riedel and M. Cavalcante, "MemPool GitHub," 2021. [Online]. Available: https://github.com/pulp-platform/mempool<br />
<br />
[4] F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, "Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads," IEEE Transactions on Computers, pp. 1–1, Feb. 2020. [Online]. Available: http://arxiv.org/abs/2002.10143<br />
<br />
[5] A. Waterman and K. Asanovic, "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213," RISC-V Foundation, Tech. Rep., 2019. [Online]. Available: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20201002-db3eeaf/riscv-spec.pdf<br />
<br />
[6] M. Cavalcante, F. Schuiki, F. Zaruba, M. Schaffner, and L. Benini, "Ara: A 1-GHz+ scalable and energy-efficient RISC-V vector processor with multiprecision floating-point support in 22nm FD-SOI," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 2, pp. 530–543, 2020.</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Small_and_Energy-Efficient_RISC-V-based_Vector_Accelerator_(1M)&diff=7060Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M)2021-10-22T07:36:27Z<p>Matheusd: </p>
<hr />
<div><!-- Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M) --><br />
<br />
= Overview =<br />
<br />
== Status: In progress ==<br />
<br />
* Type: Master's Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperottil@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
== Prerequisites ==<br />
<br />
* VLSI I<br />
* SoCDAML (recommended)<br />
* Experience with SystemVerilog<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Heterogeneous Acceleration Systems]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Sriedel]]<br />
[[Category:In progress]]<br />
<br />
= Introduction =<br />
<br />
Striving for high image quality, even on mobile devices, has led to an increase in pixel count in smartphone cameras over the last decade. <br />
These image sensors, boasting tens of millions of pixels, create a massive amount of data to be processed on a tight power envelope as quickly as possible. <br />
While this processing is highly parallelizable, it requires specialized Image Signal Processors (ISPs) which can exploit this high degree of parallelism to meet the timing and power constraints.<br />
One modern example of such an ISP is Google's Pixel Visual Core, which contains eight image processing units, each consisting of 256 specialized processing elements to achieve a combined performance of 3.28 TOPS. <br />
<br />
At ETH, we are developing our own many-core system called MemPool. It boasts 256 lightweight 32-bit Snitch cores. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads.<br />
<br />
Programming MemPool presents some challenges.<br />
Even though the scratchpad memory can be accessed within at most five cycles, memory banks close to the cores can be accessed with lower latency.<br />
It is therefore beneficial to keep the cores' accesses local, reducing the latency and the load on the global interconnect.<br />
We address this with a hybrid memory addressing scheme, which allocates each core's stack in a memory bank close to that core, accessible within one cycle of latency.<br />
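As an illustration, the mapping from addresses to banks under such a hybrid scheme can be sketched as follows (the banking factor, region size, and word size are hypothetical placeholders, not MemPool's actual configuration):<br />

```python
# Illustrative sketch of a hybrid memory addressing scheme: a per-core
# "sequential" region maps a core's stack onto its own local banks, while
# the rest of the address space is word-interleaved across all banks.
# All constants below are hypothetical, chosen only for the example.

NUM_CORES = 256
BANKS_PER_CORE = 4          # hypothetical banking factor
NUM_BANKS = NUM_CORES * BANKS_PER_CORE
WORD_BYTES = 4
SEQ_REGION_BYTES = 1024     # hypothetical per-core sequential (stack) region

def bank_of(core_id: int, addr: int) -> int:
    """Return the bank index servicing `addr` when issued by `core_id`."""
    word = addr // WORD_BYTES
    if addr < SEQ_REGION_BYTES:
        # Sequential region: stay on one of the issuing core's local banks,
        # reachable with single-cycle latency.
        return core_id * BANKS_PER_CORE + word % BANKS_PER_CORE
    # Shared region: interleave words across all banks for high bandwidth.
    return word % NUM_BANKS
```

Under this toy mapping, stack accesses never leave the issuing core's tile, while shared-data accesses spread evenly over the whole memory.<br />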
<br />
We also explored programming MemPool with a systolic array, transforming it into a Coarse-Grained Reconfigurable Architecture (CGRA).<br />
This approach instantiates queues between cores, privileging communication between neighboring cores.<br />
Through the addition of special push and pop instructions, similar to SSRs, we can also elide some memory loads and stores, alleviating the von Neumann bottleneck. <br />
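A toy model of this queue-based communication is sketched below; the queue depth and the two "kernels" are hypothetical, and the asserts stand in for the hardware stalling a core on a full or empty queue:<br />

```python
from collections import deque

# Toy model of the systolic/CGRA configuration: neighboring cores exchange
# data through bounded hardware queues via push/pop instructions instead of
# going through shared memory.

QUEUE_DEPTH = 4  # assumed depth of the inter-core queues

class CoreQueue:
    def __init__(self):
        self.q = deque()
    def push(self, x):
        # In hardware, the producer core would stall while the queue is full.
        assert len(self.q) < QUEUE_DEPTH, "producer stalls"
        self.q.append(x)
    def pop(self):
        # In hardware, the consumer core would stall while the queue is empty.
        assert self.q, "consumer stalls"
        return self.q.popleft()

# Two pipeline stages ("cores"): scale, then accumulate. No loads/stores to
# shared L1 occur between the stages; data flows through the queue.
link = CoreQueue()
for x in [1, 2, 3]:
    link.push(2 * x)        # core 0: push result to its neighbor
acc = 0
for _ in range(3):
    acc += link.pop()       # core 1: pop operand from the queue
# acc == 12
```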
<br />
A vector programming model can also be used to program MemPool.<br />
We can exploit the fact that each vector instruction can be translated into a long series of scalar micro-operations.<br />
By replaying such micro-operations in the vector unit, we can alleviate the pressure on the scalar core's instruction issue stage, leaving it free to execute other instructions.<br />
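The expansion of one vector instruction into a series of scalar micro-operations can be sketched as follows; the micro-operation encoding and the register-file model are purely illustrative:<br />

```python
# Sketch of why a single vector instruction relieves the scalar core's issue
# stage: one `vadd` is expanded by the vector unit into `vl` element-wise
# micro-operations, so the scalar core issues one instruction instead of a
# loop of `vl` scalar adds.

def expand_vadd(vd, vs1, vs2, vl):
    """Expand `vadd vd, vs1, vs2` into per-element micro-operations."""
    return [("add", (vd, i), (vs1, i), (vs2, i)) for i in range(vl)]

def execute(uops, vrf):
    """Execute micro-operations against a toy vector register file."""
    for op, d, a, b in uops:
        assert op == "add"
        vrf[d[0]][d[1]] = vrf[a[0]][a[1]] + vrf[b[0]][b[1]]

vrf = {"v0": [0, 0, 0, 0], "v1": [1, 2, 3, 4], "v2": [10, 20, 30, 40]}
uops = expand_vadd("v0", "v1", "v2", vl=4)   # one issued instruction
execute(uops, vrf)                           # ...four micro-operations
# vrf["v0"] == [11, 22, 33, 44]
```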
<br />
==== Goal ====<br />
<br />
The goal of this thesis is to develop a small and energy-efficient vector accelerator unit and to integrate it with MemPool.<br />
This unit should achieve high performance on key computational photography kernels, while keeping the energy efficiency of the design under control.<br />
This manycore system with vector support is to be analyzed in terms of the performance improvements, power requirements, and area impacts of the hardware needed to implement the vector accelerator.<br />
<br />
= Project Description =<br />
<br />
The project has different aspects to be explored. <br />
First, we need to determine a small subset of RISC-V's Vector Extension to be implemented. <br />
If the Vector Extension has been ratified and proposes an instruction subset for small embedded subsystems, we can use it.<br />
Otherwise, we will base our selection on the instructions needed to execute the software kernels of interest.<br />
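For example, a strip-mined AXPY kernel already pins down a small candidate subset: vector-length configuration, unit-stride loads/stores, and a multiply-accumulate. The sketch below mimics these RVV-style instructions in plain Python, with a hypothetical hardware vector length:<br />

```python
# Strip-mined AXPY (y = a*x + y) in the style of the RISC-V Vector Extension,
# illustrating how a kernel of interest determines the instruction subset to
# implement. VLEN_ELEMS is a hypothetical hardware vector length; the comments
# name the RVV instruction each step would correspond to.

VLEN_ELEMS = 8

def vsetvl(avl: int) -> int:
    """Return the number of elements processed this iteration (vsetvli)."""
    return min(avl, VLEN_ELEMS)

def axpy(a, x, y):
    i, n = 0, len(x)
    while n > 0:
        vl = vsetvl(n)                                 # vsetvli
        vx = x[i:i + vl]                               # vle32.v (unit-stride load)
        vy = y[i:i + vl]                               # vle32.v
        vy = [a * xe + ye for xe, ye in zip(vx, vy)]   # vfmacc.vf
        y[i:i + vl] = vy                               # vse32.v (unit-stride store)
        i += vl
        n -= vl

y = [1.0] * 10
axpy(2.0, [float(k) for k in range(10)], y)
# y == [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0]
```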
<br />
Then, we will investigate how to implement this small and energy-efficient vector accelerator.<br />
We can take inspiration from Ara, a RISC-V-based vector processor developed by our group.<br />
Keep in mind that Ara targets much higher operating frequencies, and is overall a complex vector machine---each lane of Ara is about as large as one tile of MemPool.<br />
We want to implement a simple, small, and energy-efficient vector unit instead.<br />
<br />
Regarding the vector register file, we might either use a small vector register file per vector unit or stream the operands from the local L1 memory.<br />
During the thesis, the student will be asked to evaluate both approaches, and implement the chosen one.<br />
Implementing the other approach, and comparing the two, is considered a stretch goal.<br />
<br />
= Milestones =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with MemPool and with the RISC-V Vector Extension<br />
* Choose a subset of interest of the RISC-V Vector Extension<br />
* Implement a vector unit and integrate it with the Snitch cores in the MemPool tile<br />
* Benchmark the performance of the design with vector kernels<br />
* Analyze the impacts of the vector support on the area and on power consumption<br />
* Compare your solution with MemPool as a systolic array<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedule. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to develop a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps in order to make your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word-processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (20 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
== References ==<br />
<br />
[1] O. Shacham and M. Reynders, "Pixel Visual Core: image processing and machine learning on Pixel 2," Oct. 2017. [Online]. Available: https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/<br />
<br />
[2] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, "MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect," Dec. 2020. [Online]. Available: http://arxiv.org/abs/2012.02973<br />
<br />
[3] S. Riedel and M. Cavalcante, "MemPool GitHub," 2021. [Online]. Available: https://github.com/pulp-platform/mempool<br />
<br />
[4] F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, "Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads," IEEE Transactions on Computers, pp. 1–1, Feb. 2020. [Online]. Available: http://arxiv.org/abs/2002.10143<br />
<br />
[5] A. Waterman and K. Asanovic, "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213," RISC-V Foundation, Tech. Rep., 2019. [Online]. Available: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20201002-db3eeaf/riscv-spec.pdf<br />
<br />
[6] M. Cavalcante, F. Schuiki, F. Zaruba, M. Schaffner, and L. Benini, "Ara: A 1-GHz+ scalable and energy-efficient RISC-V vector processor with multiprecision floating-point support in 22nm FD-SOI," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 2, pp. 530–543, 2020.</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Small_and_Energy-Efficient_RISC-V-based_Vector_Accelerator_(1M)&diff=6806Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M)2021-08-04T10:54:26Z<p>Matheusd: </p>
<hr />
<div><!-- Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M) --><br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master's Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperottil@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
<br />
== Prerequisites ==<br />
<br />
* VLSI I<br />
* SoCDAML (recommended)<br />
* Experience with SystemVerilog<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Heterogeneous Acceleration Systems]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Mperotti]]<br />
[[Category:Available]]<br />
<br />
= Introduction =<br />
<br />
Striving for high image quality, even on mobile devices, has lead to an increase in pixel count in smartphone cameras over the last decade. <br />
These image sensors, boasting tens of millions of pixels, create a massive amount of data to be processed on a tight power envelope as quickly as possible. <br />
While this processing is highly parallelizable, it requires specialized Image Signal Processors (ISPs) which can exploit this high degree of parallelism to meet the timing and power constraints.<br />
One modern example of such an ISP is Google's Pixel Visual Core, which contains eight image processing units, each consisting of 256 specialized processing elements to achieve a combined performance of 3.28 TOPS. <br />
<br />
At ETH, we are developing our own many-core system called MemPool. It boasts 256 lightweight 32-bit Snitch cores. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads.<br />
<br />
Programming MemPool presents some challenges.<br />
Even though the scratchpad memory can be accessed within at most five cycles, memory banks close to the cores can be accessed with a lower latency.<br />
It is therefore beneficial to keep the cores' accesses local, reducing the latency and the load on the global interconnect.<br />
We acknowledged this through a hybrid memory addressing scheme, which allocates each cores' stack on a memory bank close to it, accessible within one cycle of latency.<br />
<br />
We also explored programming MemPool with a systolic array, transforming it into a Coarse-Grained Reconfigurable Architecture (CGRA).<br />
This approach instantiates queues between cores, which privileges communication between neighboring cores.<br />
Through the addition of special push and pop instructions, similar to SSRs, we can also elide some memory loads and stores, alleviating the Von Neumann bottleneck. <br />
<br />
A vector programming model can be also used to program MemPool.<br />
We can exploit the fact that each vector instruction can be translated into a long series of scalar micro-operations.<br />
By replicating such micro-operations, we can alleviate the pressure on the instruction issue of the scalar core, leaving it free to execute other instructions.<br />
<br />
==== Goal ====<br />
<br />
This thesis' goal is to develop a small and energy vector accelerator unit, and integrate it with MemPool.<br />
This unit should achieve a high performance on key computational photography kernels, while keeping the energy efficiency of the design under control.<br />
This manycore system with vector support is to be analyzed in terms of the performance improvements, power requirements, and area impacts of the hardware needed to implement the vector accelerator.<br />
<br />
= Project Description =<br />
<br />
The project has different aspects to be explored. <br />
First, we need to determine a small subset of RISC-V's Vector Extension to be implemented. <br />
If the Vector Extension has been ratified and proposes an instruction subset for small embedded subsystems, we can use it.<br />
Otherwise, we will base ourselves on the instructions needed to execute the software kernels of interest.<br />
<br />
Then, we will investigate how to implement this small and energy-efficient vector accelerator.<br />
We can take inspiration on Ara, a RISC-V-based vector processor developed by our group.<br />
Keep in mind that Ara targets much higher operating frequencies, and is overall a complex vector machine---each lane of Ara is about as large as one tile of MemPool.<br />
We want to implement a simple, small, and energy-efficient vector unit instead.<br />
<br />
Regarding the vector register file, we might either use a small vector register file per vector unit, or stream the operands from the local L1 memory.<br />
During the thesis, the student will be asked to evaluate both approaches, and implement the chosen one.<br />
The other approach, and the comparison between them, is taken as a stretch goal.<br />
<br />
= Milestones =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with MemPool and with the RISC-V Vector Extension<br />
* Choose a subset of interest of the RISC-V Vector Extension<br />
* Implement a vector unit and integrate it with the Snitch cores in the MemPool tile<br />
* Benchmark the performance of the design with vector kernels<br />
* Analyze the impacts of the vector support on area and on power consumption<br />
* Compare your solution with MemPool as a systolic array<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedule. These meetings will be used to evaluate the status and progress of the project. Beside these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to a write a weekly report at the end of each week and to send it to his advisors. The idea of the weekly report is to briefly summarize the work, progress and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adapting a consistent code style is one of the most important steps in order to make your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports, nevertheless the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation 20 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
== References ==<br />
<br />
[1] O.Shachamand, M.Reynders, "Pixel Visual Core: image processing and machine learning on Pixel 2," oct 2017. [Online]. Available: https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/<br />
<br />
[2] M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, "MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect," dec 2020. [Online]. Available: http://arxiv.org/abs/2012.02973<br />
<br />
[3] S. Riedel and M. Cavalcante, "MemPool GitHub," 2021. [Online]. Available: https://github.com/pulp-platform/mempool<br />
<br />
[4] F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, "Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads," IEEE TRANSACTIONS ON COMPUTERS, pp.1–1, feb 2020. [Online]. Available: http://arxiv.org/abs/2002.10143<br />
<br />
[5] A. Waterman and K. Asanovic, "The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213," RISC-V Foundation, Tech. Rep., 2019. [Online]. Available: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20201002-db3eeaf/riscv-spec.pdf<br />
<br />
[6] M. Cavalcante, F. Schuiki, F. Zaruba, M. Schaffner, and L. Benini, "Ara: A 1-GHz+ scalable and energy-efficient RISC-V vector processor with multiprecision floating-point support in 22nm FD-SOI," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 2, pp. 530–543, 2020.</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Small_and_Energy-Efficient_RISC-V-based_Vector_Accelerator_(1M)&diff=6805Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M)2021-08-04T10:49:38Z<p>Matheusd: </p>
<hr />
<div><!-- Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M) --><br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master's Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperottil@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
<br />
== Prerequisites ==<br />
<br />
* VLSI I<br />
* SoCDAML (recommended)<br />
* Experience with SystemVerilog<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Heterogeneous Acceleration Systems]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Mperotti]]<br />
[[Category:Available]]<br />
<br />
= Introduction =<br />
<br />
Striving for high image quality, even on mobile devices, has lead to an increase in pixel count in smartphone cameras over the last decade. <br />
These image sensors, boasting tens of millions of pixels, create a massive amount of data to be processed on a tight power envelope as quickly as possible. <br />
While this processing is highly parallelizable, it requires specialized Image Signal Processors (ISPs) which can exploit this high degree of parallelism to meet the timing and power constraints.<br />
One modern example of such an ISP is Google's Pixel Visual Core, which contains eight image processing units, each consisting of 256 specialized processing elements to achieve a combined performance of 3.28 TOPS. <br />
<br />
At ETH, we are developing our own many-core system called MemPool. It boasts 256 lightweight 32-bit Snitch cores. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads.<br />
<br />
Programming MemPool presents some challenges.<br />
Even though the scratchpad memory can be accessed within at most five cycles, memory banks close to the cores can be accessed with a lower latency.<br />
It is therefore beneficial to keep the cores' accesses local, reducing the latency and the load on the global interconnect.<br />
We acknowledged this through a hybrid memory addressing scheme, which allocates each cores' stack on a memory bank close to it, accessible within one cycle of latency.<br />
<br />
We also explored programming MemPool with a systolic array, transforming it into a Coarse-Grained Reconfigurable Architecture (CGRA).<br />
This approach instantiates queues between cores, which privileges communication between neighboring cores.<br />
Through the addition of special push and pop instructions, similar to SSRs, we can also elide some memory loads and stores, alleviating the Von Neumann bottleneck. <br />
<br />
A vector programming model can be also used to program MemPool.<br />
We can exploit the fact that each vector instruction can be translated into a long series of scalar micro-operations.<br />
By replicating such micro-operations, we can alleviate the pressure on the instruction issue of the scalar core, leaving it free to execute other instructions.<br />
<br />
==== Goal ====<br />
<br />
This thesis' goal is to develop a small and energy vector accelerator unit, and integrate it with MemPool.<br />
This unit should achieve a high performance on key computational photography kernels, while keeping the energy efficiency of the design under control.<br />
This manycore system with vector support is to be analyzed in terms of the performance improvements, power requirements, and area impacts of the hardware needed to implement the vector accelerator.<br />
<br />
= Project Description =<br />
<br />
The project has different aspects to be explored. <br />
First, we need to determine a small subset of RISC-V's Vector Extension to be implemented. <br />
If the Vector Extension has been ratified and proposes an instruction subset for small embedded subsystems, we can use it.<br />
Otherwise, we will base ourselves on the instructions needed to execute the software kernels of interest.<br />
<br />
Then, we will investigate how to implement this small and energy-efficient vector accelerator.<br />
We can take inspiration on Ara, a RISC-V-based vector processor developed by our group.<br />
Keep in mind that Ara targets much higher operating frequencies, and is overall a complex vector machine---each lane of Ara is about as large as one tile of MemPool.<br />
We want to implement a simple, small, and energy-efficient vector unit instead.<br />
<br />
Regarding the vector register file, we might either use a small vector register file per vector unit, or stream the operands from the local L1 memory.<br />
During the thesis, the student will be asked to evaluate both approaches, and implement the chosen one.<br />
The other approach, and the comparison between them, is taken as a stretch goal.<br />
<br />
= Milestones =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with MemPool and with the RISC-V Vector Extension<br />
* Choose a subset of interest of the RISC-V Vector Extension<br />
* Implement a vector unit and integrate it with the Snitch cores in the MemPool tile<br />
* Benchmark the performance of the design with vector kernels<br />
* Analyze the impacts of the vector support on area and on power consumption<br />
* Compare your solution with MemPool as a systolic array<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedule. These meetings will be used to evaluate the status and progress of the project. Beside these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to a write a weekly report at the end of each week and to send it to his advisors. The idea of the weekly report is to briefly summarize the work, progress and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adapting a consistent code style is one of the most important steps in order to make your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports, nevertheless the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation 20 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Small_and_Energy-Efficient_RISC-V-based_Vector_Accelerator_(1M)&diff=6804Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M)2021-08-04T10:45:42Z<p>Matheusd: Created page with "<!-- Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M) --> = Overview = == Status: Available == * Type: Master's Thesis * Professor: Prof...."</p>
<hr />
<div><!-- Implementation of a Small and Energy-Efficient RISC-V-based Vector Accelerator (1M) --><br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master's Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
<br />
== Prerequisites ==<br />
<br />
* VLSI I<br />
* SoCDAML (recommended)<br />
* Experience with SystemVerilog<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Heterogeneous Acceleration Systems]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Mperotti]]<br />
[[Category:Available]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Manycore_System_on_FPGA_(M/S/G)&diff=6490Manycore System on FPGA (M/S/G)2021-03-17T11:53:01Z<p>Matheusd: </p>
<hr />
<div><!-- Manycore System on FPGA --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Enis Mustafa<br />
* Type: Bachelor/Semester/Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* Software<br />
* RTL design<br />
* FPGA design<br />
<br />
== Prerequisites ==<br />
<br />
* VLSI I (recommended)<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:MemPool]]<br />
[[Category:FPGA]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Bachelor Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Sriedel]]<br />
[[Category:Matheusd]]<br />
[[Category:In_progress]]<br />
<br />
= Introduction =<br />
<br />
In the quest for high-performance computing systems, few architectural models retain the flexibility of many-core systems. These systems integrate many small cores (hundreds or even thousands) that work independently to execute highly parallelizable algorithms.<br />
<br />
At ETH, we are developing our own many-core system called MemPool [1]. It boasts 256 lightweight 32-bit Snitch cores developed at ETH [2]. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA [3]. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads and easy to program.<br />
<br />
For development purposes, we recently implemented a small version of MemPool on an FPGA. Specifically, we have a version of MemPool with 16 Snitch cores working on a Zynq® UltraScale+™ MPSoC. This chip features an ARM core, which serves as the host for our many-core accelerator system.<br />
<br />
= Project Description =<br />
<br />
While this prototype is working well, there are a few features we would like to add or improve. The goal of a MemPool-based FPGA project would be to work on one or more of the following aspects.<br />
<br />
* Our system currently uses 57% of the FPGA's resources with 16 Snitch cores. However, we want to reach 32 cores. A potential project would be to optimize the RTL for the FPGA to make it more area-efficient and squeeze 32 cores onto a single FPGA board.<br />
* A missing feature we have at the moment is debugging capability. For example, we lack the ability to print from the MemPool system. A thesis involving RTL and software design could implement this feature.<br />
* We want to use the FPGA prototype to benchmark applications. But currently, we can only measure the number of cycles and instructions when running benchmarks on MemPool. Adding hardware performance counters and host software to read this would be very beneficial.<br />
* We have simple host software that communicates with MemPool and allows us to offload tasks to it. However, there is currently very little communication, and most of it is done through polling. Adding a device driver for MemPool would make offloading tasks easier and could cut out much of the overhead.<br />
<br />
If a project along these directions interests you, or you have ideas of how you could extend MemPool on the FPGA, contact us, and we will design a project that fits your background and interests.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
== Weekly Reports ==<br />
<br />
Master Thesis: The student is required to write a weekly report at the end of each week and to send it to their advisors by email. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points. The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and bring up open questions and points. The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adapting a consistent code style is one of the most important steps in order to make your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
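<br />
As a brief illustration (the module and its signals are invented for this example), lowRISC-style SystemVerilog uses <code>snake_case</code> names, <code>_i</code>/<code>_o</code> suffixes for ports, an <code>_n</code> suffix for active-low signals, and <code>_d</code>/<code>_q</code> pairs for a register's next-state and state values:<br />

```systemverilog
// Hypothetical counter in lowRISC style.
module counter (
  input  logic       clk_i,
  input  logic       rst_ni,   // active-low reset: _n suffix
  input  logic       en_i,
  output logic [7:0] count_o
);

  logic [7:0] count_d, count_q;

  assign count_d = en_i ? count_q + 8'd1 : count_q;  // next state
  assign count_o = count_q;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      count_q <= '0;
    end else begin
      count_q <= count_d;       // state update
    end
  end

endmodule
```

With this scheme, a reader can tell a port's direction and a signal's role from its name alone, which is exactly what makes inconsistencies stand out.<br />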
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 to 20 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
= References =<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<div id="ref-cavalcante2020mempool" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“MemPool: A shared-L1 memory many-core cluster with a low-latency interconnect.”</span> 2020.</span><br />
</div><br />
<div id="ref-zaruba2020snitch" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, <span>“Snitch: A 10 <span class="nocase">kGE</span> pseudo dual-issue processor for area and energy efficient execution of floating-point intensive workloads.”</span> 2020.</span><br />
</div><br />
<div id="ref-RISCV" class="csl-entry"><br />
<span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">A. Waterman ''et al.'', <span>“The <span>RISC-V</span> instruction set manual.”</span> 2014.</span><br />
</div><br />
</div></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6489Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-03-17T11:52:27Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Jiantao Liu<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), which stems from the memory traffic required to fetch instructions.<br />
In particular, multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, wasting both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution revamped the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit its execution model (e.g., machine learning, and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as Arm's with its SVE, and RISC-V with the V Extension.<br />
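<br />
As an illustrative sketch of how a single vector instruction encodes many micro-operations (register assignments are arbitrary, and the assembler syntax follows recent RVV conventions, so minor details differ in draft versions such as 0.10), a strip-mined loop computing z[i] = x[i] + y[i] looks like this:<br />

```asm
# a0 = element count, a1 = &x, a2 = &y, a3 = &z (32-bit integers)
vvadd_loop:
    vsetvli t0, a0, e32, m1, ta, ma  # take up to VLMAX elements this pass
    vle32.v v8, (a1)                 # load a slice of x
    vle32.v v9, (a2)                 # load a slice of y
    vadd.vv v10, v8, v9              # t0 additions in ONE instruction
    vse32.v v10, (a3)                # store a slice of z
    slli    t1, t0, 2                # t0 elements * 4 bytes
    add     a1, a1, t1               # bump the pointers
    add     a2, a2, t1
    add     a3, a3, t1
    sub     a0, a0, t0               # elements remaining
    bnez    a0, vvadd_loop
```

One <code>vadd.vv</code> replaces an entire scalar loop iteration per element, which is precisely how vector machines cut the instruction-fetch traffic behind the VNB.<br />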
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well, with a prototype achieving an operating frequency of 1 GHz in a modern technology.<br />
Ara, however, has never been taped out.<br />
The objective of this project is to do so, first by building a simpler version of Ara, then by placing and routing the design.<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara (~2 person weeks)<br />
<br />
- Determine the timing constraints of the design, and which functionalities should be supported.<br />
- Determine the area constraints of the design.<br />
- Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
<br />
- Integrate a boot ROM and an FLL.<br />
- Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
<br />
- Run DRC tests.<br />
- Simulate the place-and-routed design.<br />
- Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English.<br />
Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape or Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog, Verilog, or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Ara]]<br />
[[Category:ASIC]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Aottaviano]]<br />
[[Category:In_progress]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6415Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-15T11:46:33Z<p>Matheusd: /* Project Description */</p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), which stems from the memory traffic required to fetch instructions.<br />
In particular, multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, wasting both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution revamped the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit its execution model (e.g., machine learning, and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as Arm's with its SVE, and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well, with a prototype achieving an operating frequency of 1 GHz in a modern technology.<br />
Ara, however, has never been taped out.<br />
The objective of this project is to do so, first by building a simpler version of Ara, then by placing and routing the design.<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara (~2 person weeks)<br />
<br />
- Determine the timing constraints of the design, and which functionalities should be supported.<br />
- Determine the area constraints of the design.<br />
- Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
<br />
- Integrate a boot ROM and an FLL.<br />
- Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
<br />
- Run DRC tests.<br />
- Simulate the place-and-routed design.<br />
- Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English.<br />
Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape or Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog, Verilog, or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Ara]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Aottaviano]]<br />
[[Category:Available]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6414Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-15T11:45:55Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), which stems from the memory traffic required to fetch instructions.<br />
In particular, multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, wasting both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution revamped the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit its execution model (e.g., machine learning, and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as Arm's with its SVE, and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well, with a prototype achieving an operating frequency of 1 GHz in a modern technology.<br />
Ara, however, has never been taped out.<br />
The objective of this project is to do so, first by building a simpler version of Ara, then by placing and routing the design.<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara (~2 person weeks)<br />
<br />
- Determine the timing constraints of the design, and which functionalities should be supported.<br />
- Determine the area constraints of the design.<br />
- Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
<br />
- Integrate a boot ROM and an FLL.<br />
- Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
<br />
- Run DRC tests.<br />
- Simulate the place-and-routed design.<br />
- Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English.<br />
Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape or Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog, Verilog, or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Ara]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Aottaviano]]<br />
[[Category:Available]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6358Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-01T11:47:50Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is mitigating the von Neumann bottleneck (VNB), i.e., the memory traffic required to fetch instructions.<br />
In particular, multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to fetch and execute the same instruction many times, which is a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution has revived interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit their execution model (e.g., machine learning and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, each of which encodes a long series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as Arm's SVE and RISC-V's V extension.<br />
<br />
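To make the instruction-count saving concrete, the following sketch contrasts a scalar SAXPY loop with a strip-mined one, where each trip of the outer loop stands in for the work of a single vector instruction. The chunk width <code>VLEN</code> and the function names are purely illustrative, not Ara's actual configuration.<br />

```python
# Scalar SAXPY: conceptually one fetched-and-decoded instruction
# per element operation.
def saxpy_scalar(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]

VLEN = 8  # illustrative vector length, not Ara's actual configuration

# Strip-mined SAXPY: each outer-loop trip models a single vector
# instruction (e.g., RVV's vfmacc.vf) operating on up to VLEN
# elements, shrinking the instruction stream by roughly VLEN times.
def saxpy_vector(a, x, y):
    out = []
    for i in range(0, len(x), VLEN):    # one trip ~ one vector instruction
        vl = min(VLEN, len(x) - i)      # remaining elements, like vsetvli
        out.extend(a * x[i + j] + y[i + j] for j in range(vl))
    return out

print(saxpy_vector(2.0, list(range(10)), [1.0] * 10)[:3])  # [1.0, 3.0, 5.0]
```

On a real vector machine the inner chunk is executed by hardware lanes rather than a software loop; the sketch only mirrors the strip-mining structure.<br />
<br />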
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
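The 2B/DP-FLOP ratio fixes how much memory bandwidth the system must provide to sustain its peak performance. A quick sanity check, under the assumption that each lane sustains one fused multiply-add (2 DP-FLOP) per cycle; the lane count and clock frequency below are free parameters, not a statement about any particular Ara instance:<br />

```python
# Sanity check of the 2B/DP-FLOP design ratio.
# Assumption: each lane sustains one fused multiply-add (2 DP-FLOP)
# per cycle; lane count and clock frequency are free parameters.
def ara_budget(n_lanes, freq_hz, bytes_per_flop=2):
    peak_flops = n_lanes * 2 * freq_hz           # peak DP-FLOP/s
    mem_bandwidth = bytes_per_flop * peak_flops  # B/s to sustain the peak
    return peak_flops, mem_bandwidth

peak, bw = ara_budget(n_lanes=4, freq_hz=1e9)
print(peak / 1e9, bw / 1e9)  # 8.0 GFLOP/s of peak performance needs 16.0 GB/s
```

<br />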
= Project Description =<br />
<br />
Ara is in working condition, with an earlier prototype achieving an operating frequency beyond 1 GHz in a modern 22nm technology.<br />
Ara, however, was never taped out.<br />
The objective of this project is to do so: first by building a simpler, standalone version of Ara, and then by carrying it through the back-end flow to a tape-out-ready design.<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize yourself with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara. (~2 person weeks)<br />
<br />
- Determine the timing constraints of the design, and which functionalities should be supported.<br />
- Determine the area constraints of the design.<br />
- Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
<br />
- Integrate a boot ROM and an FLL.<br />
- Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara. (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
<br />
- Run DRC tests.<br />
- Simulate the place-and-routed design.<br />
- Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to develop a goal-oriented approach to their work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps towards making your code easy to understand.<br />
If signals, processes, and entities are always named the same way, inconsistencies can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used for writing the report; nevertheless, the IIS staff strongly encourages the use of LaTeX, with Inkscape, Tgif, or any other vector drawing software for the block diagrams.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&A) at the end of this project to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog, Verilog, or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Ara]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Aottaviano]]<br />
[[Category:Available]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6357Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-01T11:40:59Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), related to the memory traffic required for fetching the instructions.<br />
In particular, multi-core designs, although highly flexible, do not explore the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution revamped the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit its execution model (e.g., machine learning, and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the instruction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM's with its SVE, and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well, with an old prototype achieving an operating frequency of 1 GHz in a modern technology.<br />
Ara, however, was never taped-out. <br />
The objective of this project is to do so, first by building a simpler version of Ara, then<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara (~2 person weeks)<br />
<br />
- Determine the timing constraints of the design, and which functionalities should be supported.<br />
- Determine the area constraints of the design.<br />
- Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
- Integrate a boot ROM and an FLL.<br />
- Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
- Run DRC tests.<br />
- Simulate the place-and-routed design.<br />
- Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to his advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected easier.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English.<br />
Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports, nevertheless, the use of LaTeX with Inkscape or Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:ASIC]]<br />
[[Category:High_Performance_SoCs]]<br />
[[Category:Available]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Aottaviano]]<br />
[[Category:2021]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6356Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-01T10:30:31Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), related to the memory traffic required for fetching the instructions.<br />
In particular, multi-core designs, although highly flexible, do not explore the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution revamped the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit its execution model (e.g., machine learning, and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the instruction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM's with its SVE, and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well, with an old prototype achieving an operating frequency of 1 GHz in a modern technology.<br />
Ara, however, was never taped-out. <br />
The objective of this project is to do so, first by building a simpler version of Ara, then<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara (~2 person weeks)<br />
<br />
- Determine the timing constraints of the design, and which functionalities should be supported.<br />
- Determine the area constraints of the design.<br />
- Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
- Integrate a boot ROM and an FLL.<br />
- Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
- Run DRC tests.<br />
- Simulate the place-and-routed design.<br />
- Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to his advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected easier.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English.<br />
Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports, nevertheless, the use of LaTeX with Inkscape or Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:ASIC]]<br />
[[Category:High_Performance_SoCs]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Aottaviano]]<br />
[[Category:2021]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6355Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-01T10:29:25Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), related to the memory traffic required for fetching the instructions.<br />
In particular, multi-core designs, although highly flexible, do not explore the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution revamped the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit its execution model (e.g., machine learning, and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the instruction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM's with its SVE, and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well, with an old prototype achieving an operating frequency of 1 GHz in a modern technology.<br />
Ara, however, was never taped-out. <br />
The objective of this project is to do so, first by building a simpler version of Ara, then<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara (~2 person weeks)<br />
1. Determine the timing constraints of the design, and which functionalities should be supported.<br />
2. Determine the area constraints of the design.<br />
3. Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
1. Integrate a boot ROM and an FLL.<br />
2. Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
1. Run DRC tests.<br />
2. Simulate the place-and-routed design.<br />
3. Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to his advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, plan the actions for the next week, and discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected easier.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English.<br />
Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports, nevertheless, the use of LaTeX with Inkscape or Tgif, or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:ASIC]]<br />
[[Category:High_Performance_SoCs]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Available]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Mperotti]]<br />
[[Category:Aottaviano]]<br />
[[Category:2021]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6354Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-01T10:28:54Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), related to the memory traffic required for fetching the instructions.<br />
In particular, multi-core designs, although highly flexible, do not explore the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution revamped the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit its execution model (e.g., machine learning, and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the instruction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM's with its SVE, and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well, with an old prototype achieving an operating frequency of 1 GHz in a modern technology.<br />
Ara, however, was never taped-out. <br />
The objective of this project is to do so, first by building a simpler version of Ara, then<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarize with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesize Ara (~2 person weeks)<br />
1. Determine the timing constraints of the design, and which functionalities should be supported.<br />
2. Determine the area constraints of the design.<br />
3. Simulate the synthesized design.<br />
<br />
3. Make Ara a standalone design. (~3 person weeks)<br />
1. Integrate a boot ROM and an FLL.<br />
2. Integrate a JTAG port and the initialization mechanism.<br />
<br />
4. Place and route Ara (~2 person weeks)<br />
<br />
5. Run sanity tests on the place-and-routed design. (~2 person weeks)<br />
1. Run DRC tests.<br />
2. Simulate the place-and-routed design.<br />
3. Reason about the testing mechanism.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points.<br />
The weekly report is also an important means for the student to develop a goal-oriented approach to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX with Inkscape, Tgif, or any other vector drawing software for block diagrams.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6353Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-02-01T08:53:05Z<p>Matheusd: </p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB), related to the memory traffic required for fetching the instructions.<br />
In particular, multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution has revived the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit their execution model (e.g., machine learning and digital signal processing).<br />
Vector machines tackle the VNB through vector instructions, which encode a series of micro-operations within a single instruction.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM with its SVE and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Description =<br />
<br />
Ara is working well: an earlier prototype achieved an operating frequency of more than 1 GHz in a modern 22nm technology.<br />
Ara, however, has never been taped out. <br />
The objective of this project is to do so, first by building a simpler, standalone version of Ara and then taking it through the physical implementation flow.<br />
<br />
The project consists of the following parts:<br />
<br />
1. Familiarizing yourself with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Synthesizing Ara, based on the existing synthesis flow.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points.<br />
The weekly report is also an important means for the student to develop a goal-oriented approach to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX with Inkscape, Tgif, or any other vector drawing software for block diagrams.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.<br />
<br />
= References =<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_Ara,_PULP%27s_Vector_Machine_(1-2S)&diff=6348Physical Implementation of Ara, PULP's Vector Machine (1-2S)2021-01-31T15:38:08Z<p>Matheusd: Created page with "= Overview = == Status: Available == * Type: Semester Thesis * Professor: Prof. Dr. L. Benini * Supervisors: ** Matheus Cavalcante: [mailto:matheusd@iis..."</p>
<hr />
<div>= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Matheusd | Matheus Cavalcante]]: [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
** [[:User:Mperotti | Matteo Perotti]]: [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
** [[:User:Aottaviano | Alessandro Ottaviano]]: [mailto:aottaviano@iis.ee.ethz.ch aottaviano@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB).<br />
This is related to the memory traffic required for fetching instructions.<br />
Multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution has revived the interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit their execution model.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM with its SVE and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on version 0.10 of the RISC-V Vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating-point operands.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
== Weekly Reports ==<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points.<br />
The weekly report is also an important means for the student to develop a goal-oriented approach to work.<br />
<br />
== HDL Guidelines ==<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX with Inkscape, Tgif, or any other vector drawing software for block diagrams.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
= Required skills =<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed/following the VLSI 1 course is recommended.<br />
* to have worked with back-end tools. Having followed/following the VLSI 2 course is recommended.</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_MemPool,_PULP%27s_Manycore_System_(1M/1-2S)&diff=6308Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)2021-01-25T13:54:37Z<p>Matheusd: </p>
<hr />
<div>== Introduction == <br />
<br />
In the quest for high-performance computing systems, few architectural models retain the flexibility of many-core systems.<br />
Those systems integrate a very large number of small cores (hundreds or even thousands) that work independently to execute highly parallelizable algorithms.<br />
<br />
A compromise is needed to push the core count of such systems to its limits.<br />
One solution is to prevent memory sharing between cores, which then access only private memory banks; this, however, impacts the programmability of the system.<br />
Alternatively, memory sharing can be achieved through a cache hierarchy, which impacts the energy efficiency of the system through its non-negligible power consumption.<br />
<br />
At IIS, PULP's many-core system is MemPool, which integrates 256 Snitch cores and 1MiB of shared-L1 memory.<br />
Our example system, in GlobalFoundries 22nm 22FDX technology, runs at 500MHz, and the L1 memory can be accessed by any of the cores through a high-throughput interconnection network with a round-trip latency of at most five cycles.<br />
This is a challenging physical implementation, limited by routing congestion and wire propagation delay rather than the more common gate delay. <br />
Because of this, placing-and-routing MemPool was a unique experience, which required diving deep into the technology properties and the implementation flow.<br />
<br />
== Project description ==<br />
<br />
=== Project goals ===<br />
<br />
The goal of this project is to tape out a smaller version of MemPool, affectionately called '''MinPool''', on a larger technology node, as a proof of concept.<br />
While a 256-core system is infeasible at the technology node we are aiming for, we are curious to know how far the design can be pushed in an older technology.<br />
Ultimately, the design resulting from this project shall be a basis for the work of other people.<br />
Hence, the design must be properly tested, the generated source code appropriately commented, and the project well documented.<br />
<br />
The whole project can be subdivided into four main phases, which are described in detail in the following.<br />
<br />
==== Part I: Familiarizing with the design ====<br />
<br />
MemPool has been the work of the two advisors for this thesis for more than a year and has already led to three Semester/Master projects.<br />
As such, there is some familiarizing with the architecture and the codebase that needs to be done at the start of this project.<br />
<br />
==== Part II: Implementing MemPool's tile ====<br />
<br />
MemPool's tile is the basis of its architecture, containing a few Snitch cores and a number of L1 memory banks accessible within one cycle.<br />
This module is the basis for building the MemPool cluster.<br />
It is, however, a "normal" module, which can be placed-and-routed as a rather dense macro.<br />
We need to determine how fast MemPool's tiles can be clocked at the target technology node, also considering that they should be rather dense.<br />
The student can play with parameters such as the number of Snitch cores at each tile, the size of the instruction cache, and the size of the L1 memory per tile.<br />
This should also give the student a feel for the next step of the project. <br />
<br />
==== Part III: Implementing MinPool ====<br />
<br />
By implementing several MinPool tiles and interconnecting them cleverly, we can build the many-core cluster.<br />
We do not know how many such tiles we can instantiate before degrading timing or latency too much.<br />
In MemPool, for example, we managed to implement 64 tiles, each with four cores and 16KiB of L1 memory.<br />
After this design exploration phase, the student and the advisors will decide on a final set of parameters for MinPool, finally deciding how large it should be. <br />
Our goal is to have the system with the largest core count of the PULP systems so far.<br />
<br />
==== Part IV: Making MinPool standalone ====<br />
<br />
During this phase, some RTL is needed to make MinPool a stand-alone design.<br />
The student will need to integrate an FLL, a boot ROM, and a JTAG port to access and initialize the system.<br />
While there are many IPs and much know-how at IIS for that, this is also highly dependent on the final design parameters of MinPool. <br />
At this phase, the student should also check whether the placed-and-routed design passes the sanity checks.<br />
<br />
== Milestones ==<br />
<br />
Throughout the project, a number of milestones have to be reached.<br />
In the following, a tentative list of expected milestones is provided, which might be modified during the first weeks of the work:<br />
<br />
1. Install MemPool and test it on different configurations<br />
<br />
2. Synthesize the tile <br />
<br />
3. Place-and-route the tile as a macro<br />
<br />
4. Synthesize the cluster, using the tile's black-box module<br />
<br />
5. Place-and-route MinPool<br />
<br />
6. Finalize the implementation flow.<br />
<br />
== Project Realization == <br />
<br />
=== Meetings ===<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
=== Weekly Reports ===<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points.<br />
The weekly report is also an important means for the student to develop a goal-oriented approach to work.<br />
<br />
=== HDL Guidelines ===<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
=== Report ===<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX with Inkscape, Tgif, or any other vector drawing software for block diagrams.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
=== Presentation ===<br />
<br />
There will be a presentation (20min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Available ===<br />
<br />
* Looking for one or two semester projects<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:ASIC]]<br />
[[Category:High_Performance_SoCs]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_MemPool,_PULP%27s_Manycore_System_(1M/1-2S)&diff=6257Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)2021-01-20T09:51:15Z<p>Matheusd: </p>
<hr />
<div>== Introduction == <br />
<br />
In the quest for high-performance computing systems, few architectural models retain the flexibility of many-core systems.<br />
Those systems integrate a very large number of small cores (hundreds or even thousands) that work independently to execute highly parallelizable algorithms.<br />
<br />
A compromise is needed to push the core count of such systems to its limits.<br />
One solution is to prevent memory sharing between cores, which then access only private memory banks; this, however, impacts the programmability of the system.<br />
Alternatively, memory sharing can be achieved through a cache hierarchy, which impacts the energy efficiency of the system through its non-negligible power consumption.<br />
<br />
At IIS, PULP's many-core system is MemPool, which integrates 256 Snitch cores and 1MiB of shared-L1 memory.<br />
Our example system, in GlobalFoundries 22nm 22FDX technology, runs at 500MHz, and the L1 memory can be accessed by any of the cores through a high-throughput interconnection network with a round-trip latency of at most five cycles.<br />
This is a challenging physical implementation, limited by routing congestion and wire propagation delay rather than the more common gate delay. <br />
Because of this, placing-and-routing MemPool was a unique experience, which required diving deep into the technology properties and the implementation flow.<br />
<br />
== Project description ==<br />
<br />
=== Project goals ===<br />
<br />
The goal of this project is to tape out a smaller version of MemPool, affectionately called '''MinPool''', on a larger technology node, as a proof of concept.<br />
While a 256-core system would probably be infeasible at the technology node we are aiming for, we are curious to know how far the design can be pushed in an older technology.<br />
Ultimately, the design resulting from this project shall lead to a scientific publication, be used to generate results for scientific publications, and/or be a basis for the work of other people.<br />
Hence, the design must be properly tested, the generated source code appropriately commented, and the project well documented.<br />
<br />
The whole project can be subdivided into four main phases, which are described in detail in the following.<br />
<br />
==== Part I: Familiarizing with the design ====<br />
<br />
MemPool has been the work of the two advisors for this thesis for more than a year and has already led to three Semester/Master projects.<br />
As such, there is some familiarizing with the architecture and the code base that needs to be done at the start of this project.<br />
<br />
==== Part II: Implementing MemPool's tile ====<br />
<br />
MemPool's tile is the basis of its architecture, containing a few Snitch cores and a number of L1 memory banks accessible within one cycle.<br />
This module is the basis for building the MemPool cluster.<br />
It is, however, a "normal" module, which can be placed-and-routed as a rather dense macro.<br />
We need to determine how fast MemPool's tiles can be clocked at the target technology node, also considering that they should be rather dense.<br />
The student can play with parameters such as the number of Snitch cores at each tile, the size of the instruction cache, and the size of the L1 memory per tile.<br />
This should also give the student a feel for the next step of the project. <br />
<br />
==== Part III: Implementing MinPool ====<br />
<br />
By implementing several MinPool tiles and interconnecting them cleverly, we can build the many-core cluster.<br />
We do not know how many such tiles we can instantiate before degrading timing or latency too much.<br />
In MemPool, for example, we managed to implement 64 tiles, each with four cores and 16KiB of L1 memory.<br />
After this design exploration phase, the student and the advisors will decide on a final set of parameters for MinPool, finally deciding how large it should be. <br />
Our goal is to have the system with the largest core count of the PULP systems so far.<br />
<br />
==== Part IV: Making MinPool standalone ====<br />
<br />
During this phase, some RTL is needed to make MinPool a stand-alone design.<br />
The student will need to integrate an FLL, a boot ROM, and a JTAG port to access and initialize the system.<br />
While there are many IPs and much know-how at IIS for that, this is also highly dependent on the final design parameters of MinPool. <br />
At this phase, the student should also check whether the placed-and-routed design passes the sanity checks.<br />
<br />
== Milestones ==<br />
<br />
Throughout the project, a number of milestones have to be reached.<br />
In the following, a tentative list of expected milestones is provided, which might be modified during the first weeks of the work:<br />
<br />
1. Install MemPool and test it on different configurations<br />
<br />
2. Synthesize the tile <br />
<br />
3. Place-and-route the tile as a macro<br />
<br />
4. Synthesize the cluster, using the tile's black-box module<br />
<br />
5. Place-and-route MinPool<br />
<br />
6. Finalize the implementation flow.<br />
<br />
== Project Realization == <br />
<br />
=== Meetings ===<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the students' and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
=== Weekly Reports ===<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points.<br />
The weekly report is also an important means for the student to develop a goal-oriented approach to work.<br />
<br />
=== HDL Guidelines ===<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members would immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
=== Report ===<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX with Inkscape, Tgif, or any other vector drawing software for block diagrams.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
=== Presentation ===<br />
<br />
There will be a presentation (20min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Available ===<br />
<br />
* Looking for one or two semester projects<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:High_Performance_SoCs]]<br />
[[Category:ASIC]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Matheus_Cavalcante&diff=6189Matheus Cavalcante2020-12-08T16:33:38Z<p>Matheusd: Created page with "== Matheus Cavalcante == 200px| * '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch] Category:Digital I receiv..."</p>
<hr />
<div>== Matheus Cavalcante ==<br />
<br />
[[File:Matheusd_face_1to1.png|thumb|200px|]]<br />
<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
[[Category:Digital]]<br />
<br />
I received my M.Sc. in integrated electronic systems from the Grenoble INP (Phelma) in 2018. I am currently pursuing a Ph.D. degree in the Digital Circuits and Systems group of Prof. Luca Benini.<br />
<br />
My current research interests include:<br />
<br />
* Computer and System Architecture<br />
* High-Performance Computing<br />
* Vector Processing<br />
* Interconnection Networks<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In progress<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
category = Completed<br />
category = Matheusd<br />
suppresserrors = true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=User:Mmaxim&diff=5726User:Mmaxim2020-11-02T18:11:05Z<p>Matheusd: Matheusd moved page Mmaxim to User:Mmaxim</p>
<hr />
<div>==Maxim Mattheeuws==<br />
* '''Office''': ETZ J71.2<br />
* '''E-Mail''': [mailto:mmaxim@iis.ee.ethz.ch mmaxim@iis.ee.ethz.ch]<br />
* '''Phone''': (+41 44 63) 244 91<br />
* '''Homepage''': [https://iis.ee.ethz.ch/people/person-detail.MTg2NDE2.TGlzdC8zOTg3LDk5MDE4ODk4MA==.html People at IIS]<br />
[[Category:Supervisors]]<br />
[[Category:Digital]]<br />
<br />
==Interests==<br />
* Predictable Execution<br />
* Computer Architecture<br />
* Compiler Design<br />
* Signal Processing<br />
* Artificial Intelligence<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Mmaxim<br />
</DynamicPageList><br />
<br />
== Projects in Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In progress<br />
category = Mmaxim<br />
</DynamicPageList><br />
<br />
== Completed Projects ==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Completed<br />
category = Mmaxim<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Mmaxim&diff=5727Mmaxim2020-11-02T18:11:05Z<p>Matheusd: Matheusd moved page Mmaxim to User:Mmaxim</p>
<hr />
<div>#REDIRECT [[User:Mmaxim]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_MemPool,_PULP%27s_Manycore_System_(1M/1-2S)&diff=5673Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)2020-11-02T15:40:58Z<p>Matheusd: Matheusd moved page Physical Implementation of MemPool, PULP's Manycore System to Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)</p>
<hr />
<div>== Introduction == <br />
<br />
In the quest for high-performance computing systems, few architectural models retain the flexibility of many-core systems.<br />
These systems integrate a very large number of small cores (hundreds or even thousands) that work independently to execute highly parallelizable algorithms.<br />
<br />
Pushing the core count of such systems to its limits requires a compromise.<br />
One solution is to forgo memory sharing and let each core access only private memory banks; this, however, hurts the programmability of the system.<br />
Alternatively, memory sharing can be achieved through a cache hierarchy, whose non-negligible power consumption degrades the energy efficiency of the system.<br />
<br />
At IIS, PULP's many-core system is MemPool, which integrates 256 Snitch cores and 1 MiB of shared L1 memory.<br />
Our example system, in GlobalFoundries' 22nm 22FDX technology, runs at 500 MHz, and the L1 memory can be accessed by any of the cores through a high-throughput interconnection network with a round-trip latency of at most five cycles.<br />
This is a challenging physical implementation, limited by routing congestion and wire propagation delay rather than by the more common gate delay.<br />
Because of this, placing and routing MemPool was a unique experience, which required diving deep into both the technology's properties and the implementation flow.<br />
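As a quick sanity check, the tile-level figures quoted in the design-exploration part of this page (64 tiles, each with four cores and 16 KiB of L1) indeed add up to the cluster-level numbers. A minimal sketch in Python:<br />

```python
# Cross-check MemPool's headline configuration: 256 Snitch cores sharing
# 1 MiB of L1, built from 64 tiles of 4 cores and 16 KiB of L1 each.
NUM_TILES = 64
CORES_PER_TILE = 4
L1_PER_TILE_KIB = 16

num_cores = NUM_TILES * CORES_PER_TILE
l1_total_kib = NUM_TILES * L1_PER_TILE_KIB

assert num_cores == 256
assert l1_total_kib == 1024  # 1 MiB in total, i.e. 4 KiB per core
print(f"{num_cores} cores, {l1_total_kib} KiB shared L1")
```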
<br />
== Project description ==<br />
<br />
=== Project goals ===<br />
<br />
The goal of this project is to tape out a smaller version of MemPool, affectionately called '''MinPool''', in a larger technology node, as a proof of concept.<br />
While a 256-core design would probably be infeasible at the technology node we are aiming for, we are curious to know how far the design can be pushed in an older technology.<br />
Ultimately, the design resulting from this project shall lead to a scientific publication, be used to generate results for scientific publications, and/or serve as a basis for the work of other people.<br />
Hence, the design must be properly tested, the generated source code appropriately commented, and the project well documented.<br />
<br />
The whole project can be subdivided into four main phases, which are described in detail in the following.<br />
<br />
==== Part I: Familiarizing with the design ====<br />
<br />
MemPool has been the work of the two advisors for this thesis for more than a year and has already led to three Semester/Master projects.<br />
As such, there is some familiarizing with the architecture and the code base that needs to be done at the start of this project.<br />
<br />
==== Part II: Implementing MemPool's tile ====<br />
<br />
MemPool's tile is the basis of its architecture, containing a few Snitch cores and a number of L1 memory banks accessible within one cycle.<br />
This module is the basis for building the MemPool cluster.<br />
It is, however, a "normal" module, which can be placed and routed as a rather dense macro.<br />
We need to determine how fast MemPool's tiles can be clocked at the target technology node, also considering that they should be rather dense.<br />
The student can play with parameters such as the number of Snitch cores in each tile, the size of the instruction cache, and the size of the L1 memory per tile.<br />
This should also give the student a feel for the next step of the project.<br />
<br />
==== Part III: Implementing MinPool ====<br />
<br />
By instantiating several MinPool tiles and interconnecting them cleverly, we can build the many-core cluster.<br />
We do not know how many such tiles we can instantiate before degrading timing or latency too much.<br />
In MemPool, for example, we managed to implement 64 tiles, each with four cores and 16 KiB of L1 memory.<br />
After this design exploration phase, the student and the advisors will decide on a final set of parameters for MinPool, finally settling how large it should be.<br />
Our goal is to have the system with the largest core count of all PULP systems so far.<br />
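This exploration amounts to choosing a (tile count, cores per tile, L1 per tile) triple. A small enumeration helps frame the trade-off before running synthesis; the candidate values below are illustrative assumptions for a scaled-down MinPool, not actual project parameters:<br />

```python
from itertools import product

# Hypothetical MinPool design space; the candidate values are invented for
# illustration. The feasible subset would come from synthesis and P&R results.
tile_counts = [4, 8, 16]     # tiles in the cluster
cores_per_tile = [2, 4]      # Snitch cores per tile
l1_kib_per_tile = [8, 16]    # KiB of L1 SRAM per tile

configs = [
    (tiles, cores, l1, tiles * cores, tiles * l1)
    for tiles, cores, l1 in product(tile_counts, cores_per_tile, l1_kib_per_tile)
]

for tiles, cores, l1, total_cores, total_l1 in configs:
    print(f"{tiles:2d} tiles x {cores} cores x {l1:2d} KiB -> "
          f"{total_cores:3d} cores, {total_l1:3d} KiB L1")
```

A real exploration would additionally bound each candidate by the post-synthesis timing and area of the tile.<br />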
<br />
==== Part IV: Making MinPool standalone ====<br />
<br />
During this phase, some RTL work is needed to make MinPool a stand-alone design.<br />
The student will need to integrate an FLL, a boot ROM, and a JTAG interface to access and initialize the system.<br />
While IIS has many IPs and much know-how for that, this task is also highly dependent on the final design parameters of MinPool.<br />
In this phase, the student should also check whether their placed-and-routed design passes the sanity checks.<br />
<br />
== Milestones ==<br />
<br />
Throughout the project, a number of milestones have to be reached.<br />
In the following, a tentative list of expected milestones is provided, which might be modified during the first weeks of the work:<br />
<br />
1. Install MemPool and test it in different configurations.<br />
<br />
2. Synthesize the tile.<br />
<br />
3. Place and route the tile as a macro.<br />
<br />
4. Synthesize the cluster, using the tile's black-box module.<br />
<br />
5. Place and route MinPool.<br />
<br />
6. Finalize the implementation flow.<br />
<br />
== Project Realization == <br />
<br />
=== Meetings ===<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the student's and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues as well.<br />
<br />
=== Weekly Reports ===<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points.<br />
The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
=== HDL Guidelines ===<br />
<br />
Naming conventions: adopting a consistent naming scheme is one of the most important steps to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members will immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
=== Report ===<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used for the report; nevertheless, the IIS staff strongly encourages the use of LaTeX, with Inkscape, Tgif, or any other vector drawing software for the block diagrams.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
=== Presentation ===<br />
<br />
There will be a presentation (a 20-minute talk followed by 5 minutes of Q&A) at the end of this project to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog, Verilog, or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Available ===<br />
<br />
* Looking for one or two semester projects<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:High_Performance_SoCs]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_MemPool,_PULP%27s_Manycore_System&diff=5674Physical Implementation of MemPool, PULP's Manycore System2020-11-02T15:40:58Z<p>Matheusd: Matheusd moved page Physical Implementation of MemPool, PULP's Manycore System to Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)</p>
<hr />
<div>#REDIRECT [[Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Physical_Implementation_of_MemPool,_PULP%27s_Manycore_System_(1M/1-2S)&diff=5670Physical Implementation of MemPool, PULP's Manycore System (1M/1-2S)2020-11-02T15:39:25Z<p>Matheusd: /* Project Realization */</p>
<hr />
<div>== Introduction == <br />
<br />
In the quest for high-performance computing systems, few architectural models retain the flexibility of many-core systems.<br />
These systems integrate a very large number of small cores (hundreds or even thousands) that work independently to execute highly parallelizable algorithms.<br />
<br />
Pushing the core count of such systems to its limits requires a compromise.<br />
One solution is to forgo memory sharing and let each core access only private memory banks; this, however, hurts the programmability of the system.<br />
Alternatively, memory sharing can be achieved through a cache hierarchy, whose non-negligible power consumption degrades the energy efficiency of the system.<br />
<br />
At IIS, PULP's many-core system is MemPool, which integrates 256 Snitch cores and 1 MiB of shared L1 memory.<br />
Our example system, in GlobalFoundries' 22nm 22FDX technology, runs at 500 MHz, and the L1 memory can be accessed by any of the cores through a high-throughput interconnection network with a round-trip latency of at most five cycles.<br />
This is a challenging physical implementation, limited by routing congestion and wire propagation delay rather than by the more common gate delay.<br />
Because of this, placing and routing MemPool was a unique experience, which required diving deep into both the technology's properties and the implementation flow.<br />
<br />
== Project description ==<br />
<br />
=== Project goals ===<br />
<br />
The goal of this project is to tape out a smaller version of MemPool, affectionately called '''MinPool''', in a larger technology node, as a proof of concept.<br />
While a 256-core design would probably be infeasible at the technology node we are aiming for, we are curious to know how far the design can be pushed in an older technology.<br />
Ultimately, the design resulting from this project shall lead to a scientific publication, be used to generate results for scientific publications, and/or serve as a basis for the work of other people.<br />
Hence, the design must be properly tested, the generated source code appropriately commented, and the project well documented.<br />
<br />
The whole project can be subdivided into four main phases, which are described in detail in the following.<br />
<br />
==== Part I: Familiarizing with the design ====<br />
<br />
MemPool has been the work of the two advisors for this thesis for more than a year and has already led to three Semester/Master projects.<br />
As such, there is some familiarizing with the architecture and the code base that needs to be done at the start of this project.<br />
<br />
==== Part II: Implementing MemPool's tile ====<br />
<br />
MemPool's tile is the basis of its architecture, containing a few Snitch cores and a number of L1 memory banks accessible within one cycle.<br />
This module is the basis for building the MemPool cluster.<br />
It is, however, a "normal" module, which can be placed-and-routed as a rather dense macro.<br />
We need to determine how fast MemPool's tiles can be clocked at the target technology node, considering that they should also be rather dense.<br />
The student can play with parameters such as the number of Snitch cores at each tile, the size of the instruction cache, and the size of the L1 memory per tile.<br />
This should also give the student a feel for the next step of the project.<br />
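To illustrate this exploration, the tile-level parameter space can be enumerated up front; the candidate values and the banking factor below are hypothetical placeholders, not the final MinPool parameters:<br />

```python
from itertools import product

# Hypothetical candidate values for the tile-level design-space
# exploration; the real ranges depend on the target technology node.
cores_per_tile = [2, 4]        # number of Snitch cores per tile
icache_kib = [1, 2, 4]         # instruction-cache size
l1_per_tile_kib = [8, 16, 32]  # shared L1 memory per tile

for cores, icache, l1 in product(cores_per_tile, icache_kib, l1_per_tile_kib):
    # Assume, for illustration, a banking factor of four banks per core.
    l1_banks = 4 * cores
    print(f"{cores} cores, {icache} KiB I$, {l1} KiB L1 -> {l1_banks} banks")
```

Each of these configurations would then be synthesized and placed-and-routed to measure its achievable clock and density.<br />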
<br />
==== Part III: Implementing MinPool ====<br />
<br />
By implementing several of these tiles and interconnecting them cleverly, we can build the many-core cluster.<br />
We do not know how many such tiles we can instantiate before degrading timing or latency too much.<br />
In MemPool, for example, we managed to implement 64 tiles, each with four cores and 16 KiB of L1 memory.<br />
After this design-exploration phase, the student and the advisors will decide on a final set of parameters for MinPool, i.e., how large it should be.<br />
Our goal is to build the PULP system with the largest core count so far.<br />
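Since the tile count directly sets the cluster's core count and L1 capacity, the trade-off of this phase can be tabulated before any place-and-route run. The per-tile figures below are MemPool's (four cores, 16 KiB of L1); the candidate tile counts for MinPool are hypothetical:<br />

```python
# Cluster-level totals as a function of the number of tiles.
# Per-tile figures are MemPool's; the tile counts are placeholders.
CORES_PER_TILE = 4
L1_PER_TILE_KIB = 16

for tiles in (4, 16, 64):
    cores = tiles * CORES_PER_TILE
    l1_kib = tiles * L1_PER_TILE_KIB
    print(f"{tiles:2d} tiles -> {cores:3d} cores, {l1_kib:4d} KiB of L1")
```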
<br />
==== Part IV: Making MinPool standalone ====<br />
<br />
During this phase, some RTL is needed to make MinPool a stand-alone design.<br />
The student will need to integrate an FLL, a boot ROM, and a JTAG interface to access and initialize the system.<br />
While IIS has many IPs and much know-how for this, the work is also highly dependent on the final design parameters for MinPool.<br />
In this phase, the student should also check whether their placed-and-routed design passes the sanity checks.<br />
<br />
== Milestones ==<br />
<br />
Throughout the project, a number of milestones have to be reached.<br />
In the following, a tentative list of expected milestones is provided, which might be modified during the first weeks of the work:<br />
<br />
1. Install MemPool and test it on different configurations<br />
<br />
2. Synthesize the tile <br />
<br />
3. Place-and-route the tile as a macro<br />
<br />
4. Synthesize the cluster, using the tile's black-box module<br />
<br />
5. Place-and-route MinPool<br />
<br />
6. Finalize the implementation flow<br />
<br />
== Project Realization == <br />
<br />
=== Meetings ===<br />
<br />
Weekly meetings will be held between the student and the assistants.<br />
The exact time and location of these meetings will be determined within the first week of the project in order to fit the student's and the assistants' schedules.<br />
These meetings will be used to evaluate the status and progress of the project.<br />
Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
=== Weekly Reports ===<br />
<br />
The student is required to write a weekly report at the end of each week and to send it to their advisors by email.<br />
The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to discuss open questions and points.<br />
The weekly report is also an important means for the student to develop a goal-oriented attitude toward work.<br />
<br />
=== HDL Guidelines ===<br />
<br />
Naming Conventions: Adopting a consistent naming scheme is one of the most important steps in order to make your code easy to understand.<br />
If signals, processes, and entities are always named the same way, any inconsistency can be detected more easily.<br />
Moreover, if a design group shares the same naming convention, all members immediately ''feel at home'' with each other's code.<br />
The naming conventions we follow in the PULP project are available [https://github.com/pulp-platform/style-guidelines here].<br />
<br />
=== Report ===<br />
<br />
Documentation is an important and often overlooked aspect of engineering.<br />
A final report has to be completed within this project.<br />
<br />
English is the de facto common language of engineering.<br />
Therefore, the final report should preferably be written in English.<br />
<br />
Any form of word-processing software is allowed for writing the report; nevertheless, the use of LaTeX with Inkscape or Tgif, or any other vector drawing software (for block diagrams), is strongly encouraged by the IIS staff.<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be downloaded [https://iis-people.ee.ethz.ch/~vlsi1/templates/report.tar.gz here].<br />
<br />
=== Presentation ===<br />
<br />
There will be a presentation (20min presentation and 5min Q&A) at the end of this project in order to present your results to a wider audience.<br />
The exact date will be determined towards the end of the work.<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Available ===<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:High_Performance_SoCs]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA_(S)&diff=5656Implementation of a Heterogeneous System for Image Processing on an FPGA (S)2020-11-02T15:26:59Z<p>Matheusd: </p>
<hr />
<div>== Introduction ==<br />
<br />
Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs).<br />
Such systems are highly versatile, due to their host processor capabilities, while having high performance and energy efficiency through their PMCAs.<br />
HERO is an FPGA-based research platform developed at IIS that combines a PMCA composed of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor.<br />
<br />
Heterogeneous systems have a complex programming model, which leads to significant effort being spent on tools that retain a high programmer productivity.<br />
Halide is a domain-specific programming language designed for writing fast image processing algorithms.<br />
More specifically, it is a C++ dialect with a functional programming paradigm.<br />
Its aim is to separate the function applied to the image (the pipeline) from the sequence in which the algorithm is executed (the schedule).<br />
For example, the schedule encompasses how the algorithm is parallelized, whether the image is tiled, whether it is processed in column- or row-major order, whether values required by multiple threads are shared or recomputed, whether parts of the computation are offloaded to an accelerator, and so on.<br />
This allows a programmer to write a functional description of the image processing algorithm and then explore ways of scheduling its execution with only a couple of lines of code, without modifying the algorithm itself.<br />
Furthermore, the same algorithm can run efficiently on multiple different architectures by only changing the schedule.<br />
For Halide to generate efficient code, the target architecture needs an efficient Halide runtime implementation and good compiler support, as Halide is tightly coupled with the compiler.<br />
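As a rough illustration of this pipeline/schedule split, the sketch below is a minimal plain-Python analogy, not Halide's actual API: the same 1-D blur algorithm is executed by two different schedules, which must produce identical results.<br />

```python
# Halide-inspired analogy in plain Python: the *algorithm* (a 1-D box
# blur) is fixed, while the *schedule* decides the order in which the
# output elements are computed. All names are illustrative.

def blur_at(data, i):
    """The algorithm: clamped 3-tap box blur at index i."""
    lo, hi = max(i - 1, 0), min(i + 1, len(data) - 1)
    return (data[lo] + data[i] + data[hi]) // 3

def schedule_serial(data):
    """Schedule 1: compute outputs in plain left-to-right order."""
    return [blur_at(data, i) for i in range(len(data))]

def schedule_tiled(data, tile=4):
    """Schedule 2: compute outputs tile by tile (same results)."""
    out = [0] * len(data)
    for start in range(0, len(data), tile):
        for i in range(start, min(start + tile, len(data))):
            out[i] = blur_at(data, i)
    return out

data = list(range(10))
assert schedule_serial(data) == schedule_tiled(data)
```

In Halide proper, the algorithm is a functional expression and the schedule is a few chained directives on it; the point carried over here is that changing the schedule never changes the computed result.<br />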
<br />
== Project description ==<br />
[[File:HalideLang.png|thumb|600px]]<br />
<br />
The goal of this project is to bring up Halide on HERO, using Ariane, a 64-bit RV64GC core, as the host processor.<br />
Ariane would run Halide's frontend, while the image processing tasks would execute on the 32-bit cores in the cluster.<br />
The final goal of this thesis is to have Halide-programmed image processing kernels running on a HERO system implemented on an FPGA.<br />
<br />
The project can be done as one or two semester theses. It consists of four parts:<br />
<br />
1. Familiarizing with the Halide language and the architecture of HERO (~2 person weeks).<br />
<br />
2. Add a RISC-V target to Halide's frontend (~3 person weeks).<br />
<br />
3. Test the Halide environment on an FPGA with a set of custom image processing kernels (~1 person week)<br />
<br />
4. Documentation and report writing (~1 person week)<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of the C++ programming language<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Completed ===<br />
<br />
* Student: Pierre-Hugues Blelly<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]], [[:User:Akurth | Andreas Kurth]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as preliminary specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and to decide whether further support is necessary. They also make the definitive decision on whether the chip is actually manufactured (no reason to worry if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. CARRV' 2017. [https://doi.org/10.3929/ethz-b-000219249 link]<br />
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. SIGGRAPH 2012. [http://people.csail.mit.edu/jrk/halide12 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Completed]]<br />
[[Category:Semester Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Andreasd]]<br />
[[Category:Heterogeneous_Acceleration_Systems]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA_(S)&diff=5653Implementation of a Heterogeneous System for Image Processing on an FPGA (S)2020-11-02T15:26:49Z<p>Matheusd: Matheusd moved page Implementation of a Heterogeneous System for Image Processing on an FPGA (M) to Implementation of a Heterogeneous System for Image Processing on an FPGA (S)</p>
<hr />
<div>== Introduction ==<br />
<br />
Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs).<br />
Such systems are highly versatile, due to their host processor capabilities, while having high performance and energy efficiency through their PMCAs.<br />
HERO is a FPGA-based research platform developed at IIS that combines a PMCA composed by RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor.<br />
<br />
Heterogeneous systems have a complex programming model, which lead to significant effort to develop tools to retain a high programmer productivity.<br />
Halide is domain specific programming language designed to write fast image processing algorithms.<br />
More specifically, it is a C++ dialect with a functional programming paradigm.<br />
It's aim is to separate the function applied to the image (pipeline), and the sequence in which the algorithm is executed (schedule).<br />
For example, the schedule encompasses how the algorithm is parallelized, if the image is tiled, processed in column or row major order, if solutions required by multiple threads are shared or recomputed, if parts of the computation is offloaded to an accelerator, and so on.<br />
This allows a programmer to write a functional description of the image processing algorithm and then explore ways of scheduling the execution with only a couple of lines of code, and without modifying the algorithm.<br />
Furthermore, the same algorithm can be run efficiently on multiple different architectures by only changing the schedule.<br />
To have Halide generate efficient code, the specific architecture requires to have an efficient Halide runtime implementation, and good compiler support, as Halide is tightly coupled with the compiler.<br />
<br />
== Project description ==<br />
[[File:HalideLang.png|thumb|600px]]<br />
<br />
The goal of this project is to bring up Halide on HERO, using Ariane, a 64-bit RV64GC core, as a host processor.<br />
Ariane would manage Halide's frontend, while the image processing tasks would execute on 32-bit cores in the cluster.<br />
The final goal of this thesis is to have Halide-programmed image processing kernels running on a HERO system implemented on an FPGA.<br />
<br />
The project can be done as one or two semester theses. It consists of the following parts:<br />
<br />
1. Familiarizing with the Halide language and the architecture of HERO (~2 person weeks).<br />
<br />
2. Add a RISC-V target to Halide's frontend (~3 person weeks).<br />
<br />
3. Test the Halide environment on an FPGA with a set of custom image processing kernels (~1 person week)<br />
<br />
4. Documentation and report writing (~1 person week)<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of the C++ programming language<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Completed ===<br />
<br />
* Student: Pierre-Hugues Blelly<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]], [[:User:Akurth | Andreas Kurth]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. CARRV' 2017. [https://doi.org/10.3929/ethz-b-000219249 link]<br />
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. SIGGRAPH 2012. [http://people.csail.mit.edu/jrk/halide12 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Completed]]<br />
[[Category:Semester Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Andreasd]]<br />
[[Category:Heterogeneous_Acceleration_Systems]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA_(M)&diff=5654Implementation of a Heterogeneous System for Image Processing on an FPGA (M)2020-11-02T15:26:49Z<p>Matheusd: Matheusd moved page Implementation of a Heterogeneous System for Image Processing on an FPGA (M) to Implementation of a Heterogeneous System for Image Processing on an FPGA (S)</p>
<hr />
<div>#REDIRECT [[Implementation of a Heterogeneous System for Image Processing on an FPGA (S)]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=User:Matheusd&diff=5638User:Matheusd2020-11-02T12:16:42Z<p>Matheusd: </p>
<hr />
<div>== Matheus Cavalcante ==<br />
<br />
[[File:Matheusd_face_1to1.png|thumb|200px|]]<br />
<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
[[Category:Digital]]<br />
<br />
I received my M.Sc. in integrated electronic systems from the Grenoble INP (Phelma) in 2018. I am currently pursuing a Ph.D. degree under the Digital Circuits and Systems group of Prof. Luca Benini. <br />
<br />
My current research interests include:<br />
<br />
* Computer and System Architecture<br />
* High-Performance Computing<br />
* Vector Processing<br />
* Interconnection Networks<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In progress<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
category = Completed<br />
category = Matheusd<br />
suppresserrors=true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA&diff=5610Implementation of a Heterogeneous System for Image Processing on an FPGA2020-11-02T10:18:48Z<p>Matheusd: </p>
<hr />
<div>== Introduction ==<br />
<br />
Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs).<br />
Such systems are highly versatile thanks to their host processors, while achieving high performance and energy efficiency through their PMCAs.<br />
HERO is an FPGA-based research platform developed at IIS that combines a PMCA composed of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor.<br />
<br />
Heterogeneous systems have a complex programming model, which has led to significant effort in developing tools that retain high programmer productivity.<br />
Halide is a domain-specific programming language designed for writing fast image processing algorithms.<br />
More specifically, it is a C++ dialect with a functional programming paradigm.<br />
Its aim is to separate the function applied to the image (the pipeline) from the sequence in which the algorithm is executed (the schedule).<br />
For example, the schedule encompasses how the algorithm is parallelized, whether the image is tiled, whether it is processed in column- or row-major order, whether results required by multiple threads are shared or recomputed, whether parts of the computation are offloaded to an accelerator, and so on.<br />
This allows a programmer to write a functional description of the image processing algorithm and then explore ways of scheduling the execution with only a couple of lines of code, and without modifying the algorithm.<br />
Furthermore, the same algorithm can be run efficiently on multiple different architectures by only changing the schedule.<br />
For Halide to generate efficient code, the target architecture needs an efficient Halide runtime implementation and good compiler support, as Halide is tightly coupled with the compiler.<br />
<br />
== Project description ==<br />
[[File:HalideLang.png|thumb|600px]]<br />
<br />
The goal of this project is to bring up Halide on HERO, using Ariane, a 64-bit RV64GC core, as a host processor.<br />
Ariane would manage Halide's frontend, while the image processing tasks would execute on 32-bit cores in the cluster.<br />
The final goal of this thesis is to have Halide-programmed image processing kernels running on a HERO system implemented on an FPGA.<br />
<br />
The project can be done as one or two semester theses. It consists of the following parts:<br />
<br />
1. Familiarizing with the Halide language and the architecture of HERO (~2 person weeks).<br />
<br />
2. Add a RISC-V target to Halide's frontend (~3 person weeks).<br />
<br />
3. Test the Halide environment on an FPGA with a set of custom image processing kernels (~1 person week)<br />
<br />
4. Documentation and report writing (~1 person week)<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of the C++ programming language<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Completed ===<br />
<br />
* Student: Pierre-Hugues Blelly<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]], [[:User:Akurth | Andreas Kurth]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. CARRV' 2017. [https://doi.org/10.3929/ethz-b-000219249 link]<br />
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. SIGGRAPH 2012. [http://people.csail.mit.edu/jrk/halide12 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Completed]]<br />
[[Category:Semester Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Andreasd]]<br />
[[Category:Heterogeneous_Acceleration_Systems]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA&diff=5606Implementation of a Heterogeneous System for Image Processing on an FPGA2020-11-02T10:17:14Z<p>Matheusd: </p>
<hr />
<div>== Introduction ==<br />
<br />
Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs).<br />
Such systems are highly versatile thanks to their host processors, while achieving high performance and energy efficiency through their PMCAs.<br />
HERO is an FPGA-based research platform developed at IIS that combines a PMCA composed of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor.<br />
<br />
Heterogeneous systems have a complex programming model, which has led to significant effort in developing tools that retain high programmer productivity.<br />
Halide is a domain-specific programming language designed for writing fast image processing algorithms.<br />
More specifically, it is a C++ dialect with a functional programming paradigm.<br />
Its aim is to separate the function applied to the image (the pipeline) from the sequence in which the algorithm is executed (the schedule).<br />
For example, the schedule encompasses how the algorithm is parallelized, whether the image is tiled, whether it is processed in column- or row-major order, whether results required by multiple threads are shared or recomputed, whether parts of the computation are offloaded to an accelerator, and so on.<br />
This allows a programmer to write a functional description of the image processing algorithm and then explore ways of scheduling the execution with only a couple of lines of code, and without modifying the algorithm.<br />
Furthermore, the same algorithm can be run efficiently on multiple different architectures by only changing the schedule.<br />
For Halide to generate efficient code, the target architecture needs an efficient Halide runtime implementation and good compiler support, as Halide is tightly coupled with the compiler.<br />
<br />
== Project description ==<br />
[[File:HalideLang.png|thumb|600px]]<br />
<br />
The goal of this project is to bring up Halide on HERO, using Ariane, a 64-bit RV64GC core, as a host processor.<br />
Ariane would manage Halide's frontend, while the image processing tasks would execute on 32-bit cores in the cluster.<br />
The final goal of this thesis is to have Halide-programmed image processing kernels running on a HERO system implemented on an FPGA.<br />
<br />
The project can be done as one or two semester theses. It consists of the following parts:<br />
<br />
1. Familiarizing with the Halide language and the architecture of HERO (~2 person weeks).<br />
<br />
2. Add a RISC-V target to Halide's frontend (~3 person weeks).<br />
<br />
3. Test the Halide environment on an FPGA with a set of custom image processing kernels (~1 person week)<br />
<br />
4. Documentation and report writing (~1 person week)<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of the C++ programming language<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: In progress ===<br />
<br />
* Student: Pierre-Hugues Blelly<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]], [[:User:Akurth | Andreas Kurth]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. CARRV' 2017. [https://doi.org/10.3929/ethz-b-000219249 link]<br />
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. SIGGRAPH 2012. [http://people.csail.mit.edu/jrk/halide12 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:In_progress]]<br />
[[Category:Semester Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Andreasd]]<br />
[[Category:Heterogeneous_Acceleration_Systems]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=High_Performance_SoCs&diff=5527High Performance SoCs2020-10-29T12:48:56Z<p>Matheusd: /* Who are we */</p>
<hr />
<div>==High-Performance Systems-on-Chip==<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
[[File:Floorplan_baikonur.png|thumb|350px|''Baikonur'', a 22 nm chip integrating two application-grade RISC-V Ariane cores and 3 Snitch clusters with 8 cores each.]]<br />
[[File:Manticore_concept.png|thumb|350px|Concept art for ''Manticore'', a Snitch-based 22 nm system with 4096 cores on multiple chiplets and with HBM2 memory.]]<br />
<br />
Today, a multitude of data-driven applications such as machine learning, scientific computing, and big data demand an ever-increasing amount of '''parallel floating-point performance''' from computing systems. Increasingly, such applications must scale across a wide range of platforms and energy budgets, from supercomputers simulating next week's weather to your smartphone camera correcting for low-light conditions.<br />
<br />
This brings challenges on multiple fronts:<br />
<br />
* '''Energy Efficiency''' becomes a major concern: As logic density increases, supplying these systems with energy and managing their heat dissipation requires increasingly complex solutions.<br />
<br />
* '''Memory bandwidth and latency''' become a major bottleneck as the amount of processed data increases. Despite continuous advances, memory lags behind computing in scaling, and many data-driven problems today are memory-bound.<br />
<br />
* '''Parallelization and scaling''' bring challenges of their own: on-chip interconnects may introduce significant area and performance overheads as they grow, and both the data and instruction streams of cores may compete for valuable memory bandwidth and interfere in a destructive way.<br />
<br />
While all state-of-the-art high-performance computing systems are constrained by the above issues, they are also subject to a fundamental trade-off between efficiency and flexibility. This forms a design space which includes the following paradigms:<br />
<br />
* '''Accelerators''' are designed to do one thing very well: they are very energy efficient and performant and usually offer predetermined data movement. However, they are not or barely programmable, inflexible, and monolithic in their design.<br />
<br />
* '''Superscalar Out-of-Order CPUs''', on the other end, provide extreme flexibility, full programmability, and reasonable performance across various workloads. However, they require large area and energy overheads for a given performance, use memory inefficiently, and are often hard to scale well to manycore systems.<br />
<br />
* '''GPUs''' are parallel and data-oriented by design, yet still meaningfully programmable, aiming for a sweet spot between scalability, efficiency, and programmability. However, they are still subject to memory access challenges and often require manual memory management for decent performance.<br />
<br />
'''How can we further improve on these existing paradigms?''' Can we design decently efficient and performant, yet freely programmable systems with scalable, performant memory systems?<br />
<br />
If these questions sound intriguing to you, consider joining us for a project or thesis! You can find currently available projects and our contact information below.<br />
<br />
==Our Activities==<br />
<br />
We are primarily interested in '''architecture design and hardware implementation''' for high-performance systems. However, ensuring high performance requires us to consider the '''entire hardware-software stack''':<br />
<br />
* '''HPC Software''': Design and porting of high-performance applications, benchmarks, compiler tools, and operating systems (Linux) to our hardware.<br />
* '''Hardware-software codesign''': Design of performance-aware algorithms and kernels and hardware that can be efficiently programmed for use in processor-based systems.<br />
* '''Architecture''': RTL implementation of energy-efficient designs with an emphasis on high utilization and throughput, as well as on efficient interoperability with existing IPs.<br />
* '''SoC design and Implementation''': Design of full high-performance systems-on-chips; implementation and tapeout on modern silicon technologies such as TSMC's 65 nm and GlobalFoundries' 22 nm nodes.<br />
* '''IC testing and Board-Level design''': Testing of the returning chips with industry-grade automated test equipment (ATE) and design of system-level demonstrator boards.<br />
<br />
Our current interests include systems with '''low control-to-compute ratios''', high-performance '''on-chip interconnects''', and '''scalable many-core systems'''. However, we are always happy to explore new domains; if you have an interesting idea, contact us and we can discuss it in detail!<br />
<br />
==Who are we==<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Paulsc_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
===[[:User:Tbenz | Thomas Benz]]===<br />
* '''e-mail''': [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 05 18<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Sriedel | Samuel Riedel]]===<br />
* '''e-mail''': [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 69<br />
* '''office''': ETZ J71.2<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Matheusd_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Matheusd | Matheus Cavalcante]]===<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 96<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
===[[:User:Akurth | Andreas Kurth]]===<br />
* '''e-mail''': [mailto:akurth@iis.ee.ethz.ch akurth@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 04 87<br />
* '''office''': ETZ J69.2<br />
<br />
===[[:User:Zarubaf | Florian Zaruba]]===<br />
* '''e-mail''': [mailto:zarubaf@iis.ee.ethz.ch zarubaf@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 56<br />
* '''office''': ETZ J89<br />
<br />
===[[:User:Fschuiki | Fabian Schuiki]]===<br />
* '''e-mail''': [mailto:fschuiki@iis.ee.ethz.ch fschuiki@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 67 89<br />
* '''office''': ETZ J89<br />
<br />
<!-- ===[[:User:Balasr | Robert Balas]]=== --> <!-- TODO @balasr --><br />
===Robert Balas===<br />
* '''e-mail''': [mailto:balasr@iis.ee.ethz.ch balasr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 42 56<br />
* '''office''': ETZ J78<br />
<br />
<!--<br />
Who are we<br />
What do we do<br />
Where to find us<br />
--><br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=User:Matheusd&diff=5526User:Matheusd2020-10-29T12:47:16Z<p>Matheusd: </p>
<hr />
<div>== Matheus Cavalcante ==<br />
<br />
[[File:Matheusd_face_1to1.png|thumb|200px|]]<br />
<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
[[Category:Digital]]<br />
<br />
I received my M.Sc. in integrated electronic systems from the Grenoble INP (Phelma) in 2018. I am currently pursuing a Ph.D. degree under the Digital Circuits and Systems group of Prof. Luca Benini. <br />
<br />
My current research interests include:<br />
<br />
* Computer and System Architecture<br />
* High-Performance Computing<br />
* Vector Processing<br />
* Interconnection Networks<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In progress<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
category = Completed<br />
category = Matheusd<br />
suppresserrors=true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=File:Matheusd_face_1to1.png&diff=5525File:Matheusd face 1to1.png2020-10-29T12:46:56Z<p>Matheusd: </p>
<hr />
<div></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=User:Matheusd&diff=5524User:Matheusd2020-10-29T12:46:17Z<p>Matheusd: </p>
<hr />
<div>== Matheus Cavalcante ==<br />
<br />
[[File:Matheusd_face_1to1.png|thumb|200px|]]<br />
<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
[[Category:Digital]]<br />
<br />
I received my M.Sc. in integrated electronic systems from the Grenoble INP (Phelma) in 2018. I am currently pursuing a Ph.D. degree under the Digital Circuits and Systems group of Prof. Luca Benini. <br />
<br />
My current research interests include:<br />
<br />
* Computer and System Architecture<br />
* High Performance Computing<br />
* Vector Processing<br />
* Interconnection Networks<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = In progress<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
category = Completed<br />
category = Matheusd<br />
suppresserrors=true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=User:Matheusd&diff=5523User:Matheusd2020-10-29T12:43:09Z<p>Matheusd: /* Available Projects */</p>
<hr />
<div>== Matheus Cavalcante ==<br />
<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
[[Category:Digital]]<br />
<br />
==Interests==<br />
* Computer and System Architecture<br />
* High Performance Computing<br />
* Vector Processing<br />
* Interconnection Networks<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
category = In progress<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
category = Completed<br />
category = Matheusd<br />
suppresserrors=true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=User:Matheusd&diff=5522User:Matheusd2020-10-29T12:42:57Z<p>Matheusd: </p>
<hr />
<div>== Matheus Cavalcante ==<br />
<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
[[Category:Digital]]<br />
<br />
==Interests==<br />
* Computer and System Architecture<br />
* High Performance Computing<br />
* Vector Processing<br />
* Interconnection Networks<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
suppresserrors = true<br />
category = Available<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Projects In Progress==<br />
<DynamicPageList><br />
category = In progress<br />
category = Matheusd<br />
</DynamicPageList><br />
<br />
==Completed Projects==<br />
<DynamicPageList><br />
category = Completed<br />
category = Matheusd<br />
suppresserrors=true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Ara:_Update_PULP%27s_Vector_Processor_with_the_recent_RISC-V_Vector_Extension_Development&diff=5521Ara: Update PULP's Vector Processor with the recent RISC-V Vector Extension Development2020-10-29T12:42:22Z<p>Matheusd: </p>
<hr />
<div>== Introduction ==<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB).<br />
This bottleneck stems from the memory traffic required for instruction fetches.<br />
Multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to fetch and execute the same instruction many times, a waste in terms of both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution has revived interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit their execution model.<br />
The renewed interest in vector processing is reflected by the introduction of vector extensions in all popular Instruction Set Architectures, such as ARM's SVE and RISC-V's V extension.<br />
<br />
Within the PULP Project, Ara is a parametric in-order high-performance 64-bit vector unit based on the version 0.5-Draft of the RISC-V vector extension.<br />
The vector unit was designed for a memory bandwidth per peak performance ratio of 2B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating point operands.<br />
<br />
== Project description ==<br />
<br />
Since Ara was first published, new versions of the RISC-V V extension have been released.<br />
The goal of this project is to update Ara so that it is compliant with the newest specifications.<br />
<br />
The project can be done as two semester theses or a Master's thesis. It consists of the following parts:<br />
<br />
1. Familiarizing with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Update Ariane's frontend, so that it decodes the new vector instructions. (~2 person weeks)<br />
<br />
3. Update Ara's backend with the updated instructions. (~4 person weeks)<br />
<br />
4. Validate the design by co-simulating it with a RISC-V Simulator. (~2 person weeks)<br />
<br />
5. Documentation and report writing (~2 person weeks)<br />
<br />
Depending on timing constraints and if the student(s) are interested, a tape-out of the updated design might be feasible.<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI I course is highly recommended.<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Completed ===<br />
<br />
* Looking for one or two semester projects<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Completed]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Ara:_Update_PULP%27s_Vector_Processor_with_the_recent_RISC-V_Vector_Extension_Development&diff=5520Ara: Update PULP's Vector Processor with the recent RISC-V Vector Extension Development2020-10-29T12:41:09Z<p>Matheusd: </p>
<hr />
<div>== Introduction ==<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB).<br />
This bottleneck is related to the memory traffic required for instruction fetch.<br />
Multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, which wastes both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution has revived interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit their execution model.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM with its SVE and RISC-V with the V Extension.<br />
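To make this concrete, consider a simple AXPY kernel (an illustrative sketch, not code from the project): a scalar core pays instruction fetch and decode for every element, whereas a vector unit such as Ara amortizes each instruction fetch over a whole vector of elements.<br />

```cpp
#include <cstddef>
#include <vector>

// AXPY: y[i] += a * x[i]. On a scalar core, the two loads, the fused
// multiply-add, the store, and the loop bookkeeping are fetched and decoded
// once per element. A vector unit executes the same work as a handful of
// vector instructions per strip-mined chunk (under RVV, roughly: vsetvli,
// vle64.v, vle64.v, vfmacc.vf, vse64.v), each covering many elements per
// instruction fetch. This amortization is what mitigates the VNB.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] += a * x[i];
}
```

The RVV instruction names above are only indicative of how such a loop maps onto vector hardware; the exact sequence depends on the extension version being targeted.<br />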
<br />
Within the PULP Project, Ara is a parametric, in-order, high-performance 64-bit vector unit based on the version 0.5 draft of the RISC-V vector extension.<br />
The vector unit was designed for a memory bandwidth to peak performance ratio of 2 B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating point operands.<br />
<br />
== Project description ==<br />
<br />
Since Ara was published, new versions of the RISC-V V Extension have been released.<br />
The goal of this project is to update Ara so that it is compliant with the newest specification.<br />
<br />
The project can be done as two semester theses or as a Master's thesis. The project consists of the following parts:<br />
<br />
1. Familiarize yourself with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Update Ariane's frontend so that it decodes the new vector instructions. (~2 person weeks)<br />
<br />
3. Update Ara's backend with the updated instructions. (~4 person weeks)<br />
<br />
4. Validate the design by co-simulating it with a RISC-V simulator. (~2 person weeks)<br />
<br />
5. Write the documentation and final report. (~2 person weeks)<br />
<br />
Depending on timing constraints and on the student(s)' interest, a tape-out of the updated design might be feasible.<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI I course is highly recommended.<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Completed ===<br />
<br />
* Looking for one or two semester projects<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=High_Performance_SoCs&diff=5519High Performance SoCs2020-10-29T12:40:37Z<p>Matheusd: /* Matheus De Araujo Cavalcante */</p>
<hr />
<div>==High-Performance Systems-on-Chip==<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
[[File:Floorplan_baikonur.png|thumb|350px|''Baikonur'', a 22 nm chip integrating two application-grade RISC-V Ariane cores and 3 Snitch clusters with 8 cores each.]]<br />
[[File:Manticore_concept.png|thumb|350px|Concept art for ''Manticore'', a Snitch-based 22 nm system with 4096 cores on multiple chiplets and with HBM2 memory.]]<br />
<br />
Today, a multitude of data-driven applications such as machine learning, scientific computing, and big data demand an ever-increasing amount of '''parallel floating-point performance''' from computing systems. Increasingly, such applications must scale across a wide range of platforms and energy budgets, from supercomputers simulating next week's weather to your smartphone's camera correcting for low-light conditions.<br />
<br />
This brings challenges on multiple fronts:<br />
<br />
* '''Energy Efficiency''' becomes a major concern: As logic density increases, supplying these systems with energy and managing their heat dissipation requires increasingly complex solutions.<br />
<br />
* '''Memory bandwidth and latency''' become a major bottleneck as the amount of processed data increases. Despite continuous advances, memory lags behind computing in scaling, and many data-driven problems today are memory-bound.<br />
<br />
* '''Parallelization and scaling''' bring challenges of their own: on-chip interconnects may introduce significant area and performance overheads as they grow, and both the data and instruction streams of cores may compete for valuable memory bandwidth and interfere in a destructive way.<br />
<br />
While all state-of-the-art high-performance computing systems are constrained by the above issues, they are also subject to a fundamental trade-off between efficiency and flexibility. This forms a design space which includes the following paradigms:<br />
<br />
* '''Accelerators''' are designed to do one thing very well: they are very energy-efficient and performant, and usually offer predetermined data movement. However, they are barely programmable, if at all, and they are inflexible and monolithic in their design.<br />
<br />
* '''Superscalar Out-of-Order CPUs''', on the other end, provide extreme flexibility, full programmability, and reasonable performance across various workloads. However, they require large area and energy overheads for a given performance, use memory inefficiently, and are often hard to scale well to manycore systems.<br />
<br />
* '''GPUs''' are parallel and data-oriented by design, yet still meaningfully programmable, aiming for a sweet spot between scalability, efficiency, and programmability. However, they are still subject to memory access challenges and often require manual memory management for decent performance.<br />
<br />
'''How can we further improve on these existing paradigms?''' Can we design decently efficient and performant, yet freely programmable systems with scalable, performant memory systems?<br />
<br />
If these questions sound intriguing to you, consider joining us for a project or thesis! You can find currently available projects and our contact information below.<br />
<br />
==Our Activities==<br />
<br />
We are primarily interested in '''architecture design and hardware implementation''' for high-performance systems. However, ensuring high performance requires us to consider the '''entire hardware-software stack''':<br />
<br />
* '''HPC Software''': Design and porting of high-performance applications, benchmarks, compiler tools, and operating systems (Linux) to our hardware.<br />
* '''Hardware-software codesign''': Design of performance-aware algorithms and kernels and hardware that can be efficiently programmed for use in processor-based systems.<br />
* '''Architecture''': RTL implementation of energy-efficient designs with an emphasis on high utilization and throughput, as well as on efficient interoperability with existing IPs.<br />
* '''SoC design and Implementation''': Design of full high-performance systems-on-chips; implementation and tapeout on modern silicon technologies such as TSMC's 65 nm and GlobalFoundries' 22 nm nodes.<br />
* '''IC testing and Board-Level design''': Testing of the returning chips with industry-grade automated test equipment (ATE) and design of system-level demonstrator boards.<br />
<br />
Our current interests include systems with '''low control-to-compute ratios''', high-performance '''on-chip interconnects''', and '''scalable many-core systems'''. However, we are always happy to explore new domains; if you have an interesting idea, contact us and we can discuss it in detail!<br />
<br />
==Who are we==<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Paulsc_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
===[[:User:Tbenz | Thomas Benz]]===<br />
* '''e-mail''': [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 05 18<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Sriedel | Samuel Riedel]]===<br />
* '''e-mail''': [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 69<br />
* '''office''': ETZ J71.2<br />
<br />
===[[:User:Matheusd | Matheus Cavalcante]]===<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 96<br />
* '''office''': ETZ J69.2<br />
<br />
===[[:User:Akurth | Andreas Kurth]]===<br />
* '''e-mail''': [mailto:akurth@iis.ee.ethz.ch akurth@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 04 87<br />
* '''office''': ETZ J69.2<br />
<br />
===[[:User:Zarubaf | Florian Zaruba]]===<br />
* '''e-mail''': [mailto:zarubaf@iis.ee.ethz.ch zarubaf@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 56<br />
* '''office''': ETZ J89<br />
<br />
===[[:User:Fschuiki | Fabian Schuiki]]===<br />
* '''e-mail''': [mailto:fschuiki@iis.ee.ethz.ch fschuiki@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 67 89<br />
* '''office''': ETZ J89<br />
<br />
<!-- ===[[:User:Balasr | Robert Balas]]=== --> <!-- TODO @balasr --><br />
===Robert Balas===<br />
* '''e-mail''': [mailto:balasr@iis.ee.ethz.ch balasr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 42 56<br />
* '''office''': ETZ J78<br />
<br />
<!--<br />
Who are we<br />
What do we do<br />
Where to find us<br />
--><br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList></div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA&diff=5110Implementation of a Heterogeneous System for Image Processing on an FPGA2020-02-21T16:46:06Z<p>Matheusd: </p>
<hr />
<div>== Introduction ==<br />
<br />
Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs).<br />
Such systems are highly versatile, due to their host processor capabilities, while having high performance and energy efficiency through their PMCAs.<br />
HERO is an FPGA-based research platform developed at IIS that combines a PMCA composed of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor.<br />
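To sketch what this host/PMCA split looks like to a programmer (HERO's programming model builds on OpenMP accelerator directives; the kernel and clauses below are simplified illustrative assumptions, not the platform's verbatim API), the host annotates a region for execution on the accelerator:<br />

```cpp
#include <vector>

// Schematic host-side offload: the ARM Cortex-A host runs the program, and
// the annotated region is mapped onto the RISC-V PMCA. If the compiler does
// not support offloading, the pragma is ignored and the loop runs on the
// host, producing the same result either way.
void scale_image(std::vector<int>& pixels, int gain) {
    int* p = pixels.data();
    int n = static_cast<int>(pixels.size());
    #pragma omp target map(tofrom : p[0:n]) // offload region (OpenMP 4.5 syntax)
    for (int i = 0; i < n; ++i)
        p[i] *= gain;
}
```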
<br />
Heterogeneous systems have a complex programming model, which leads to significant effort in developing tools that retain high programmer productivity.<br />
Halide is a domain-specific programming language designed for writing fast image processing algorithms.<br />
More specifically, it is a C++ dialect with a functional programming paradigm.<br />
Its aim is to separate the function applied to the image (the pipeline) from the sequence in which the algorithm is executed (the schedule).<br />
For example, the schedule encompasses how the algorithm is parallelized, whether the image is tiled, whether it is processed in column- or row-major order, whether values required by multiple threads are shared or recomputed, whether parts of the computation are offloaded to an accelerator, and so on.<br />
This allows a programmer to write a functional description of the image processing algorithm and then explore ways of scheduling the execution with only a couple of lines of code, and without modifying the algorithm.<br />
Furthermore, the same algorithm can be run efficiently on multiple different architectures by only changing the schedule.<br />
For Halide to generate efficient code, the target architecture requires an efficient Halide runtime implementation and good compiler support, as Halide is tightly coupled with the compiler.<br />
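The pipeline/schedule split can be sketched in plain C++ (a conceptual illustration with our own naming, not real Halide syntax): the pipeline is a pure function of the output coordinates, and each schedule is a different loop nest realizing it, so every schedule produces the same image.<br />

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// The "pipeline": a pure function from output coordinates to a pixel value.
using Pipeline = std::function<int(int, int)>;

// Two "schedules" realizing the same pipeline over a w x h image. In Halide
// proper, a schedule could also tile, vectorize, or parallelize these loops
// without touching the pipeline definition at all.
std::vector<int> realize_row_major(const Pipeline& f, int w, int h) {
    std::vector<int> img(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y)      // rows in the outer loop
        for (int x = 0; x < w; ++x)
            img[static_cast<std::size_t>(y) * w + x] = f(x, y);
    return img;
}

std::vector<int> realize_col_major(const Pipeline& f, int w, int h) {
    std::vector<int> img(static_cast<std::size_t>(w) * h);
    for (int x = 0; x < w; ++x)      // columns in the outer loop
        for (int y = 0; y < h; ++y)
            img[static_cast<std::size_t>(y) * w + x] = f(x, y);
    return img;
}
```

In actual Halide, the same separation is written as a Func definition for the pipeline and chained scheduling calls (e.g. tile, vectorize, parallel) for the schedule.<br />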
<br />
== Project description ==<br />
[[File:HalideLang.png|thumb|600px]]<br />
<br />
The goal of this project is to bring up Halide on HERO, using Ariane, a 64-bit RV64GC core, as a host processor.<br />
Ariane would manage Halide's frontend, while the image processing tasks would execute on 32-bit cores in the cluster.<br />
The final goal of this thesis is to have Halide-programmed image processing kernels running on a HERO system implemented on an FPGA.<br />
<br />
The project can be done as one or two semester theses. The project consists of four parts:<br />
<br />
1. Familiarize yourself with the Halide language and the architecture of HERO. (~2 person weeks)<br />
<br />
2. Add a RISC-V target to Halide's frontend. (~3 person weeks)<br />
<br />
3. Test the Halide environment on an FPGA with a set of custom image processing kernels. (~1 person week)<br />
<br />
4. Write the documentation and final report. (~1 person week)<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of the C++ programming language<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: In progress ===<br />
<br />
* Student: Pierre-Hugues Blelly<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]], [[:User:Akurth | Andreas Kurth]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. CARRV' 2017. [https://doi.org/10.3929/ethz-b-000219249 link]<br />
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. SIGGRAPH 2012. [http://people.csail.mit.edu/jrk/halide12 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:In_progress]]<br />
[[Category:Semester Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Andreasd]]<br />
[[Category:Heterogeneous_Acceleration_Systems]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Ara:_Update_PULP%27s_Vector_Processor_with_the_recent_RISC-V_Vector_Extension_Development&diff=5097Ara: Update PULP's Vector Processor with the recent RISC-V Vector Extension Development2020-01-26T18:31:21Z<p>Matheusd: Created page with "== Introduction == In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB). This is related with the memory tra..."</p>
<hr />
<div>== Introduction ==<br />
<br />
In instruction-based programmable architectures, the key challenge is how to mitigate the Von Neumann Bottleneck (VNB).<br />
This bottleneck is related to the memory traffic required for instruction fetch.<br />
Multi-core designs, although highly flexible, do not exploit the regularity of data-parallel applications.<br />
Each core tends to execute the same instruction many times, which wastes both area and energy.<br />
<br />
The quest for extreme energy efficiency in data-parallel execution has revived interest in vector architectures.<br />
Such systems promise to tackle the VNB very effectively, providing better energy efficiency than a general-purpose processor for applications that fit their execution model.<br />
The renewed interest in vector processing is reflected by the introduction of vector instruction extensions in all popular Instruction Set Architectures, such as ARM with its SVE and RISC-V with the V Extension.<br />
<br />
Within the PULP Project, Ara is a parametric, in-order, high-performance 64-bit vector unit based on the version 0.5 draft of the RISC-V vector extension.<br />
The vector unit was designed for a memory bandwidth to peak performance ratio of 2 B/DP-FLOP.<br />
Ara works in tandem with Ariane, an open-source application-class RV64GC scalar core.<br />
The vector unit supports mixed-precision arithmetic with double, single, and half-precision floating point operands.<br />
<br />
== Project description ==<br />
<br />
Since Ara was published, new versions of the RISC-V V Extension have been released.<br />
The goal of this project is to update Ara so that it is compliant with the newest specification.<br />
<br />
The project can be done as two semester theses or as a Master's thesis. The project consists of the following parts:<br />
<br />
1. Familiarize yourself with the RISC-V Vector Extension and the Ara source code. (~2 person weeks)<br />
<br />
2. Update Ariane's frontend so that it decodes the new vector instructions. (~2 person weeks)<br />
<br />
3. Update Ara's backend with the updated instructions. (~4 person weeks)<br />
<br />
4. Validate the design by co-simulating it with a RISC-V simulator. (~2 person weeks)<br />
<br />
5. Write the documentation and final report. (~2 person weeks)<br />
<br />
Depending on timing constraints and on the student(s)' interest, a tape-out of the updated design might be feasible.<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI I course is highly recommended.<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Available ===<br />
<br />
* Looking for one or two semester projects<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini. Ara: A 1GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22nm FD-SOI. [https://doi.org/10.1109/TVLSI.2019.2950087 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Available]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA&diff=5084Implementation of a Heterogeneous System for Image Processing on an FPGA2020-01-10T12:22:46Z<p>Matheusd: </p>
<hr />
<div>== Introduction ==<br />
<br />
Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs).<br />
Such systems are highly versatile, due to their host processor capabilities, while having high performance and energy efficiency through their PMCAs.<br />
HERO is an FPGA-based research platform developed at IIS that combines a PMCA composed of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor.<br />
<br />
Heterogeneous systems have a complex programming model, which leads to significant effort in developing tools that retain high programmer productivity.<br />
Halide is a domain-specific programming language designed for writing fast image processing algorithms.<br />
More specifically, it is a C++ dialect with a functional programming paradigm.<br />
Its aim is to separate the function applied to the image (the pipeline) from the sequence in which the algorithm is executed (the schedule).<br />
For example, the schedule encompasses how the algorithm is parallelized, whether the image is tiled, whether it is processed in column- or row-major order, whether values required by multiple threads are shared or recomputed, whether parts of the computation are offloaded to an accelerator, and so on.<br />
This allows a programmer to write a functional description of the image processing algorithm and then explore ways of scheduling the execution with only a couple of lines of code, and without modifying the algorithm.<br />
Furthermore, the same algorithm can be run efficiently on multiple different architectures by only changing the schedule.<br />
For Halide to generate efficient code, the target architecture requires an efficient Halide runtime implementation and good compiler support, as Halide is tightly coupled with the compiler.<br />
<br />
== Project description ==<br />
[[File:HalideLang.png|thumb|600px]]<br />
<br />
The goal of this project is to bring up Halide on HERO, using Ariane, a 64-bit RV64GC core, as a host processor.<br />
Ariane would manage Halide's frontend, while the image processing tasks would execute on 32-bit cores in the cluster.<br />
The final goal of this thesis is to have Halide-programmed image processing kernels running on a HERO system implemented on an FPGA.<br />
<br />
The project can be done as one or two semester theses. The project consists of four parts:<br />
<br />
1. Familiarize yourself with the Halide language and the architecture of HERO. (~2 person weeks)<br />
<br />
2. Add a RISC-V target to Halide's frontend. (~3 person weeks)<br />
<br />
3. Test the Halide environment on an FPGA with a set of custom image processing kernels. (~1 person week)<br />
<br />
4. Write the documentation and final report. (~1 person week)<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of the C++ programming language<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Available ===<br />
<br />
* Looking for one or two semester projects<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]], [[:User:Akurth | Andreas Kurth]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. CARRV' 2017. [https://doi.org/10.3929/ethz-b-000219249 link]<br />
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. SIGGRAPH 2012. [http://people.csail.mit.edu/jrk/halide12 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Available]]<br />
[[Category:Semester Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Andreasd]]<br />
[[Category:Heterogeneous_Acceleration_Systems]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_a_Heterogeneous_System_for_Image_Processing_on_an_FPGA&diff=5083Implementation of a Heterogeneous System for Image Processing on an FPGA2020-01-10T11:42:13Z<p>Matheusd: Created page with "== Introduction == Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs). Such systems are highly ve..."</p>
<hr />
<div>== Introduction ==<br />
<br />
Heterogeneous systems combine a general-purpose host processor with domain-specific Programmable Many-Core Accelerators (PMCAs).<br />
Such systems are highly versatile, due to their host processor capabilities, while having high performance and energy efficiency through their PMCAs.<br />
HERO is an FPGA-based research platform developed at IIS that combines a PMCA composed of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor.<br />
<br />
Heterogeneous systems have a complex programming model, which has led to significant effort in developing tools that retain high programmer productivity.<br />
Halide is a domain-specific programming language designed for writing fast image processing algorithms.<br />
More specifically, it is a C++ dialect with a functional programming paradigm.<br />
Its aim is to separate the function applied to the image (the pipeline) from the sequence in which the algorithm is executed (the schedule).<br />
For example, the schedule encompasses how the algorithm is parallelized, whether the image is tiled, whether it is processed in column- or row-major order, whether intermediate results required by multiple threads are shared or recomputed, whether parts of the computation are offloaded to an accelerator, and so on.<br />
This allows a programmer to write a functional description of the image processing algorithm and then explore ways of scheduling the execution with only a couple of lines of code and without modifying the algorithm.<br />
Furthermore, the same algorithm can be run efficiently on multiple different architectures by only changing the schedule.<br />
For Halide to generate efficient code, the target architecture needs an efficient Halide runtime implementation and good compiler support, as Halide is tightly coupled with the compiler.<br />
<br />
== Project description ==<br />
[[File:HalideLang.png|thumb|600px]]<br />
<br />
The goal of this project is to bring up Halide on HERO, using Ariane, a 64-bit RV64GC core, as a host processor.<br />
Ariane would manage Halide's frontend, while the image processing tasks would execute on 32-bit cores in the cluster.<br />
The final goal of this thesis is to have image processing kernels written in Halide running on a HERO system implemented on an FPGA.<br />
<br />
The project can be done as one or two semester theses. It consists of four parts:<br />
<br />
1. Familiarizing with the Halide language and the architecture of HERO (~2 person weeks).<br />
<br />
2. Add a RISC-V target to Halide's frontend (~3 person weeks).<br />
<br />
3. Bring up the Halide environment on an FPGA with a set of custom image processing kernels (~1 person week).<br />
<br />
4. Documentation and report writing (~1 person week).<br />
<br />
== Required skills ==<br />
<br />
To work on this project, you will need:<br />
<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL). Having followed the VLSI 1 course is recommended.<br />
* to have prior knowledge of the C++ programming language<br />
* to have prior knowledge of hardware design and computer architecture<br />
* to be motivated to work hard on a super cool open-source project<br />
<br />
=== Status: Available ===<br />
<br />
* Looking for a semester project<br />
* Supervision: [[:User:Matheusd|Matheus Cavalcante]], [[:User:Sriedel|Samuel Riedel]], [[:User:Akurth | Andreas Kurth]]<br />
<br />
=== Professor ===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
== Meetings & Presentations ==<br />
<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project, there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and to decide whether further support is necessary. They also make the definitive decision on whether the chip is actually manufactured (no reason to worry if the project is on track) and whether more chip area, a different package, ... is provided. For more details refer to [http://eda.ee.ethz.ch/index.php/Design_review (1)].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS Colloquium.<br />
<br />
== References ==<br />
<br />
# Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. CARRV 2017. [https://doi.org/10.3929/ethz-b-000219249 link]<br />
# Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. SIGGRAPH 2012. [http://people.csail.mit.edu/jrk/halide12 link]<br />
<br />
[[#top|↑ top]]<br />
[[Category:Digital]]<br />
[[Category:Available]]<br />
[[Category:Semester Thesis]]<br />
[[Category:PULP]]<br />
[[Category:Matheusd]]<br />
[[Category:Sriedel]]<br />
[[Category:Andreasd]]<br />
[[Category:Heterogeneous_Acceleration_Systems]]<br />
[[Category:Computer Architecture]]</div>Matheusdhttp://iis-projects.ee.ethz.ch/index.php?title=File:HalideLang.png&diff=5082File:HalideLang.png2020-01-10T11:41:51Z<p>Matheusd: </p>
<hr />
<div></div>Matheusd