http://iis-projects.ee.ethz.ch/api.php?action=feedcontributions&user=Smazzola&feedformat=atomiis-projects - User contributions [en]2024-03-29T08:05:02ZUser contributionsMediaWiki 1.28.0http://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=10235User:Smazzola2024-03-04T10:02:43Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Archived===<br />
<DynamicPageList><br />
category = Archived<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8226Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:58:02Z<p>Smazzola: /* Introduction */</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
<br />
* to evaluate (large) systems on FPGAs and/or ASICs.<br />
* to have a synthesizable, fully configurable memory system.<br />
* to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
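As an illustration of what "events" could mean here, the following is a minimal behavioral sketch in Python, not RTL and not the PULP implementation. It relies only on the AXI handshake rule that a transfer happens in a cycle where both VALID and READY are high; the names <code>ChannelCounters</code> and <code>count_channel_events</code> are hypothetical and chosen for this example only.<br />

```python
from dataclasses import dataclass


@dataclass
class ChannelCounters:
    """Per-channel event counts (hypothetical names, for illustration only)."""
    handshakes: int = 0  # cycles with VALID and READY both high (one transfer)
    stalls: int = 0      # cycles with VALID high but READY low (back-pressure)
    idles: int = 0       # cycles with VALID low (nothing offered)


def count_channel_events(trace):
    """Classify every clock cycle of one AXI channel.

    `trace` is an iterable of (valid, ready) pairs, one pair per cycle.
    """
    counters = ChannelCounters()
    for valid, ready in trace:
        if valid and ready:
            counters.handshakes += 1
        elif valid:
            counters.stalls += 1
        else:
            counters.idles += 1
    return counters


if __name__ == "__main__":
    # Five cycles on a single channel: transfer, stall, transfer, idle, idle.
    example = [(True, True), (True, False), (True, True), (False, True), (False, False)]
    print(count_channel_events(example))
```

Figures of merit such as channel utilization could then be derived from such counts, for example as handshakes divided by the total number of observed cycles.<br />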
<br />
'''Stretch Goals'''<br />
<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8225Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:57:55Z<p>Smazzola: /* Introduction */</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
<br />
* to evaluate (large) systems on FPGAs and/or ASICs.<br />
* to have a synthesizable, fully configurable memory system.<br />
* to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
<br />
'''Stretch Goals'''<br />
<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8224Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:57:46Z<p>Smazzola: /* Introduction */</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
<br />
* to evaluate (large) systems on FPGAs and/or ASICs.<br />
* to have a synthesizable, fully configurable memory system.<br />
* to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
<br />
'''Stretch Goals'''<br />
<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8223Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:57:23Z<p>Smazzola: /* Project Description */</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
1. to evaluate (large) systems on FPGAs and/or ASICs.<br />
2. to have a synthesizable, fully configurable memory system.<br />
3. to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
<br />
'''Stretch Goals'''<br />
<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8222Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:57:17Z<p>Smazzola: /* Project Description */</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
1. to evaluate (large) systems on FPGAs and/or ASICs.<br />
2. to have a synthesizable, fully configurable memory system.<br />
3. to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
'''Stretch Goals'''<br />
<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8221Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:57:08Z<p>Smazzola: /* Project Description */</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
1. to evaluate (large) systems on FPGAs and/or ASICs.<br />
2. to have a synthesizable, fully configurable memory system.<br />
3. to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
'''Stretch Goals'''<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8220Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:56:54Z<p>Smazzola: /* Status: In Progress */</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
1. to evaluate (large) systems on FPGAs and/or ASICs.<br />
2. to have a synthesizable, fully configurable memory system.<br />
3. to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
'''Stretch Goals'''<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Towards_a_Technology-independent_and_Synthesizable_AXI4_Performance_Monitoring_and_Throttling_Unit_(1-2S)&diff=8219Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S)2022-10-27T11:56:40Z<p>Smazzola: Created page with "<!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --> = Overview = == Status: In Progress == * Student: Filippo..."</p>
<hr />
<div><!-- Towards a Technology-independent and Synthesizable AXI4 Performance Monitoring and Throttling Unit (1-2S) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Filippo Svelto<br />
* Semester: Fall Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 5% Literature/architecture review<br />
* 65% RTL implementation<br />
* 30% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:In Progress]]<br />
<br />
= Introduction =<br />
<br />
At IIS, we design, develop, and maintain what is de facto the only usable free and open-source AMBA AXI4 implementation. So far, we have focused on creating synthesizable interconnect IPs and non-synthesizable verification and monitoring IPs. This approach works well for creating and implementing systems and for doing most of the evaluation in simulation. It no longer works once the need arises:<br />
1. to evaluate (large) systems on FPGAs and/or ASICs.<br />
2. to have a synthesizable, fully configurable memory system.<br />
3. to monitor key figures of merit online and in-system.<br />
<br />
To summarize: we need platform-independent, synthesizable, low-overhead AXI4 monitoring and throttling unit(s).<br />
<br />
= Project Description =<br />
<br />
The following are the milestones that we expect to achieve throughout the project:<br />
* Familiarize yourself with the AXI4 protocol and the PULP AXI4 implementation<br />
* Create an extensive list of key figures of merit to monitor and compile a list of events required to track them.<br />
* Design and implement such an AXI4 event unit, evaluate it to prove it is working, and synthesize it in an ASIC technology to investigate the area/timing/power overheads.<br />
<br />
'''Stretch Goals'''<br />
Should the above milestones be reached earlier than expected and you are motivated to do further work, we propose the following stretch goals to aim for:<br />
* Create a performance counter unit allowing you to track your events.<br />
* Create a throttling unit that limits the number of outstanding transfers downstream as well as introduces a varying amount of per-channel delay.</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Enabling_Efficient_Systolic_Execution_on_MemPool_(M)&diff=8218Enabling Efficient Systolic Execution on MemPool (M)2022-10-27T11:53:33Z<p>Smazzola: /* Status: Completed */</p>
<hr />
<div><!-- Enabling Efficient Systolic Execution on MemPool (M) --><br />
<br />
= Overview =<br />
<br />
== Status: In Progress ==<br />
<br />
* Student: Vaibhav Krishna<br />
* Semester: Fall Semester 2022<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Mbertuletti | Marco Bertuletti]]: [mailto:mbertuletti@iis.ee.ethz.ch mbertuletti@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 15% Literature/architecture review<br />
* 35% RTL implementation<br />
* 30% Evaluation<br />
* 20% Bare-metal C programming<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Smazzola]]<br />
[[Category:Mbertuletti]]<br />
[[Category:Sriedel]]<br />
[[Category:In progress]]<br />
<br />
= Introduction =<br />
<br />
WIP<br />
<br />
<br />
= Project Description =<br />
<br />
* A<br />
** a<br />
* '''B'''<br />
** b<br />
<br />
= References =<br />
<br />
[[#ref-Riedel2021|&#91;3&#93;]]<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Enabling_Efficient_Systolic_Execution_on_MemPool_(M)&diff=8217Enabling Efficient Systolic Execution on MemPool (M)2022-10-27T11:53:19Z<p>Smazzola: </p>
<hr />
<div><!-- Enabling Efficient Systolic Execution on MemPool (M) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Vaibhav Krishna<br />
* Semester: Fall Semester 2022<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Mbertuletti | Marco Bertuletti]]: [mailto:mbertuletti@iis.ee.ethz.ch mbertuletti@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 15% Literature/architecture review<br />
* 35% RTL implementation<br />
* 30% Evaluation<br />
* 20% Bare-metal C programming<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Smazzola]]<br />
[[Category:Mbertuletti]]<br />
[[Category:Sriedel]]<br />
[[Category:In progress]]<br />
<br />
= Introduction =<br />
<br />
WIP<br />
<br />
<br />
= Project Description =<br />
<br />
* A<br />
** a<br />
* '''B'''<br />
** b<br />
<br />
= References =<br />
<br />
[[#ref-Riedel2021|&#91;3&#93;]]<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Counter-based_Fast_Power_Estimation_using_FPGAs_(M/1-3S)&diff=8216Counter-based Fast Power Estimation using FPGAs (M/1-3S)2022-10-27T11:51:12Z<p>Smazzola: </p>
<hr />
<div><!-- Counter-based Fast Power Estimation using FPGAs (M/1-3S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:Archived]]<br />
<br />
= Overview =<br />
<br />
== Status: Archived ==<br />
<br />
* Type: Semester or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The power consumed by a digital circuit can be broken down into two main components: ''leakage power'' and ''dynamic power''.<br />
Dynamic power is proportional to the switching activity of the circuit’s gates. In principle, one could measure dynamic power by observing the switching activity of ''each single net'' in the design. This approach is of course ''completely infeasible'' as it would need hundreds of gates to track the activity of a single gate, increasing the area of the design by ''multiple orders of magnitude'', along with the number of nets to track.<br />
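<br />
As a rough illustration of this proportionality, the usual first-order CMOS model writes dynamic power as the product of an activity factor, the switched capacitance, the square of the supply voltage, and the clock frequency. The Python sketch below assumes this textbook model; the function name and all numeric values are made up for illustration and do not describe any specific design.<br />

```python
def dynamic_power(alpha, c_switched_farad, vdd_volt, f_clk_hz):
    """First-order CMOS dynamic power estimate: P = alpha * C * Vdd^2 * f."""
    return alpha * c_switched_farad * vdd_volt ** 2 * f_clk_hz


# Illustrative (made-up) values: 10% activity, 1 nF switched capacitance,
# 0.8 V supply, 500 MHz clock.
p_example = dynamic_power(0.1, 1e-9, 0.8, 500e6)

# Doubling the switching activity doubles the estimated dynamic power.
p_double = dynamic_power(0.2, 1e-9, 0.8, 500e6)
```

In this simplified model, the activity factor is the only term that depends on the workload, which is why tracking switching activity is the key to estimating dynamic power.<br />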
<br />
It has been proven that the activity of a circuit can be closely approximated by randomly selecting a handful of signals to be observed [1].<br />
On the other hand, modern computing systems feature a number of ''performance counters'', i.e. hardware registers tracking carefully selected countable events in the circuit (e.g. cache misses, instruction fetches, floating-point operations, …) with cycle-level accuracy.<br />
Performance counters closely reflect the activity of the individual functional units and therefore of the whole system. They are usually employed to profile application performance and resource utilization at runtime; however, studies show they can also be very helpful for dynamic power modeling, supporting both the circuit design phase [2][3] and runtime energy-aware policies [4][5].<br />
<br />
Custom designs implemented on FPGAs do not usually come with performance counters, hence it is up to the hardware designer to insert ''observation points'' for activity estimation. In this context, an interesting question is whether there exists an approach to hardware counter insertion that models activity more accurately than random insertion, but requires less effort than manually placing performance counters.<br />
The implications are very valuable and pave the way for the development of an automatic power modeling framework for any arbitrary netlist, potentially expanding beyond FPGAs.<br />
<br />
== Project ==<br />
<br />
In this project, you will:<br />
<br />
* devise at least one method to extract the activity of each net in an existing RTL design<br />
* simulate the power consumption of the implemented design using Xilinx Vivado and/or a state-of-the-art power simulation tool<br />
* use statistical methods to correlate the toggling activity of each net with the power consumption of the design, in order to find the ''observation points'' of interest<br />
* create a simple performance counter unit to monitor the activity of your ideal set of ''observation points''<br />
* evaluate your approach.<br />
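<br />
The correlation step in the list above could be sketched as follows. This is a minimal example under stated assumptions, not the method prescribed by the project: it uses the Pearson correlation coefficient as the statistical measure, and the helper names (<code>pearson</code>, <code>select_observation_points</code>), the net names, and all data are hypothetical.<br />

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no information here
    return cov / math.sqrt(var_x * var_y)


def select_observation_points(toggles_per_net, power_trace, k):
    """Return the k net names whose toggle counts correlate most with power."""
    scores = {
        net: abs(pearson(toggles, power_trace))
        for net, toggles in toggles_per_net.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up data: toggle counts per time window for three hypothetical nets,
# and the simulated power of the whole design in the same windows.
toggles = {
    "fpu_valid": [0, 4, 8, 2, 6],
    "clk_en": [1, 1, 1, 1, 1],
    "lsu_req": [5, 1, 3, 4, 2],
}
power = [1.0, 3.1, 5.0, 2.1, 4.0]

best = select_observation_points(toggles, power, k=1)
```

Ranking nets one by one ignores redundancy between strongly correlated nets, so a real selection would likely need a method that accounts for it.<br />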
<br />
Depending on the remaining time and your personal interests, further challenges can be tackled:<br />
<br />
* the activity of a design unit usually correlates strongly with the activity of the unit's data bus, which in turn depends on handshaking signals. Expand your approach to detect handshake signals (e.g. ready, valid) and investigate whether knowledge about these signals can improve your approach in terms of accuracy and/or runtime<br />
* implement the design with your performance counter unit on an FPGA and use the data collected to calculate and display the power in real time<br />
* instead of correlating the activity of the nets to the power of the entire design, correlate it to key design units (like memory, FPU, CPU, ...) and create real-time per-unit power estimates.<br />
<br />
== Character ==<br />
<br />
* 10% architecture review<br />
* 20% net activity extraction<br />
* 30% power simulation<br />
* 10% implementation<br />
* 30% evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Interest in power-related topics<br />
* Preferred: Experience with Xilinx Vivado or any other FPGA toolchain<br />
<br />
= References =<br />
<br />
<div> [2] [http://sedici.unlp.edu.ar/handle/10915/90904 A Study of Hardware Performance Counters Selection for Cross Architectural GPU Power Modeling]</div><br />
<div> [3] [https://dl.acm.org/doi/abs/10.1145/3466752.3480063 AccelWattch: A Power Modeling Framework for Modern GPUs]</div><br />
<div> [4] [https://dl.acm.org/doi/abs/10.1145/566726.566736 The benefits of event-driven energy accounting in power-sensitive systems]</div><br />
<div> [5] [https://ieeexplore.ieee.org/abstract/document/845896 A survey of design techniques for system-level dynamic power management]</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8215User:Smazzola2022-10-27T11:51:01Z<p>Smazzola: /* Projects In Progress */</p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
'''Completed Projects'''<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
'''Archived'''<br />
<DynamicPageList><br />
category = Archived<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8214User:Smazzola2022-10-27T11:46:20Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
'''Completed Projects'''<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
'''Archive'''<br />
<DynamicPageList><br />
category = Archive<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8213User:Smazzola2022-10-27T11:45:49Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
'''Completed Projects'''<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
'''Archive'''<br />
<DynamicPageList><br />
category = Archive<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8212User:Smazzola2022-10-27T11:45:06Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
'''Archive'''<br />
<DynamicPageList><br />
category = Archive<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8211User:Smazzola2022-10-27T11:44:50Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
*Archive*<br />
<DynamicPageList><br />
category = Archive<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8210User:Smazzola2022-10-27T11:44:36Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
Archive<br />
<DynamicPageList><br />
category = Archive<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Enabling_Efficient_Systolic_Execution_on_MemPool_(M)&diff=8209Enabling Efficient Systolic Execution on MemPool (M)2022-10-27T11:43:09Z<p>Smazzola: </p>
<hr />
<div><!-- Enabling Efficient Systolic Execution on MemPool (M) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Vaibhav Krishna<br />
* Semester: Fall Semester 2022<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Mbertuletti | Marco Bertuletti]]: [mailto:mbertuletti@iis.ee.ethz.ch mbertuletti@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 15% Literature/architecture review<br />
* 35% RTL implementation<br />
* 30% Evaluation<br />
* 20% Bare-metal C programming<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Smazzola]]<br />
[[Category:Mbertuletti]]<br />
[[Category:Sriedel]]<br />
[[Category:In progress]]<br />
<br />
= Introduction =<br />
<br />
WIP<br />
<br />
<br />
= Project Description =<br />
<br />
* A<br />
** a<br />
* '''Add the complete xpulp set'''<br />
** B<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to develop a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps toward making your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX together with Inkscape or any other vector drawing software (for block diagrams).<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
= References =<br />
<br />
[[#ref-Riedel2021|&#91;3&#93;]]<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Enabling_Efficient_Systolic_Execution_on_MemPool_(M)&diff=8208Enabling Efficient Systolic Execution on MemPool (M)2022-10-27T11:39:25Z<p>Smazzola: Created page with "<!-- Enabling Efficient Systolic Execution on MemPool (M) --> = Overview = == Status: Completed == * Student: Vaibhav Krishna * Semester: Fall Semester 2022 * Type: Master..."</p>
<hr />
<div><!-- Enabling Efficient Systolic Execution on MemPool (M) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Vaibhav Krishna<br />
* Semester: Fall Semester 2022<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Mbertuletti | Marco Bertuletti]]: [mailto:mbertuletti@iis.ee.ethz.ch mbertuletti@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 15% Literature/architecture review<br />
* 35% RTL implementation<br />
* 30% Evaluation<br />
* 20% Bare-metal C programming<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with RTL design and evaluation<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2022]]<br />
[[Category:2023]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Smazzola]]<br />
[[Category:Mbertuletti]]<br />
[[Category:Sriedel]]<br />
[[Category:Completed]]<br />
<br />
= Introduction =<br />
<br />
WIP<br />
<br />
<br />
= Project Description =<br />
<br />
* A<br />
** a<br />
* '''Add the complete xpulp set'''<br />
** B<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to develop a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps toward making your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report should preferably be written in English.<br />
<br />
Any word processing software may be used to write the report; nevertheless, the IIS staff strongly encourages the use of LaTeX together with Inkscape or any other vector drawing software (for block diagrams).<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
= References =<br />
<br />
[[#ref-Riedel2021|&#91;3&#93;]]<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Implementing_DSP_Instructions_in_Banshee_(1S)&diff=8207Implementing DSP Instructions in Banshee (1S)2022-10-27T11:31:37Z<p>Smazzola: </p>
<hr />
<div><!-- Implementing DSP Instructions in Banshee (M/1S) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Lorena Oswald<br />
* Type: Bachelor Thesis<br />
* Semester: Spring Semester 2022<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 60% Rust programming<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience or interest in learning Rust<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Sriedel]]<br />
[[Category:Smazzola]]<br />
[[Category:Paulsc]]<br />
[[Category:Georg]]<br />
[[Category:Completed]]<br />
<br />
= Introduction =<br />
<br />
In a quest for high-performance computing systems, few architectural models retain the flexibility of manycore systems. Those systems integrate many small cores (hundreds, thousands) that work independently to execute highly-parallelizable algorithms. Exploring new architectures and writing software for manycore systems is very challenging and requires the support of good simulation tools at various levels of abstraction.<br />
<br />
At ETH, we have developed ''Banshee'', an LLVM-based binary translator capable of simulating our manycore architectures [[#ref-Banshee2021|&#91;1&#93;]]. It is written in Rust, making it easy to extend, and thanks to its static binary translation, it reaches a performance of up to 72 GIPS, outperforming RTL simulation by several orders of magnitude.<br />
<br />
One of the manycore systems developed at ETH is ''MemPool'' [[#ref-Cavalcante2020|&#91;2&#93;]], [[#ref-Riedel2021|&#91;3&#93;]]. It boasts 256 lightweight 32-bit Snitch cores [[#ref-Zaruba2020|&#91;4&#93;]]. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA [[#ref-Waterman2019|&#91;5&#93;]]. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads and easy to program. To improve MemPool’s performance, we have recently added a custom ISA extension with specialized DSP instructions such as multiply-accumulate or SIMD instructions [[#ref-Mazzola2021|&#91;6&#93;]]. The instructions are a subset of the ''xpulp'' instruction set [[#ref-Xpulp2021|&#91;7&#93;]].<br />
<br />
To allow us to also use the xpulp instructions in Banshee, the project’s goal is to add the xpulp instruction set extension to Banshee. In a first step, the focus lies on adding the instructions currently supported by Snitch, followed by the rest of the xpulp set. While adding the instructions to Banshee, we also want to evaluate their impact on signal-processing kernels while comparing Banshee’s accuracy with the RTL model.<br />
<br />
= Project Description =<br />
<br />
* '''Implement MemPool’s instructions in Banshee'''<br />
** Analyze the subsets necessary to support the full extension.<br />
** Add them to Banshee by emitting the corresponding LLVM IR or writing a high-level description of the functionality in Rust.<br />
** Verify your implementation with MemPool’s test infrastructure.<br />
* '''Add the complete xpulp set'''<br />
** Decide on the order of the most useful subsets and add as many as time allows.<br />
** Verify your implementation by extending MemPool’s test infrastructure.<br />
* '''Evaluate the performance gain those instructions bring and the accuracy of Banshee.'''<br />
** Use existing DSP kernels and/or implement your own ones to evaluate the benefit your instructions bring.<br />
** Compare the estimated speedup with the real performance gain observed in RTL.<br />
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The idea of the weekly report is to briefly summarize the work, progress, and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to develop a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps toward making your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report should preferably be written in English.<br />
<br />
Any form of word processing software is allowed for writing the report; nevertheless, the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project, and a digital copy needs to be handed in and remains property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
= References =<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<div id="ref-Riedel2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">S. Riedel and M. Cavalcante, <span>“<span>MemPool GitHub</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Zaruba2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, <span>“<span class="nocase">Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads</span>,”</span> ''IEEE TRANSACTIONS ON COMPUTERS'', pp. 1–1, Feb. 2020.</span><br />
</div><br />
<div id="ref-Waterman2019" class="csl-entry"><br />
<span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">A. Waterman and K. Asanović, <span>“<span>The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213</span>,”</span> RISC-V Foundation, 2019.</span><br />
</div><br />
<div id="ref-Mazzola2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;6&#93; </span><span class="csl-right-inline">S. Mazzola, <span>“<span class="nocase">ISA extensions in the Snitch Processor for Signal Processing</span>,”</span> Apr. 2021.</span><br />
</div><br />
<div id="ref-Xpulp2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;7&#93; </span><span class="csl-right-inline">OpenHW Group, <span>“<span class="nocase">cv32e40p User Manual</span>.”</span> 2021.</span><br />
</div><br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8206User:Smazzola2022-10-27T11:28:57Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Archive===<br />
<DynamicPageList><br />
category = Archive<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Counter-based_Fast_Power_Estimation_using_FPGAs_(M/1-3S)&diff=8205Counter-based Fast Power Estimation using FPGAs (M/1-3S)2022-10-27T11:28:12Z<p>Smazzola: </p>
<hr />
<div><!-- Counter-based Fast Power Estimation using FPGAs (M/1-3S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:Archive]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The power consumed by a digital circuit can be broken down into two main components: ''leakage power'' and ''dynamic power''.<br />
Dynamic power is proportional to the switching activity of the circuit’s gates. In principle, one could measure dynamic power by observing the switching activity of ''each single net'' in the design. This approach is of course ''completely infeasible'' as it would need hundreds of gates to track the activity of a single gate, increasing the area of the design by ''multiple orders of magnitude'', along with the number of nets to track.<br />
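<br />
As a reminder of why dynamic power tracks switching activity, the textbook first-order model for CMOS logic can be written as below. The symbols are assumptions introduced here for illustration and do not come from the project description: <code>\alpha</code> is the average switching activity factor, <code>C_L</code> the switched load capacitance, <code>V_{DD}</code> the supply voltage, and <code>f</code> the clock frequency.<br />

```latex
P_{\mathrm{total}} = P_{\mathrm{leak}} + P_{\mathrm{dyn}}, \qquad
P_{\mathrm{dyn}} \approx \alpha \, C_L \, V_{DD}^{2} \, f
```

For a fixed design, supply voltage, and clock frequency, only the activity factor changes with the workload, which is why observing a subset of toggling nets can serve as a proxy for dynamic power.<br />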
<br />
It has been proven that the activity of a circuit can be closely approximated by randomly selecting a handful of signals to be observed [1].<br />
On the other hand, modern computing systems feature a number of ''performance counters'', i.e. hardware registers tracking carefully selected countable events in the circuit (e.g. cache misses, instruction fetches, floating-point operations, …) with cycle-level accuracy.<br />
Performance counters closely reflect the activity of the individual functional units and therefore of the whole system. They are usually employed to profile application performance and resource utilization at runtime; however, studies show they can also be very helpful for dynamic power modeling, supporting both the circuit design phase [2][3] and runtime energy-aware policies [4][5].<br />
<br />
Custom designs implemented on FPGAs do not usually come with performance counters, hence it is up to the hardware designer to insert ''observation points'' for activity estimation. In this context, an interesting question is whether there exists an approach to hardware counter insertion that models activity more accurately than random insertion, yet requires less effort than manually placing performance counters.<br />
The implications are very valuable and pave the way for the development of an automatic power modeling framework for any arbitrary netlist, potentially expanding beyond FPGAs.<br />
<br />
== Project ==<br />
<br />
In this project, you will:<br />
<br />
* devise at least one method to extract the activity of each net in an existing RTL design<br />
* simulate the power consumption of the implemented design using Xilinx Vivado and/or a state-of-the-art power simulation tool<br />
* use statistical methods to correlate the toggling activity of each net with the power consumption of the design, thereby finding the ''observation points'' of interest<br />
* create a simple performance counter unit to monitor the activity of your ideal set of ''observation points''<br />
* evaluate your approach.<br />
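<br />
The correlation step in the list above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method the project prescribes: it ranks nets by the absolute Pearson correlation between their per-window toggle counts and the simulated power of the same windows. The function names, net names, and data are hypothetical, and ranking each net on its own ignores redundancy between nets.<br />

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no information about power
    return cov / sqrt(var_x * var_y)


def rank_observation_points(toggles, power, k):
    """Return the k net names whose toggle counts correlate most strongly with power."""
    scores = {net: abs(pearson(counts, power)) for net, counts in toggles.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: toggle counts per net over five time windows,
# and the simulated power of the design in the same windows.
toggles = {
    "fpu_valid": [0, 4, 8, 2, 6],
    "icache_miss": [1, 1, 0, 1, 1],
    "clk_en": [5, 5, 5, 5, 5],
}
power = [1.0, 3.0, 5.0, 2.0, 4.0]

print(rank_observation_points(toggles, power, 2))  # ['fpu_valid', 'icache_miss']
```

A real flow would likely replace the per-net ranking with a regression or feature-selection method that accounts for correlated nets; the sketch only shows the shape of the data involved.<br />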
<br />
Depending on the remaining time and your personal interests, further challenges can be tackled:<br />
<br />
* the activity of a design unit usually correlates highly with the activity of the unit's data bus, which in turn depends on handshaking signals. Expand your approach to detect handshake signals (e.g. ready, valid) and investigate whether knowledge about these signals can improve your approach in terms of accuracy and/or runtime<br />
* implement the design with your performance counter unit on an FPGA and use the data collected to calculate and display the power in real time<br />
* instead of correlating the activity of the nets to the power of the entire design, correlate it to key design units (like memory, FPU, CPU, ...) and create real-time per-unit power estimates.<br />
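<br />
The real-time and per-unit estimates mentioned above could, for instance, be computed with a linear model over the counter values. The sketch below assumes such a linear model, which the project description does not mandate; the weights, counter names, and unit mapping are hypothetical placeholders for values that would come out of the correlation study.<br />

```python
def estimate_power(counters, weights, static_power):
    """Estimate power for one sampling window: static power plus weighted counter values."""
    return static_power + sum(weights[name] * value for name, value in counters.items())


def per_unit_power(counters, weights, unit_of):
    """Split the dynamic part of the estimate by the design unit each counter belongs to."""
    totals = {}
    for name, value in counters.items():
        unit = unit_of[name]
        totals[unit] = totals.get(unit, 0.0) + weights[name] * value
    return totals


# Hypothetical model parameters and one window of counter readings.
weights = {"fpu_valid": 0.5, "mem_req": 0.25}
unit_of = {"fpu_valid": "fpu", "mem_req": "memory"}
counters = {"fpu_valid": 8, "mem_req": 4}

print(estimate_power(counters, weights, static_power=1.0))  # 6.0
print(per_unit_power(counters, weights, unit_of))  # {'fpu': 4.0, 'memory': 1.0}
```

Evaluating such a model needs only one multiply-accumulate per counter and sampling window, which is what makes a real-time display on the FPGA plausible.<br />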
<br />
== Character ==<br />
<br />
* 10% architecture review<br />
* 20% net activity extraction<br />
* 30% power simulation<br />
* 10% implementation<br />
* 30% evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Interest in power-related topics<br />
* Preferred: Experience with Xilinx Vivado or any other FPGA toolchain<br />
<br />
= References =<br />
<br />
<div> [2] [http://sedici.unlp.edu.ar/handle/10915/90904 A Study of Hardware Performance Counters Selection for Cross Architectural GPU Power Modeling]</div><br />
<div> [3] [https://dl.acm.org/doi/abs/10.1145/3466752.3480063 AccelWattch: A Power Modeling Framework for Modern GPUs]</div><br />
<div> [4] [https://dl.acm.org/doi/abs/10.1145/566726.566736 The benefits of event-driven energy accounting in power-sensitive systems]</div><br />
<div> [5] [https://ieeexplore.ieee.org/abstract/document/845896 A survey of design techniques for system-level dynamic power management]</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=8071User:Smazzola2022-09-15T12:21:22Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Implementing_DSP_Instructions_in_Banshee_(1S)&diff=8070Implementing DSP Instructions in Banshee (1S)2022-09-15T12:20:24Z<p>Smazzola: </p>
<hr />
<div><!-- Implementing DSP Instructions in Banshee (M/1S) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Lorena Oswald<br />
* Type: Bachelor Thesis<br />
* Semester: Spring Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 60% Rust programming<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience or interest in learning Rust<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Sriedel]]<br />
[[Category:Smazzola]]<br />
[[Category:Paulsc]]<br />
[[Category:Georg]]<br />
[[Category:Completed]]<br />
<br />
= Introduction =<br />
<br />
In a quest for high-performance computing systems, few architectural models retain the flexibility of manycore systems. Those systems integrate many small cores (hundreds, thousands) that work independently to execute highly-parallelizable algorithms. Exploring new architectures and writing software for manycore systems is very challenging and requires the support of good simulation tools at various levels of abstraction.<br />
<br />
At ETH, we have developed ''Banshee'', an LLVM-based binary translator capable of simulating our manycore architectures [[#ref-Banshee2021|&#91;1&#93;]]. It is written in Rust, making it easy to extend, and thanks to its static binary translation, it reaches a performance of up to 72 GIPS, outperforming RTL simulation by several orders of magnitude.<br />
<br />
One of the manycore systems developed at ETH is ''MemPool'' [[#ref-Cavalcante2020|&#91;2&#93;]], [[#ref-Riedel2021|&#91;3&#93;]]. It boasts 256 lightweight 32-bit Snitch cores [[#ref-Zaruba2020|&#91;4&#93;]]. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA [[#ref-Waterman2019|&#91;5&#93;]]. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads and easy to program. To improve MemPool’s performance, we have recently added a custom ISA extension with specialized DSP instructions such as multiply-accumulate or SIMD instructions [[#ref-Mazzola2021|&#91;6&#93;]]. The instructions are a subset of the ''xpulp'' instruction set [[#ref-Xpulp2021|&#91;7&#93;]].<br />
<br />
To allow us to also use the xpulp instructions in Banshee, the project’s goal is to add the xpulp instruction set extension to Banshee. In a first step, the focus lies on adding the instructions currently supported by Snitch, followed by the rest of the xpulp set. While adding the instructions to Banshee, we also want to evaluate their impact on signal-processing kernels while comparing Banshee’s accuracy with the RTL model.<br />
<br />
= Project Description =<br />
<br />
* '''Implement MemPool’s instructions in Banshee'''<br />
** Analyze the subsets necessary to support the full extension.<br />
** Add them to Banshee by emitting the corresponding LLVM IR or writing a high-level description of the functionality in Rust.<br />
** Verify your implementation with MemPool’s test infrastructure.<br />
* '''Add the complete xpulp set'''<br />
** Decide on the order of the most useful subsets and add as many as time allows.<br />
** Verify your implementation by extending MemPool’s test infrastructure.<br />
* '''Evaluate the performance gain those instructions bring and the accuracy of Banshee.'''<br />
** Use existing DSP kernels and/or implement your own ones to evaluate the benefit your instructions bring.<br />
** Compare the estimated speedup with the real performance gain observed in RTL.<br />
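<br />
To illustrate what a "high-level description of the functionality" of a DSP instruction looks like, the sketch below models three generic operations: a 32-bit multiply-accumulate, a lane-wise addition of two 16-bit halfwords, and a signed 16-bit dot product. Banshee itself is written in Rust; Python is used here only as an executable reference model, for example as a golden reference when checking results. The function names are hypothetical and the semantics are generic, so the exact behaviour of each xpulp instruction must be taken from the specification [[#ref-Xpulp2021|&#91;7&#93;]].<br />

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac32(acc, a, b):
    """32-bit multiply-accumulate: acc + a * b, wrapped to 32 bits."""
    return (acc + to_signed(a, 32) * to_signed(b, 32)) & MASK32


def simd_add_h(a, b):
    """Add two 32-bit registers as two independent 16-bit lanes (no carry between lanes)."""
    a &= MASK32
    b &= MASK32
    lo = (a + b) & 0xFFFF
    hi = ((a >> 16) + (b >> 16)) & 0xFFFF
    return (hi << 16) | lo


def dotp_h(a, b):
    """Signed dot product of the two 16-bit lanes of a and b, wrapped to 32 bits."""
    total = 0
    for lane in range(2):
        x = to_signed(a >> (16 * lane), 16)
        y = to_signed(b >> (16 * lane), 16)
        total += x * y
    return total & MASK32


print(hex(mac32(10, 3, 4)))                      # 0x16
print(hex(simd_add_h(0x0001FFFF, 0x00010001)))   # 0x20000
print(dotp_h(0x00020003, 0x0004FFFF))            # 5
```

Keeping the wrap-around and sign handling explicit, as above, is the part most likely to differ between a simulator and the RTL, so these are the cases worth covering first in the test infrastructure.<br />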
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedules. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to the advisors. The idea of the weekly report is to briefly summarize the work, progress and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to develop a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps in making your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report should preferably be written in English.<br />
<br />
Any form of word processing software is allowed for writing the report; nevertheless, the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project, and a digital copy needs to be handed in and remains property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
= References =<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<div id="ref-Riedel2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">S. Riedel and M. Cavalcante, <span>“<span>MemPool GitHub</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Zaruba2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, <span>“<span class="nocase">Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads</span>,”</span> ''IEEE TRANSACTIONS ON COMPUTERS'', pp. 1–1, Feb. 2020.</span><br />
</div><br />
<div id="ref-Waterman2019" class="csl-entry"><br />
<span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">A. Waterman and K. Asanović, <span>“<span>The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213</span>,”</span> RISC-V Foundation, 2019.</span><br />
</div><br />
<div id="ref-Mazzola2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;6&#93; </span><span class="csl-right-inline">S. Mazzola, <span>“<span class="nocase">ISA extensions in the Snitch Processor for Signal Processing</span>,”</span> Apr. 2021.</span><br />
</div><br />
<div id="ref-Xpulp2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;7&#93; </span><span class="csl-right-inline">OpenHW Group, <span>“<span class="nocase">cv32e40p User Manual</span>.”</span> 2021.</span><br />
</div><br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)&diff=8069Streaming Integer Extensions for Snitch (M/1-2S)2022-09-15T12:19:59Z<p>Smazzola: </p>
<hr />
<div><!-- Streaming Integer Extensions for Snitch (M/1-2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Sriedel]]<br />
[[Category:Completed]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Chen Sun<br />
* Type: Master Thesis<br />
* Semester: Spring Semester 2022<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is not an issue anymore for Snitch, as it is able to achieve almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is by integrating features from the existing extensions, adding further features to fit our needs, and evaluating the performance benefits of the resulting system.<br />
<br />
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Implementing_DSP_Instructions_in_Banshee_(1S)&diff=8068Implementing DSP Instructions in Banshee (1S)2022-09-15T12:19:22Z<p>Smazzola: /* Status: Completed */</p>
<hr />
<div><!-- Implementing DSP Instructions in Banshee (M/1S) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Lorena Oswald<br />
* Type: Bachelor Thesis<br />
* Semester: Spring Semester 2022<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 60% Rust programming<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience or interest in learning Rust<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Sriedel]]<br />
[[Category:Smazzola]]<br />
[[Category:Paulsc]]<br />
[[Category:Georg]]<br />
[[Category:In progress]]<br />
<br />
= Introduction =<br />
<br />
In a quest for high-performance computing systems, few architectural models retain the flexibility of manycore systems. Those systems integrate many small cores (hundreds, thousands) that work independently to execute highly-parallelizable algorithms. Exploring new architectures and writing software for manycore systems is very challenging and requires the support of good simulation tools at various levels of abstraction.<br />
<br />
At ETH, we have developed ''Banshee'', an LLVM-based binary translator capable of simulating our manycore architectures [[#ref-Banshee2021|&#91;1&#93;]]. It is written in Rust, making it easy to extend, and thanks to its static binary translation, it reaches a performance of up to 72 GIPS, outperforming RTL simulation by several orders of magnitude.<br />
<br />
One of the manycore systems developed at ETH is ''MemPool'' [[#ref-Cavalcante2020|&#91;2&#93;]], [[#ref-Riedel2021|&#91;3&#93;]]. It boasts 256 lightweight 32-bit Snitch cores [[#ref-Zaruba2020|&#91;4&#93;]]. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA [[#ref-Waterman2019|&#91;5&#93;]]. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads and easy to program. To improve MemPool’s performance, we have recently added a custom ISA extension with specialized DSP instructions such as multiply-accumulate or SIMD instructions [[#ref-Mazzola2021|&#91;6&#93;]]. The instructions are a subset of the ''xpulp'' instruction set [[#ref-Xpulp2021|&#91;7&#93;]].<br />
<br />
To allow us to also use the xpulp instructions in Banshee, the project’s goal is to add the xpulp instruction set extension to Banshee. In a first step, the focus lies on adding the instructions currently supported by Snitch, followed by the rest of the xpulp set. While adding the instructions to Banshee, we also want to evaluate their impact on signal-processing kernels while comparing Banshee’s accuracy with the RTL model.<br />
<br />
= Project Description =<br />
<br />
* '''Implement MemPool’s instructions in Banshee'''<br />
** Analyze the subsets necessary to support the full extension.<br />
** Add them to Banshee by emitting the corresponding LLVM IR or writing a high-level description of the functionality in Rust.<br />
** Verify your implementation with MemPool’s test infrastructure.<br />
* '''Add the complete xpulp set'''<br />
** Decide on the order of the most useful subsets and add as many as time allows.<br />
** Verify your implementation by extending MemPool’s test infrastructure.<br />
* '''Evaluate the performance gain those instructions bring and the accuracy of Banshee.'''<br />
** Use existing DSP kernels and/or implement your own ones to evaluate the benefit your instructions bring.<br />
** Compare the estimated speedup with the real performance gain observed in RTL.<br />
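<br />
As a rough illustration of what a "high-level description of the functionality in Rust" could look like, the sketch below models a 32-bit multiply-accumulate and a signed dot product over two packed 16-bit lanes. The function names <code>mac</code> and <code>dotp_h</code> are hypothetical and are not part of Banshee's code base; the exact semantics of each xpulp instruction should be taken from the instruction set manual.<br />

```rust
/// Multiply-accumulate with 32-bit wrap-around: acc + a * b.
/// Hypothetical helper, not an existing Banshee function.
fn mac(acc: u32, a: u32, b: u32) -> u32 {
    acc.wrapping_add(a.wrapping_mul(b))
}

/// Signed dot product of two registers, each holding two packed 16-bit lanes.
/// Hypothetical helper, not an existing Banshee function.
fn dotp_h(a: u32, b: u32) -> u32 {
    // Extract lane `i` (0 = low half, 1 = high half) and sign-extend it to 32 bits.
    let lane = |x: u32, i: u32| ((x >> (16 * i)) as u16) as i16 as i32;
    let lo = lane(a, 0).wrapping_mul(lane(b, 0));
    let hi = lane(a, 1).wrapping_mul(lane(b, 1));
    lo.wrapping_add(hi) as u32
}
```

Keeping such helpers as pure functions on register values makes them easy to unit-test before they are wired into the translator.<br />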
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedule. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The idea of the weekly report is to briefly summarize the work, progress and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps in order to make your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
= References =<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<div id="ref-Riedel2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">S. Riedel and M. Cavalcante, <span>“<span>MemPool GitHub</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Zaruba2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, <span>“<span class="nocase">Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads</span>,”</span> ''IEEE TRANSACTIONS ON COMPUTERS'', pp. 1–1, Feb. 2020.</span><br />
</div><br />
<div id="ref-Waterman2019" class="csl-entry"><br />
<span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">A. Waterman and K. Asanović, <span>“<span>The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213</span>,”</span> RISC-V Foundation, 2019.</span><br />
</div><br />
<div id="ref-Mazzola2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;6&#93; </span><span class="csl-right-inline">S. Mazzola, <span>“<span class="nocase">ISA extensions in the Snitch Processor for Signal Processing</span>,”</span> Apr. 2021.</span><br />
</div><br />
<div id="ref-Xpulp2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;7&#93; </span><span class="csl-right-inline">OpenHW Group, <span>“<span class="nocase">cv32e40p User Manual</span>.”</span> 2021.</span><br />
</div><br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)&diff=8067Streaming Integer Extensions for Snitch (M/1-2S)2022-09-15T12:17:44Z<p>Smazzola: /* Status: Completed */</p>
<hr />
<div><!-- Streaming Integer Extensions for Snitch (M/1-2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Sriedel]]<br />
[[Category:In progress]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Student: Chen Sun<br />
* Type: Master Thesis<br />
* Semester: Spring Semester 2022<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is not an issue anymore for Snitch, as it is able to achieve almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is by integrating features from the existing extensions, adding further features to fit our needs, and evaluating the performance benefits of the resulting system.<br />
<br />
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
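<br />
To make the idea of an integer stream more concrete, the following conceptual sketch models a one-dimensional affine address stream in software and uses two such streams to feed a multiply-accumulate loop. It is only an illustration under simplifying assumptions: the names <code>AffineStream</code> and <code>strided_dot</code> are hypothetical, memory is modelled as a word-indexed slice, and the actual SSR hardware and its configuration interface are described in [2].<br />

```rust
/// Conceptual one-dimensional affine stream: yields `remaining` indices,
/// starting at `addr` and advancing by `stride` each step.
/// Hypothetical software model, not the SSR hardware interface.
struct AffineStream {
    addr: usize,
    stride: usize,
    remaining: usize,
}

impl Iterator for AffineStream {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.remaining == 0 {
            return None;
        }
        let current = self.addr;
        self.addr += self.stride;
        self.remaining -= 1;
        Some(current)
    }
}

/// Dot product of two strided vectors stored in `mem`.
/// The loop body contains only the multiply-accumulate; all index
/// arithmetic is delegated to the two streams.
fn strided_dot(mem: &[i32], a: AffineStream, b: AffineStream) -> i32 {
    a.zip(b)
        .fold(0i32, |acc, (ia, ib)| acc.wrapping_add(mem[ia].wrapping_mul(mem[ib])))
}
```

The point of the model is that the compute loop issues no explicit address calculations, which is the property that lets a stream-fed datapath approach full utilization.<br />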
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)&diff=8066Streaming Integer Extensions for Snitch (M/1-2S)2022-09-15T12:16:14Z<p>Smazzola: </p>
<hr />
<div><!-- Streaming Integer Extensions for Snitch (M/1-2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Sriedel]]<br />
[[Category:In progress]]<br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Master or Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is not an issue anymore for Snitch, as it is able to achieve almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is by integrating features from the existing extensions, adding further features to fit our needs, and evaluating the performance benefits of the resulting system.<br />
<br />
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Implementing_DSP_Instructions_in_Banshee_(1S)&diff=8065Implementing DSP Instructions in Banshee (1S)2022-09-15T12:15:56Z<p>Smazzola: </p>
<hr />
<div><!-- Implementing DSP Instructions in Banshee (M/1S) --><br />
<br />
= Overview =<br />
<br />
== Status: Completed ==<br />
<br />
* Type: Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
== Character ==<br />
<br />
* 60% Rust programming<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience or interest in learning Rust<br />
* Experience with C<br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:2022]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Sriedel]]<br />
[[Category:Smazzola]]<br />
[[Category:Paulsc]]<br />
[[Category:Georg]]<br />
[[Category:In progress]]<br />
<br />
= Introduction =<br />
<br />
In a quest for high-performance computing systems, few architectural models retain the flexibility of manycore systems. Those systems integrate many small cores (hundreds, thousands) that work independently to execute highly-parallelizable algorithms. Exploring new architectures and writing software for manycore systems is very challenging and requires the support of good simulation tools at various levels of abstraction.<br />
<br />
At ETH, we have developed ''Banshee'', an LLVM-based binary translator capable of simulating our manycore architectures [[#ref-Banshee2021|&#91;1&#93;]]. It is written in Rust, making it easy to extend, and thanks to its static binary translation, it reaches a performance of up to 72 GIPS, outperforming RTL simulation by several orders of magnitude.<br />
<br />
One of the manycore systems developed at ETH is ''MemPool'' [[#ref-Cavalcante2020|&#91;2&#93;]], [[#ref-Riedel2021|&#91;3&#93;]]. It boasts 256 lightweight 32-bit Snitch cores [[#ref-Zaruba2020|&#91;4&#93;]]. They implement the RISC-V instruction set architecture (ISA), a modular and open ISA [[#ref-Waterman2019|&#91;5&#93;]]. Despite its size, MemPool manages to give all 256 cores low-latency access to the shared L1 memory, with a zero-load latency of at most five cycles. Therefore, all cores can efficiently communicate, making MemPool suitable for various workloads and easy to program. To improve MemPool’s performance, we have recently added a custom ISA extension with specialized DSP instructions such as multiply-accumulate or SIMD instructions [[#ref-Mazzola2021|&#91;6&#93;]]. The instructions are a subset of the ''xpulp'' instruction set [[#ref-Xpulp2021|&#91;7&#93;]].<br />
<br />
To allow us to also use the xpulp instructions in Banshee, the project’s goal is to add the xpulp instruction set extension to Banshee. In a first step, the focus lies on adding the instructions currently supported by Snitch, followed by the rest of the xpulp set. While adding the instructions to Banshee, we also want to evaluate their impact on signal-processing kernels while comparing Banshee’s accuracy with the RTL model.<br />
<br />
= Project Description =<br />
<br />
* '''Implement MemPool’s instructions in Banshee'''<br />
** Analyze the subsets necessary to support the full extension.<br />
** Add them to Banshee by emitting the corresponding LLVM IR or writing a high-level description of the functionality in Rust.<br />
** Verify your implementation with MemPool’s test infrastructure.<br />
* '''Add the complete xpulp set'''<br />
** Decide on the order of the most useful subsets and add as many as time allows.<br />
** Verify your implementation by extending MemPool’s test infrastructure.<br />
* '''Evaluate the performance gain those instructions bring and the accuracy of Banshee.'''<br />
** Use existing DSP kernels and/or implement your own ones to evaluate the benefit your instructions bring.<br />
** Compare the estimated speedup with the real performance gain observed in RTL.<br />
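<br />
The comparison in the last step boils down to two ratios: the speedup obtained from cycle counts with and without the new instructions, and the relative deviation of Banshee's estimate from the RTL measurement. A minimal sketch, with hypothetical function names and no real measurement data, could look as follows.<br />

```rust
/// Speedup of the extended ISA over the baseline, from cycle counts.
/// Hypothetical helper for illustration only.
fn speedup(baseline_cycles: u64, extended_cycles: u64) -> f64 {
    baseline_cycles as f64 / extended_cycles as f64
}

/// Relative error of an estimated value with respect to a reference value.
/// Here the estimate would come from Banshee and the reference from RTL simulation.
fn relative_error(estimated: f64, reference: f64) -> f64 {
    (estimated - reference).abs() / reference.abs()
}
```

For example, if a kernel hypothetically took 1000 cycles without and 500 cycles with the new instructions in Banshee, the estimated speedup would be 2.0; against an RTL speedup of 2.5, the relative error of the estimate would be 20%.<br />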
<br />
= Project Realization =<br />
<br />
== Meetings ==<br />
<br />
Weekly meetings will be held between the student and the assistants. The exact time and location of these meetings will be determined within the first week of the project in order to fit the student’s and the assistants’ schedule. These meetings will be used to evaluate the status and progress of the project. Besides these regular meetings, additional meetings can be organized to address urgent issues.<br />
<br />
== Weekly Reports ==<br />
<br />
Semester Thesis: The student is advised, but not required, to write a weekly report at the end of each week and to send it to their advisors. The idea of the weekly report is to briefly summarize the work, progress and any findings made during the week, to plan the actions for the next week, and to bring up open questions and points. The weekly report is also an important means for the student to get a goal-oriented attitude to work.<br />
<br />
== Coding Guidelines ==<br />
<br />
==== HDL Code Style ====<br />
<br />
Adopting a consistent code style is one of the most important steps in order to make your code easy to understand. If signals, processes, and modules are always named consistently, any inconsistency can be detected more easily. Moreover, if a design group shares the same naming and formatting conventions, all members immediately ''feel at home'' with each other’s code. At IIS, we use lowRISC’s style guide for SystemVerilog HDL: https://github.com/lowRISC/style-guides/.<br />
<br />
==== Software Code Style ====<br />
<br />
We generally suggest that you use style guides or code formatters provided by the language’s developers or community. For example, we recommend LLVM’s or Google’s code styles with <code>clang-format</code> for C/C++, PEP-8 and <code>pylint</code> for Python, and the official style guide with <code>rustfmt</code> for Rust.<br />
<br />
==== Version Control ====<br />
<br />
Even in the context of a student project, keeping a precise history of changes is ''essential'' to a maintainable codebase. You may also need to collaborate with others, adopt their changes to existing code, or work on different versions of your code concurrently. For all of these purposes, we heavily use ''Git'' as a version control system at IIS. If you have no previous experience with Git, we ''strongly'' advise you to familiarize yourself with the basic Git workflow before you start your project.<br />
<br />
== Report ==<br />
<br />
Documentation is an important and often overlooked aspect of engineering. A final report has to be completed within this project.<br />
<br />
The common language of engineering is de facto English. Therefore, the final report of the work is preferred to be written in English.<br />
<br />
Any form of word processing software is allowed for writing the reports; nevertheless, the use of LaTeX with Inkscape or any other vector drawing software (for block diagrams) is strongly encouraged by the IIS staff.<br />
<br />
If you write the report in LaTeX, we offer an instructive, ready-to-use template, which can be forked from the Git repository at https://iis-git.ee.ethz.ch/akurth/iisreport.<br />
<br />
==== Final Report ====<br />
<br />
The final report has to be presented at the end of the project and a digital copy needs to be handed in and remain property of the IIS. Note that this task description is part of your report and has to be attached to your final report.<br />
<br />
== Presentation ==<br />
<br />
There will be a presentation (15 min presentation and 5 min Q&amp;A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work.<br />
<br />
= Deliverables =<br />
<br />
In order to complete the project successfully, the following deliverables have to be submitted at the end of the work:<br />
<br />
* Final report incl. presentation slides<br />
* Source code and documentation for all developed software and hardware<br />
* Testsuites (software) and testbenches (hardware)<br />
* Synthesis and implementation scripts, results, and reports<br />
<br />
= References =<br />
<br />
<div id="refs" class="references csl-bib-body"><br />
<div id="ref-Banshee2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;1&#93; </span><span class="csl-right-inline">PULP Team, <span>“<span>Banshee GitHub (https://github.com/pulp-platform/snitch/tree/master/sw/banshee)</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Cavalcante2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;2&#93; </span><span class="csl-right-inline">M. Cavalcante, S. Riedel, A. Pullini, and L. Benini, <span>“<span>MemPool</span>: A shared-<span>L1</span> memory many-core cluster with a low-latency interconnect,”</span> in ''2021 design, automation, and test in europe conference and exhibition (DATE)'', 2021, pp. 701–706.</span><br />
</div><br />
<div id="ref-Riedel2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;3&#93; </span><span class="csl-right-inline">S. Riedel and M. Cavalcante, <span>“<span>MemPool GitHub</span>.”</span> 2021.</span><br />
</div><br />
<div id="ref-Zaruba2020" class="csl-entry"><br />
<span class="csl-left-margin">&#91;4&#93; </span><span class="csl-right-inline">F. Zaruba, F. Schuiki, T. Hoefler, and L. Benini, <span>“<span class="nocase">Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads</span>,”</span> ''IEEE TRANSACTIONS ON COMPUTERS'', pp. 1–1, Feb. 2020.</span><br />
</div><br />
<div id="ref-Waterman2019" class="csl-entry"><br />
<span class="csl-left-margin">&#91;5&#93; </span><span class="csl-right-inline">A. Waterman and K. Asanović, <span>“<span>The RISC-V Instruction Set Manual Volume I: Unprivileged ISA - Document Version 20191213</span>,”</span> RISC-V Foundation, 2019.</span><br />
</div><br />
<div id="ref-Mazzola2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;6&#93; </span><span class="csl-right-inline">S. Mazzola, <span>“<span class="nocase">ISA extensions in the Snitch Processor for Signal Processing</span>,”</span> Apr. 2021.</span><br />
</div><br />
<div id="ref-Xpulp2021" class="csl-entry"><br />
<span class="csl-left-margin">&#91;7&#93; </span><span class="csl-right-inline">OpenHW Group, <span>“<span class="nocase">cv32e40p User Manual</span>.”</span> 2021.</span><br />
</div><br />
</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=7751User:Smazzola2022-04-12T07:56:36Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)&diff=7498Streaming Integer Extensions for Snitch (M/1-2S)2022-01-26T10:34:03Z<p>Smazzola: /* Status: Reserved */</p>
<hr />
<div><!-- Streaming Integer Extensions for Snitch (M/1-2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Sriedel]]<br />
[[Category:Reserved]]<br />
<br />
= Overview =<br />
<br />
== Status: Reserved ==<br />
<br />
* Type: Master or Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is no longer an issue for Snitch, which achieves almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=User:Smazzola&diff=7497User:Smazzola2022-01-26T10:33:31Z<p>Smazzola: </p>
<hr />
<div>= Sergio Mazzola =<br />
<br />
[[File:Smazzola_face_1to1.png|thumb|200px|]]<br />
<br />
==Contact==<br />
<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
<br />
==Projects==<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<DynamicPageList><br />
category = Reserved<br />
category = Smazzola<br />
suppresserrors = true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Smazzola<br />
suppresserrors=true<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7272Streaming Integer Extensions for Snitch (M)2021-11-20T14:49:41Z<p>Smazzola: Redirected page to Streaming Integer Extensions for Snitch (M/1-2S)</p>
<hr />
<div>#Redirect [[Streaming Integer Extensions for Snitch (M/1-2S)]]</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7271Streaming Integer Extensions for Snitch (M)2021-11-20T14:48:16Z<p>Smazzola: Redirected page to Https://iis-projects.ee.ethz.ch/index.php?title=Streaming Integer Extensions for Snitch (M/1-2S)</p>
<hr />
<div>#Redirect [[https://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)]]</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7270Streaming Integer Extensions for Snitch (M)2021-11-20T14:47:58Z<p>Smazzola: </p>
<hr />
<div>#Redirect https://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7269Streaming Integer Extensions for Snitch (M)2021-11-20T14:47:43Z<p>Smazzola: Replaced content with "#Redirect [https://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)]"</p>
<hr />
<div>#Redirect [https://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)]</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Category:Paulsc&diff=7268Category:Paulsc2021-11-20T14:47:24Z<p>Smazzola: Undo revision 7267 by Smazzola (talk)</p>
<hr />
<div>#Redirect [[User:Paulsc]]</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Category:Paulsc&diff=7267Category:Paulsc2021-11-20T14:47:03Z<p>Smazzola: Redirected page to Https://iis-projects.ee.ethz.ch/index.php?title=Streaming Integer Extensions for Snitch (M/1-2S)</p>
<hr />
<div>#Redirect [[https://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)]]</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M/1-2S)&diff=7266Streaming Integer Extensions for Snitch (M/1-2S)2021-11-20T14:45:43Z<p>Smazzola: Created page with "<!-- Streaming Integer Extensions for Snitch (M/1-2S) --> Category:Digital Category:High Performance SoCs Category:Computer Architecture Category:Acceleration_a..."</p>
<hr />
<div><!-- Streaming Integer Extensions for Snitch (M/1-2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Sriedel]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master or Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is no longer an issue for Snitch, which achieves almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7265Streaming Integer Extensions for Snitch (M)2021-11-20T14:44:09Z<p>Smazzola: /* Status: Available */</p>
<hr />
<div><!-- Streaming Integer Extensions for Snitch (M/1-2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Sriedel]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master or Semester Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is no longer an issue for Snitch, which achieves almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7264Streaming Integer Extensions for Snitch (M)2021-11-20T14:43:09Z<p>Smazzola: </p>
<hr />
<div><!-- Streaming Integer Extensions for Snitch (M/1-2S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Sriedel]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
** [[:User:Sriedel | Samuel Riedel]]: [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine.<br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is no longer an issue for Snitch, which achieves almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low- and mixed-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
A main goal of this thesis is to create a Snitch-based system which is suitable for the type of embedded/edge applications targeted by the RI5CY core. Comparing the fundamentally different approaches to maximizing (integer) compute throughput in terms of performance, flexibility and energy efficiency will yield valuable insights and influence the direction of future embedded MCU development at IIS.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Counter-based_Fast_Power_Estimation_using_FPGAs_(M/1-3S)&diff=7214Counter-based Fast Power Estimation using FPGAs (M/1-3S)2021-11-19T15:22:06Z<p>Smazzola: </p>
<hr />
<div><!-- Counter-based Fast Power Estimation using FPGAs (M/1-3S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The power consumed by a digital circuit can be broken down into two main components: ''leakage power'' and ''dynamic power''.<br />
Dynamic power is proportional to the switching activity of the circuit’s gates. In principle, one could measure dynamic power by observing the switching activity of ''each single net'' in the design. This approach is of course ''completely infeasible'' as it would need hundreds of gates to track the activity of a single gate, increasing the area of the design by ''multiple orders of magnitude'', along with the number of nets to track.<br />
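<br />
As a reminder of why switching activity matters, the proportionality above can be written with the standard first-order CMOS switching-power model. This formula is a textbook approximation and is not taken from the project description; the symbols are introduced here only for illustration:<br />
<br />
<math>P_\mathrm{dyn} \propto \alpha \, C_L \, V_{DD}^2 \, f</math><br />
<br />
Here <math>\alpha</math> is the switching activity (the fraction of clock cycles in which a net toggles), <math>C_L</math> the switched capacitance, <math>V_{DD}</math> the supply voltage, and <math>f</math> the clock frequency; the constant of proportionality depends on how <math>\alpha</math> is defined. For a fixed design, voltage, and frequency, <math>\alpha</math> is the only workload-dependent term, which is why estimating it is the key to dynamic power modeling.<br />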
<br />
It has been proven that the activity of a circuit can be closely approximated by randomly selecting a handful of signals to be observed [1].<br />
On the other hand, modern computing systems feature a number of ''performance counters'', i.e. hardware registers tracking carefully selected countable events in the circuit (e.g. cache misses, instruction fetches, floating-point operations, …) with cycle-level accuracy.<br />
Performance counters closely reflect the activity of the individual functional units and therefore of the whole system. They are usually employed to profile application performance and resource utilization at runtime; however, studies show that they can also be very helpful for dynamic power modeling, supporting both the circuit design phase [2][3] and runtime energy-aware policies [4][5].<br />
<br />
Custom designs implemented on FPGAs do not usually come with performance counters, hence it is up to the hardware designer to insert ''observation points'' for activity estimation. In this context, an interesting question is whether there is an approach to hardware counter insertion that models activity more accurately than random insertion, yet requires less effort than manually designed performance counters.<br />
Such an approach would pave the way for an automatic power modeling framework for arbitrary netlists, potentially extending beyond FPGAs.<br />
<br />
== Project ==<br />
<br />
In this project, you will:<br />
<br />
* devise at least one method to extract the activity of each net in an existing RTL design<br />
* simulate the power consumption of the implemented design using Xilinx Vivado and/or a state-of-the-art power simulation tool<br />
* use statistical methods to correlate the toggling activity of each net with the power consumption of the design, identifying the ''observation points'' of interest<br />
* create a simple performance counter unit to monitor the activity of your ideal set of ''observation points''<br />
* evaluate your approach.<br />
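<br />
The activity extraction and correlation steps above could be prototyped along the following lines. This is a minimal sketch, assuming that per-net value traces and per-window power figures have already been exported from simulation; the Pearson-correlation ranking, the function names, and the example data are illustrative assumptions, not a prescribed method.<br />

```python
from math import sqrt


def toggle_count(trace):
    """Count value changes between consecutive samples of a single net."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_nets(activity, power, k):
    """Return the k nets whose per-window toggle counts correlate best with power."""
    ranked = sorted(activity,
                    key=lambda net: abs(pearson(activity[net], power)),
                    reverse=True)
    return ranked[:k]


# Hypothetical data: toggles per time window for each net, and power per window
activity = {"valid": [1, 2, 3, 4], "clk_en": [2, 2, 2, 2], "noise": [3, 1, 4, 2]}
power = [1.0, 2.0, 3.0, 4.0]
best = rank_nets(activity, power, 1)
print(best)
```

Ranking nets one at a time ignores redundancy between correlated nets; a regression with feature selection would be a natural next step.<br />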
<br />
Depending on the remaining time and your personal interests, further challenges can be tackled:<br />
<br />
* the activity of a design unit usually correlates strongly with the activity of its data bus, which in turn depends on handshaking signals. Expand your approach to detect handshake signals (e.g. ready, valid) and investigate whether knowledge of these signals can improve its accuracy and/or runtime<br />
* implement the design with your performance counter unit on an FPGA and use the data collected to calculate and display the power in real time<br />
* instead of correlating the activity of the nets to the power of the entire design, correlate it to key design units (like memory, FPU, CPU, ...) and create real-time per-unit power estimates.<br />
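<br />
A per-unit estimate of the kind described in the last item could, for instance, be a linear model over the counter values. The sketch below assumes such a linear model; the counter names, the weights, the static power values, and the function name are all hypothetical placeholders that would in practice come from fitting against power simulation data.<br />

```python
def per_unit_power(counters, unit_weights, unit_static):
    """Estimate power per design unit as static power plus weighted counter values."""
    estimates = {}
    for unit, weights in unit_weights.items():
        dynamic = sum(w * counters.get(name, 0) for name, w in weights.items())
        estimates[unit] = unit_static.get(unit, 0.0) + dynamic
    return estimates


# Hypothetical counter snapshot for one sampling window
counters = {"fpu_ops": 10, "mem_reqs": 4}
# Hypothetical weights: watts contributed per counted event in a window
unit_weights = {"fpu": {"fpu_ops": 2e-3}, "mem": {"mem_reqs": 1e-3}}
# Hypothetical static (leakage) power per unit in watts
unit_static = {"fpu": 5e-3, "mem": 1e-3}

estimates = per_unit_power(counters, unit_weights, unit_static)
for unit, watts in estimates.items():
    print(f"{unit}: {watts:.4f} W")
```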
<br />
== Character ==<br />
<br />
* 10% architecture review<br />
* 20% net activity extraction<br />
* 30% power simulation<br />
* 10% implementation<br />
* 30% evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Interest in power-related topics<br />
* Preferred: Experience with Xilinx Vivado or any other FPGA toolchain<br />
<br />
= References =<br />
<br />
<div> [2] [http://sedici.unlp.edu.ar/handle/10915/90904 A Study of Hardware Performance Counters Selection for Cross Architectural GPU Power Modeling]</div><br />
<div> [3] [https://dl.acm.org/doi/abs/10.1145/3466752.3480063 AccelWattch: A Power Modeling Framework for Modern GPUs]</div><br />
<div> [4] [https://dl.acm.org/doi/abs/10.1145/566726.566736 The benefits of event-driven energy accounting in power-sensitive systems]</div><br />
<div> [5] [https://ieeexplore.ieee.org/abstract/document/845896 A survey of design techniques for system-level dynamic power management]</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Counter-based_Fast_Power_Estimation_using_FPGAs_(M/1-3S)&diff=7211Counter-based Fast Power Estimation using FPGAs (M/1-3S)2021-11-19T15:20:03Z<p>Smazzola: </p>
<hr />
<div><!-- Counter-based Fast Power Estimation using FPGAs (M/1-3S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:2021]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Smazzola]]<br />
[[Category:Tbenz]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Semester or Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Tbenz | Thomas Benz]]: [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The power consumed by a digital circuit can be broken down into two main components: ''leakage power'' and ''dynamic power''.<br />
Dynamic power is proportional to the switching activity of the circuit’s gates. In principle, one could measure dynamic power by observing the switching activity of ''each single net'' in the design. This approach is of course ''completely infeasible'' as it would need hundreds of gates to track the activity of a single gate, increasing the area of the design by ''multiple orders of magnitude'', along with the number of nets to track.<br />
<br />
It has been proven that the activity of a circuit can be closely approximated by randomly selecting a handful of signals to be observed [1].<br />
On the other hand, modern computing systems feature a number of ''performance counters'', i.e. hardware registers tracking carefully selected countable events in the circuit (e.g. cache misses, instruction fetches, floating-point operations, …) with cycle-level accuracy.<br />
Performance counters closely reflect the activity of the individual functional units and therefore of the whole system. They are usually employed to profile application performance and resource utilization at runtime; however, studies show that they can also be very helpful for dynamic power modeling, supporting both the circuit design phase [2][3] and runtime energy-aware policies [4][5].<br />
<br />
Custom designs implemented on FPGAs do not usually come with performance counters, hence it is up to the hardware designer to insert ''observation points'' for activity estimation. In this context, an interesting question is whether there is an approach to hardware counter insertion that models activity more accurately than random insertion, yet requires less effort than manually designed performance counters.<br />
Such an approach would pave the way for an automatic power modeling framework for arbitrary netlists, potentially extending beyond FPGAs.<br />
<br />
== Project ==<br />
<br />
In this project, you will:<br />
<br />
* devise at least one method to extract the activity of each net in an existing RTL design<br />
* simulate the power consumption of the implemented design using Xilinx Vivado and/or a state-of-the-art power simulation tool<br />
* use statistical methods to correlate the toggling activity of each net with the power consumption of the design, identifying the ''observation points'' of interest<br />
* create a simple performance counter unit to monitor the activity of your ideal set of ''observation points''<br />
* evaluate your approach.<br />
<br />
Depending on the remaining time and your personal interests, further challenges can be tackled:<br />
<br />
* the activity of a design unit usually correlates strongly with the activity of its data bus, which in turn depends on handshaking signals. Expand your approach to detect handshake signals (e.g. ready, valid) and investigate whether knowledge of these signals can improve its accuracy and/or runtime<br />
* implement the design with your performance counter unit on an FPGA and use the data collected to calculate and display the power in real time<br />
* instead of correlating the activity of the nets to the power of the entire design, correlate it to key design units (like memory, FPU, CPU, ...) and create real-time per-unit power estimates.<br />
<br />
== Character ==<br />
<br />
* 10% architecture review<br />
* 20% net activity extraction<br />
* 30% power simulation<br />
* 10% implementation<br />
* 30% evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Interest in power-related topics<br />
* Preferred: Experience with Xilinx Vivado or any other FPGA toolchain<br />
<br />
= References =<br />
<br />
<div> [2] http://sedici.unlp.edu.ar/handle/10915/90904</div><br />
<div> [3] https://dl.acm.org/doi/abs/10.1145/3466752.3480063</div><br />
<div> [4] https://dl.acm.org/doi/abs/10.1145/566726.566736</div><br />
<div> [5] https://ieeexplore.ieee.org/abstract/document/845896</div></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=High_Performance_SoCs&diff=7210High Performance SoCs2021-11-19T15:12:44Z<p>Smazzola: /* Available Projects */</p>
<hr />
<div>==High-Performance Systems-on-Chip==<br />
<br />
[[File:Snitch-bd.png|thumb|350px|The ''Snitch'' cluster couples tiny RISC-V ''Snitch'' cores with performant double-precision FPUs to minimize the control-to-compute ratio; it uses hardware loop buffers and stream semantic registers to achieve almost full FPU utilization.]]<br />
[[File:Floorplan_baikonur.png|thumb|350px|''Baikonur'', a 22 nm chip integrating two application-grade RISC-V Ariane cores and 3 Snitch clusters with 8 cores each.]]<br />
[[File:Manticore_concept.png|thumb|350px|Concept art for ''Manticore'', a Snitch-based 22 nm system with 4096 cores on multiple chiplets and with HBM2 memory.]]<br />
<br />
Today, a multitude of data-driven applications such as machine learning, scientific computing, and big data demand an ever-increasing amount of '''parallel floating-point performance''' from computing systems. Increasingly, such applications must scale across a wide range of applications and energy budgets, from supercomputers simulating next week's weather to your smartphone cameras correcting for low light conditions.<br />
<br />
This brings challenges on multiple fronts:<br />
<br />
* '''Energy Efficiency''' becomes a major concern: As logic density increases, supplying these systems with energy and managing their heat dissipation requires increasingly complex solutions.<br />
<br />
* '''Memory bandwidth and latency''' become a major bottleneck as the amount of processed data increases. Despite continuous advances, memory lags behind computing in scaling, and many data-driven problems today are memory-bound.<br />
<br />
* '''Parallelization and scaling''' bring challenges of their own: on-chip interconnects may introduce significant area and performance overheads as they grow, and both the data and instruction streams of cores may compete for valuable memory bandwidth and interfere in a destructive way.<br />
<br />
While all state-of-the-art high-performance computing systems are constrained by the above issues, they are also subject to a fundamental trade-off between efficiency and flexibility. This forms a design space which includes the following paradigms:<br />
<br />
* '''Accelerators''' are designed to do one thing very well: they are very energy efficient and performant and usually offer predetermined data movement. However, they are not or barely programmable, inflexible, and monolithic in their design.<br />
<br />
* '''Superscalar Out-of-Order CPUs''', on the other end, provide extreme flexibility, full programmability, and reasonable performance across various workloads. However, they require large area and energy overheads for a given performance, use memory inefficiently, and are often hard to scale well to manycore systems.<br />
<br />
* '''GPUs''' are parallel and data-oriented by design, yet still meaningfully programmable, aiming for a sweet spot between scalability, efficiency, and programmability. However, they are still subject to memory access challenges and often require manual memory management for decent performance.<br />
<br />
'''How can we further improve on these existing paradigms?''' Can we design decently efficient and performant, yet freely programmable systems with scalable, performant memory systems?<br />
<br />
If these questions sound intriguing to you, consider joining us for a project or thesis! You can find currently available projects and our contact information below.<br />
<br />
==Our Activities==<br />
<br />
We are primarily interested in '''architecture design and hardware implementation''' for high-performance systems. However, ensuring high performance requires us to consider the '''entire hardware-software stack''':<br />
<br />
* '''HPC Software''': Design and porting of high-performance applications, benchmarks, compiler tools, and operating systems (Linux) to our hardware.<br />
* '''Hardware-software codesign''': Design of performance-aware algorithms and kernels and hardware that can be efficiently programmed for use in processor-based systems.<br />
* '''Architecture''': RTL implementation of energy-efficient designs with an emphasis on high utilization and throughput, as well as on efficient interoperability with existing IPs.<br />
* '''SoC design and Implementation''': Design of full high-performance systems-on-chips; implementation and tapeout on modern silicon technologies such as TSMC's 65 nm and GlobalFoundries' 22 nm nodes.<br />
* '''IC testing and Board-Level design''': Testing of the returning chips with industry-grade automated test equipment (ATE) and design of system-level demonstrator boards.<br />
<br />
Our current interests include systems with '''low control-to-compute ratios''', high-performance '''on-chip interconnects''', and '''scalable many-core systems'''. However, we are always happy to explore new domains; if you have an interesting idea, contact us and we can discuss it in detail!<br />
<br />
==Who are we==<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Smazzola_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Smazzola | Sergio Mazzola]]===<br />
* '''e-mail''': [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 81 49<br />
* '''office''': ETZ J76.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Paulsc_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Tbenz_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Tbenz | Thomas Benz]]===<br />
* '''e-mail''': [mailto:tbenz@iis.ee.ethz.ch tbenz@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 05 18<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Nwistoff_face_pulp_team.JPG|frameless|left|96px]]<br />
|<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:lbertaccini_photo.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Lbertaccini | Luca Bertaccini]]===<br />
* '''e-mail''': [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 55 58<br />
* '''office''': ETZ J78<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Mperotti_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Mperotti | Matteo Perotti]]===<br />
* '''e-mail''': [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 05 25<br />
* '''office''': ETZ J85<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Sriedel_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Sriedel | Samuel Riedel]]===<br />
* '''e-mail''': [mailto:sriedel@iis.ee.ethz.ch sriedel@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 69<br />
* '''office''': ETZ J71.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Matheusd_face_1to1.png|frameless|left|96px]]<br />
|<br />
===[[:User:Matheusd | Matheus Cavalcante]]===<br />
* '''e-mail''': [mailto:matheusd@iis.ee.ethz.ch matheusd@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 96<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
<!--Retired members<br />
{|<br />
| style="padding: 10px" | [[File:Akurth_face_pulp_team.jpeg|frameless|left|96px]]<br />
|<br />
===[[:User:Akurth | Andreas Kurth]]===<br />
* '''e-mail''': [mailto:akurth@iis.ee.ethz.ch akurth@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 04 87<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
{|<br />
| style="padding: 10px" | [[File:Zarubaf_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Zarubaf | Florian Zaruba]]===<br />
* '''e-mail''': [mailto:zarubaf@iis.ee.ethz.ch zarubaf@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 65 56<br />
* '''office''': ETZ J89<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Fschuiki_face_pulp_team.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Fschuiki | Fabian Schuiki]]===<br />
* '''e-mail''': [mailto:fschuiki@iis.ee.ethz.ch fschuiki@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 67 89<br />
* '''office''': ETZ J89<br />
|}<br />
--><br />
<br />
<!--<br />
Who are we<br />
What do we do<br />
Where to find us<br />
--><br />
<br />
==Projects==<br />
<br />
All projects are annotated with one or more possible ''project types'' (M/S/B/G) and a ''number of students'' (1 to 3). <br />
<br />
* '''M''': Master's thesis: ''26 weeks'' full-time (6 months) for ''one student only''<br />
* '''S''': Semester project: ''14 weeks'' half-time (1 semester lecture period) or ''7 weeks'' full-time for ''1-3 students''<br />
* '''B''': Bachelor's thesis: ''14 weeks'' half-time (1 semester lecture period) for ''one student only''<br />
* '''G''': Group project: ''14 weeks'' part-time (1 semester lecture period) for ''2-3 students''<br />
<br />
Usually, these are merely suggestions from our side; proposals can often be reformulated to fit students' needs.<br />
<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=false<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList><br />
===Completed Projects===<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
</DynamicPageList><br />
===Reserved Projects===<br />
<DynamicPageList><br />
category = Reserved<br />
category = Digital<br />
category = High Performance SoCs<br />
suppresserrors=true<br />
ordermethod=sortkey<br />
order=ascending<br />
</DynamicPageList></div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7152Streaming Integer Extensions for Snitch (M)2021-11-18T16:10:14Z<p>Smazzola: </p>
<hr />
<div><!-- Universal Stream Semantic Registers for Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is not an issue anymore for Snitch, as it is able to achieve almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
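<br />
To give a feel for what a "regular access pattern" means here, the sketch below models a nested affine address stream of the kind a stream register could follow, so that the core no longer issues explicit loads inside the loop. It is a behavioral illustration in Python under stated assumptions: the function name, the parameter layout (innermost dimension first), and the example values are hypothetical and do not describe the actual SSR configuration interface.<br />

```python
def affine_stream(base, bounds, strides):
    """Yield the addresses of a nested affine loop, innermost dimension first."""
    idx = [0] * len(bounds)
    while True:
        yield base + sum(i * s for i, s in zip(idx, strides))
        # Advance the loop indices like an odometer, innermost dimension first
        for d in range(len(bounds)):
            idx[d] += 1
            if idx[d] < bounds[d]:
                break
            idx[d] = 0
        else:
            # Every dimension wrapped around: the stream is exhausted
            return


# Hypothetical 1D stream: four 8-byte elements starting at 0x1000
linear = list(affine_stream(0x1000, [4], [8]))

# Hypothetical 2D stream: 2 elements per row (8-byte stride), 3 rows (64-byte stride)
tiled = list(affine_stream(0, [2, 3], [8, 64]))
print([hex(a) for a in linear], tiled)
```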
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] [https://ieeexplore.ieee.org/document/9216552 Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads]<br />
<br />
[2] [https://ieeexplore.ieee.org/document/9068465 Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores]<br />
<br />
[3] [https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M) ISA extensions in the Snitch Processor for Signal Processing (M)] (Previous Master thesis project)<br />
<br />
[4] [https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv Snitch IPU accelerator in the MemPool many-core system] (GitHub repository)<br />
<br />
[5] [https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions/#core-v-instruction-set-extensions CORE-V Instruction Set Extensions]<br />
<br />
[6] [https://github.com/riscv/riscv-bitmanip RISC-V Bit Manipulation draft specification] (GitHub repository)<br />
<br />
[7] [https://ieeexplore.ieee.org/abstract/document/9406333 XpulpNN: Enabling Energy Efficient and Flexible Inference of Quantized Neural Networks on RISC-V Based IoT End Nodes]<br />
<br />
[8] [https://github.com/openhwgroup/cv32e40p OpenHW Group CORE-V CV32E40P RISC-V IP] (GitHub repository)</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7150Streaming Integer Extensions for Snitch (M)2021-11-17T17:35:38Z<p>Smazzola: /* Project */</p>
<hr />
<div><!-- Universal Stream Semantic Registers for Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is not an issue anymore for Snitch, as it is able to achieve almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full integer unit utilization as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable datawidths (8, 16, 32, 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/document/9216552<br />
<br />
[2] https://ieeexplore.ieee.org/document/9068465<br />
<br />
[3] https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M)<br />
<br />
[4] https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv<br />
<br />
[5] https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions<br />
<br />
[6] https://github.com/riscv/riscv-bitmanip<br />
<br />
[7] https://ieeexplore.ieee.org/abstract/document/9406333<br />
<br />
[8] https://github.com/openhwgroup/cv32e40p</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7149Streaming Integer Extensions for Snitch (M)2021-11-17T17:33:36Z<p>Smazzola: /* Introduction */</p>
<hr />
<div><!-- Universal Stream Semantic Registers for Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Snitch’s floating-point subsystem is highly interesting: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Thanks to the clever symbiosis of these lightweight extensions, the trade-off between control area overhead and FPU utilization is not an issue anymore for Snitch, as it is able to achieve almost 100% FPU utilization in many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full utilization of the integer units, as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable data widths (8, 16, 32, or 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/document/9216552<br />
<br />
[2] https://ieeexplore.ieee.org/document/9068465<br />
<br />
[3] https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M)<br />
<br />
[4] https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv<br />
<br />
[5] https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions<br />
<br />
[6] https://github.com/riscv/riscv-bitmanip<br />
<br />
[7] https://ieeexplore.ieee.org/abstract/document/9406333<br />
<br />
[8] https://github.com/openhwgroup/cv32e40p</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7148Streaming Integer Extensions for Snitch (M)2021-11-17T17:32:53Z<p>Smazzola: /* Introduction */</p>
<hr />
<div><!-- Universal Stream Semantic Registers for Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent gate equivalents] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Combined, these lightweight extensions largely remove the usual trade-off between control area overhead and FPU utilization: Snitch achieves almost 100% FPU utilization on many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full utilization of the integer units, as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable data widths (8, 16, 32, or 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/document/9216552<br />
<br />
[2] https://ieeexplore.ieee.org/document/9068465<br />
<br />
[3] https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M)<br />
<br />
[4] https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv<br />
<br />
[5] https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions<br />
<br />
[6] https://github.com/riscv/riscv-bitmanip<br />
<br />
[7] https://ieeexplore.ieee.org/abstract/document/9406333<br />
<br />
[8] https://github.com/openhwgroup/cv32e40p</div>Smazzolahttp://iis-projects.ee.ethz.ch/index.php?title=Streaming_Integer_Extensions_for_Snitch_(M)&diff=7147Streaming Integer Extensions for Snitch (M)2021-11-17T17:31:22Z<p>Smazzola: </p>
<hr />
<div><!-- Universal Stream Semantic Registers for Snitch (1S) --><br />
<br />
[[Category:Digital]]<br />
[[Category:High Performance SoCs]]<br />
[[Category:Computer Architecture]]<br />
[[Category:Acceleration_and_Transprecision]]<br />
[[Category:2021]]<br />
[[Category:Master Thesis]]<br />
[[Category:Hot]]<br />
[[Category:Paulsc]]<br />
[[Category:Smazzola]]<br />
[[Category:Georg]]<br />
[[Category:Available]]<br />
<br />
= Overview =<br />
<br />
== Status: Available ==<br />
<br />
* Type: Master Thesis<br />
* Professor: Prof. Dr. L. Benini<br />
* Supervisors:<br />
** [[:User:Paulsc | Paul Scheffler]]: [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
** [[:User:Smazzola | Sergio Mazzola ]]: [mailto:smazzola@iis.ee.ethz.ch smazzola@iis.ee.ethz.ch]<br />
** [[:User:Georg | Georg Rutishauser]]: [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
<br />
= Introduction =<br />
<br />
The Snitch ecosystem [1] targets energy-efficient high-performance systems. It is built around the minimal RISC-V Snitch integer core, only about 15,000 [http://eda.ee.ethz.ch/index.php?title=Gate_equivalent equivalent gates] in size, which can optionally be coupled to accelerators such as an FPU or a DMA engine. <br />
<br />
Snitch’s floating-point subsystem is of particular interest: it includes stream semantic registers (SSRs) [2] and the floating-point repetition (FREP) hardware loop. Combined, these lightweight extensions largely remove the usual trade-off between control area overhead and FPU utilization: Snitch achieves almost 100% FPU utilization on many data-oblivious problems with regular access patterns.<br />
<br />
Recently, we explored two new accelerator-based extensions for Snitch [3], both of which aim to boost performance and energy efficiency of '''integer-based workloads''' such as signal processing and low-precision machine learning. However, neither approach currently supports all the features we would like to use, such as SSRs, and both are based on outdated versions of Snitch.<br />
<br />
Ideally, we would like to have one unified, mature approach to integer workload acceleration in our mainline version of Snitch, targeting full utilization of the integer units, as in the floating-point subsystem. The simplest way to achieve this is to integrate features from the existing extensions, add further features to fit our needs, and evaluate the performance benefits of the resulting system.<br />
<br />
= Project =<br />
<br />
* '''Integrate the current partial Xpulpv2 implementation''' [3][4] in the mainline Snitch version. This will require you to <br />
** Adapt to the changes in the mainline Snitch codebase and parameterize the existing code <br />
** Possibly switch to a standardized accelerator interface such as X-interface<br />
** Verify the functionality of your extensions.<br />
* '''Implement parametric support for integer SSRs''' which <br />
** Are shared between floating-point and integer datapaths when both are available<br />
** Support configurable data widths (8, 16, 32, or 64 bits).<br />
* '''Implement additional instructions of interest''', which could include<br />
** A complete implementation of Xpulp [5] or a closed subset of its partitions<br />
** The proposed draft Bitmanip extension [6]<br />
** A simple integer hardware loop [5].<br />
* '''Evaluate your extensions''' by <br />
** Determining the performance impact on representative integer workloads<br />
** Determining the area and timing impact in synthesis<br />
** Comparing them to the existing RI5CY core with XpulpNN and MAC&Load extensions [7][8].<br />
<br />
== Character ==<br />
<br />
* 20% Literature / architecture review<br />
* 40% RTL implementation<br />
* 20% Bare-metal C programming<br />
* 20% Evaluation<br />
<br />
== Prerequisites ==<br />
<br />
* Strong interest in computer architecture and memory systems<br />
* Experience with digital design in SystemVerilog as taught in VLSI I<br />
* Experience with ASIC implementation flow (synthesis) as taught in VLSI II<br />
* SoCs for Data Analytics and ML and/or Computer Architecture lectures or equivalent<br />
* Preferred: Knowledge or prior experience with RISC-V or ISA extension design<br />
<br />
= References =<br />
<br />
[1] https://ieeexplore.ieee.org/document/9216552<br />
<br />
[2] https://ieeexplore.ieee.org/document/9068465<br />
<br />
[3] https://iis-projects.ee.ethz.ch/index.php?title=ISA_extensions_in_the_Snitch_Processor_for_Signal_Processing_(M)<br />
<br />
[4] https://github.com/pulp-platform/mempool/blob/main/hardware/deps/snitch/src/snitch_ipu.sv<br />
<br />
[5] https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions<br />
<br />
[6] https://github.com/riscv/riscv-bitmanip<br />
<br />
[7] https://ieeexplore.ieee.org/abstract/document/9406333<br />
<br />
[8] https://github.com/openhwgroup/cv32e40p</div>Smazzola