Huawei Research (http://iis-projects.ee.ethz.ch/index.php?title=Huawei_Research, revision of 2021-09-23 by Lukasc)
<hr />
<div>[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]] <br />
[[Category:Available]] <br />
[[Category:2020]]<br />
[[Category:Hot]]<br />
[[File:Huawei.jpg]]<br />
<br />
===About the Huawei Future Computing Laboratory===<br />
<br />
With 18 sites across Europe and 1500 researchers, Huawei’s European Research Institute (ERI) oversees fundamental and applied technology research, academic research cooperation projects, and strategic technical planning across our network of European R&D facilities. Huawei’s ERI includes the new Zurich Research Center (ZRC), located in Zurich, Switzerland. A major element of ZRC is a new research laboratory focused on fundamental research in the area of future computing systems (new hardware, new software, new algorithms).<br />
<br />
The research work of the lab will be carried out not only by Huawei’s internal research staff but also by our academic research partners in universities across Europe. The lab will provide an “open research environment” where academics will be encouraged to visit and work on fundamental long-term research alongside Huawei staff in an environment that, like the best universities and research institutes, is open and conducive to such scientific work.<br />
<br />
<br />
<!--===Useful Reading===<br />
Coming soon<br />
<br />
===Prerequisites===<br />
*General interest in Deep Learning and memory/system design<br />
*VLSI I and VLSI II (''recommended'')<br />
--><br />
<br />
==Available and On-Going Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics.<br />
We are also open to discussing other topics, and we supervise master’s and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us; we are happy to hear from you.<br />
<br />
{| class="wikitable" style="text-align: center;"<br />
|-<br />
! style="width: 4%;"|Status !! style="width: 2%;"|Year !! style="width: 5%;"|Type !! style="width: 20%"|Project !! style="width: 40%"|Description !! style="width: 5%"|Topic !! style="width: 15%"|Workload Type !! Contact <br />
|-<br />
<br />
| on-going || 2021 || Semester Thesis || Digital VLSI Design (ML Acceleration) || The Winograd algorithm is widely used for the efficient computation of the convolutions at the core of ML applications (e.g., image classification). A novel algorithm shows promising properties for using complex-valued Winograd transforms to reduce the computational complexity further. In this project, we want to evaluate the actual benefits in hardware by designing an accelerator that exploits the new algorithm. || AI Acceleration || digital VLSI design || Renzo Andri, TBD PhD student at IIS<br />
|-<br />
| on-going || 2021 || Internship || Digital VLSI Design Intern (ML Acceleration)|| [https://apply.workable.com/huawei-16/j/CE22FFA23B/ Link to description] || AI Acceleration || digital VLSI design || Renzo Andri<br />
|-<br />
| on-going || 2021 || Internship || High-Performance Machine Learning Kernel Development || [https://apply.workable.com/huawei-16/j/E29D785D1A/ Link to description] || AI Acceleration || hardware-level SW development || Renzo Andri, Lukas Cavigelli<br />
|-<br />
|}<br />
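To make the Winograd project above more concrete, here is a minimal sketch of the classic (real-valued) F(2,3) Winograd transform, which produces two outputs of a 3-tap convolution with 4 multiplications instead of 6; the complex-valued variant the project targets builds on the same idea, and this illustration is our own, not the novel algorithm itself.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap correlation from
    four inputs, using 4 multiplications instead of 6."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (precomputed once per filter in an accelerator)
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Input transform + element-wise products (the 4 multiplications)
    m0 = (d0 - d2) * u0
    m1 = (d1 + d2) * u1
    m2 = (d2 - d1) * u2
    m3 = (d1 - d3) * u3
    # Output (inverse) transform
    return np.array([m0 + m1 + m2, m1 - m2 - m3])

# Check against the direct sliding-window computation
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, 0.25])
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

In a hardware accelerator, the filter transform is applied once offline, so only the input and output transforms (additions/shifts) and the four products lie on the critical datapath.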
<br />
==Contact==<br />
* Renzo Andri, firstname.lastname at huawei com<br />
* Lukas Cavigelli, firstname.lastname at huawei com<br />
<br />
==Detailed Information==<br />
<br />
===Internship Digital VLSI Design for ML Acceleration (Taken)===<br />
This internship has been taken. If you are interested in similar topics, please get in contact with us.<br />
<br />
For the new ZRC Laboratory, we were looking for an outstanding Digital VLSI Design Intern. As a key member of our motivated and multicultural team, you will help design and evaluate novel VLSI architectures for energy-efficient machine-learning acceleration.<br />
<br />
'''Your Responsibilities'''<br />
* Design and implementation of a digital VLSI hardware architecture (RTL) for machine-learning acceleration<br />
* Mapping of data, parameters, and computations from an ML framework to the hardware accelerator<br />
* Synthesis, back-end/layout, and gate-level power simulation<br />
* Scientific evaluation and potential publication<br />
<br />
'''Requirements - Your background'''<br />
* You are currently enrolled in a Master’s or PhD program in electrical engineering, computer engineering, computer science, or a related field at a reputable university, or you graduated within the last six months<br />
* Solid digital VLSI design knowledge, front-end and preferably also back-end (e.g., VLSI I-II)<br />
* You have worked on a VLSI project (e.g., a semester/master thesis at IIS) and used industry-standard tools such as Design Compiler, Innovus, ModelSim, or similar<br />
* Basic knowledge of computer arithmetic<br />
* Basic knowledge of machine learning is an asset<br />
* Strong coding and scripting skills (SystemVerilog/VHDL, Python, Tcl, Bash, etc.)<br />
* Excellent communication and writing skills in English<br />
<br />
Interested in developing the next generation of machine-learning hardware with us? Then apply [https://apply.workable.com/huawei-16/j/CE22FFA23B/ here].</div>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]] <br />
[[Category:Available]] <br />
[[Category:2020]]<br />
[[Category:Hot]]<br />
[[File:Huawei.jpg]]<br />
<br />
===About the Huawei Future Computing Laboratory===<br />
<br />
With 18 sites across Europe and 1500 researchers, Huawei’s European Research Institute (ERI) oversees fundamental and applied technology research, academic research cooperation projects, and strategic technical planning across our network of European R&D facilities. Huawei’s ERI includes the new Zurich Research Center (ZRC), located in Zurich, Switzerland. A major element of ZRC is a new research laboratory focused on fundamental research in the area of future computing systems (new hardware, new software, new algorithms).<br />
<br />
The research work of the lab will be carried out not only by Huawei’s internal research staff but also by our academic research partners in universities across Europe. The lab will provide an “open research environment” where academics will be encouraged to visit and work on fundamental long-term research alongside Huawei staff in an environment that, like the best universities and research institutes, is open and conducive to such scientific work.<br />
<br />
<br />
<!--===Useful Reading===<br />
Coming soon<br />
<br />
===Prerequisites===<br />
*General interest in Deep Learning and memory/system design<br />
*VLSI I and VLSI II (''recommended'')<br />
--><br />
<br />
==Available and On-Going Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics. <br />
We are open to discuss also other topics. We are also supervising master's and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us, we are happy to hear from you.<br />
<br />
{| class="wikitable" style="text-align: center;"<br />
|-<br />
! style="width: 4%;"|Status !! style="width: 2%;"|Year !! style="width: 5%;"|Type !! style="width: 20%"|Project !! style="width: 40%"|Description !! style="width: 5%"|Topic !! style="width: 15%"|Workload Type || Contact <br />
|-<br />
<br />
| on-going || 2021 || Semester Thesis || Digital VLSI Design (ML Acceleration)|| Winograd has been exploited for efficient calculation of convolutions which are typically used in ML applications (e.g., image classification), a novel algorithm shows nice properties to use complex Winograd to further reduce the computational complexity. In this project, we would like to evaluate the actual benefits in HW by designing an accelerator exploiting the new algorithm. || AI Acceleration || digital VLSI design || Renzo Andri, TBD PhD student at IIS<br />
|-<br />
| on-going || 2021 || Internship || Digital VLSI Design Intern (ML Acceleration)|| [https://apply.workable.com/huawei-16/j/CE22FFA23B/ Link to description] || AI Acceleration || digital VLSI design || Renzo Andri<br />
|-<br />
| on-going || 2021 || Internship || High-Performance Machine Learning Kernel Development || [https://apply.workable.com/huawei-16/j/E29D785D1A/ Link to description] || AI Acceleration || hardware-level SW development || Renzo Andri, Lukas Cavigelli<br />
|-<br />
|}<br />
<br />
==Contact==<br />
* Renzo Andri, firstname.lastname at huawei com<br />
* Lukas Cavigelli, firstname.lastname at huawei com<br />
<br />
==Detailed Information==<br />
<br />
===Internship Digital VLSI Design for ML Acceleration (Taken)===<br />
This internship has been taken, if you are interested in similar topics, get in contact with us.<br />
<br />
For the new ZRC Laboratory, we were looking for an outstanding Digital VLSI Design Intern. As a key member in our motivated and multicultural team, you will support to design and evaluate novel VLSI architectures for energy-efficient machine learning acceleration.<br />
<br />
'''Your Responsibilities'''<br />
* Design and Implementation of Digital VLSI HW architecture (RTL) for Machine Learning Acceleration<br />
* Mapping of data, parameters and computations from a ML framework to the HW Accelerator.<br />
* Synthesis and Backend/Layout and gate-level power simulation<br />
* Scientific evaluation and potential publication.<br />
<br />
<br />
'''Requirements - Your background'''<br />
* You are currently enrolled in a Master’s degree or PhD in electrical engineering, compute engineering or computer science, or any related fields at a reputable university; or you graduated within the last six months<br />
* Solid Digital VLSI Design knowledge Front-end and preferably also Back-end (e.g., VLSI I-II)<br />
* You have worked on a VLSI project (e.g., semester/master thesis at IIS) and used industry-standard tools like Design Compiler, Innovus, Modelsim or similar.<br />
* Basic knowledge in computer arithmetics.<br />
* Basic knowledge in machine learning is an asset.<br />
* Strong coding and scripting skills (SystemVerilog/VHDL, Python, TCL, Bash etc.)<br />
* Excellent communication and writing skills in English<br />
<br />
Interested to develop with us the next generation of machine learning hardware, then apply [https://apply.workable.com/huawei-16/j/CE22FFA23B/ here]</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Huawei_Research&diff=6977Huawei Research2021-09-23T07:43:59Z<p>Lukasc: /* Available and On-Going Projects */</p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]] <br />
[[Category:Available]] <br />
[[Category:2020]]<br />
[[Category:Hot]]<br />
[[File:Huawei.jpg]]<br />
<br />
===About the Huawei Future Computing Laboratory===<br />
<br />
With 18 sites across Europe and 1500 researchers, Huawei’s European Research Institute (ERI) oversees fundamental and applied technology research, academic research cooperation projects, and strategic technical planning across our network of European R&D facilities. Huawei’s ERI includes the new Zurich Research Center (ZRC), located in Zurich, Switzerland. A major element of ZRC is a new research laboratory focused on fundamental research in the area of future computing systems (new hardware, new software, new algorithms).<br />
<br />
The research work of the lab will be carried out not only by Huawei’s internal research staff but also by our academic research partners in universities across Europe. The lab will provide an “open research environment” where academics will be encouraged to visit and work on fundamental long-term research alongside Huawei staff in an environment that, like the best universities and research institutes, is open and conducive to such scientific work.<br />
<br />
==Internship Digital VLSI Design for ML Acceleration (Taken)==<br />
This internship has been taken, if you are interested in similar topics, get in contact with us.<br />
<br />
For the new ZRC Laboratory, we were looking for an outstanding Digital VLSI Design Intern. As a key member in our motivated and multicultural team, you will support to design and evaluate novel VLSI architectures for energy-efficient machine learning acceleration.<br />
<br />
===Your Responsibilities===<br />
<br />
* Design and Implementation of Digital VLSI HW architecture (RTL) for Machine Learning Acceleration<br />
* Mapping of data, parameters and computations from a ML framework to the HW Accelerator.<br />
* Synthesis and Backend/Layout and gate-level power simulation<br />
* Scientific evaluation and potential publication.<br />
<br />
<br />
===Requirements - Your background===<br />
<br />
* You are currently enrolled in a Master’s degree or PhD in electrical engineering, compute engineering or computer science, or any related fields at a reputable university; or you graduated within the last six months<br />
* Solid Digital VLSI Design knowledge Front-end and preferably also Back-end (e.g., VLSI I-II)<br />
* You have worked on a VLSI project (e.g., semester/master thesis at IIS) and used industry-standard tools like Design Compiler, Innovus, Modelsim or similar.<br />
* Basic knowledge in computer arithmetics.<br />
* Basic knowledge in machine learning is an asset.<br />
* Strong coding and scripting skills (SystemVerilog/VHDL, Python, TCL, Bash etc.)<br />
* Excellent communication and writing skills in English<br />
<br />
Interested to develop with us the next generation of machine learning hardware, then apply [https://apply.workable.com/huawei-16/j/CE22FFA23B/ here]<br />
<br />
<br />
<!--===Useful Reading===<br />
Coming soon<br />
<br />
===Prerequisites===<br />
*General interest in Deep Learning and memory/system design<br />
*VLSI I and VLSI II (''recommended'')<br />
--><br />
<br />
==Available and On-Going Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics. <br />
We are open to discuss also other topics. We are also supervising master's and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us, we are happy to hear from you.<br />
<br />
{| class="wikitable" style="text-align: center;"<br />
|-<br />
! style="width: 4%;"|Status !! style="width: 2%;"|Year !! style="width: 5%;"|Type !! style="width: 20%"|Project !! style="width: 40%"|Description !! style="width: 5%"|Topic !! style="width: 15%"|Workload Type || Contact <br />
|-<br />
<br />
| on-going || 2021 || Semester Thesis || Digital VLSI Design (ML Acceleration)|| Winograd has been exploited for efficient calculation of convolutions which are typically used in ML applications (e.g., image classification), a novel algorithm shows nice properties to use complex Winograd to further reduce the computational complexity. In this project, we would like to evaluate the actual benefits in HW by designing an accelerator exploiting the new algorithm. || AI Acceleration || digital VLSI design || Renzo Andri, TBD PhD student at IIS<br />
|-<br />
| on-going || 2021 || Internship || Digital VLSI Design Intern (ML Acceleration)|| [https://apply.workable.com/huawei-16/j/CE22FFA23B/ Link to description] || AI Acceleration || digital VLSI design || Renzo Andri<br />
|-<br />
| on-going || 2021 || Internship || High-Performance Machine Learning Kernel Development || [https://apply.workable.com/huawei-16/j/E29D785D1A/ Link to description] || AI Acceleration || hardware-level SW development || Renzo Andri, Lukas Cavigelli<br />
|-<br />
|}<br />
<br />
==Contact==<br />
: Renzo Andri, firstname.lastname at huawei com<br />
: Lukas Cavigelli, firstname.lastname at huawei com<br />
<br />
==Detailed Information==</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Huawei_Research&diff=6976Huawei Research2021-09-23T07:43:01Z<p>Lukasc: </p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]] <br />
[[Category:Available]] <br />
[[Category:2020]]<br />
[[Category:Hot]]<br />
[[File:Huawei.jpg]]<br />
<br />
===About the Huawei Future Computing Laboratory===<br />
<br />
With 18 sites across Europe and 1500 researchers, Huawei’s European Research Institute (ERI) oversees fundamental and applied technology research, academic research cooperation projects, and strategic technical planning across our network of European R&D facilities. Huawei’s ERI includes the new Zurich Research Center (ZRC), located in Zurich, Switzerland. A major element of ZRC is a new research laboratory focused on fundamental research in the area of future computing systems (new hardware, new software, new algorithms).<br />
<br />
The research work of the lab will be carried out not only by Huawei’s internal research staff but also by our academic research partners in universities across Europe. The lab will provide an “open research environment” where academics will be encouraged to visit and work on fundamental long-term research alongside Huawei staff in an environment that, like the best universities and research institutes, is open and conducive to such scientific work.<br />
<br />
==Internship Digital VLSI Design for ML Acceleration (Taken)==<br />
This internship has been taken, if you are interested in similar topics, get in contact with us.<br />
<br />
For the new ZRC Laboratory, we were looking for an outstanding Digital VLSI Design Intern. As a key member in our motivated and multicultural team, you will support to design and evaluate novel VLSI architectures for energy-efficient machine learning acceleration.<br />
<br />
===Your Responsibilities===<br />
<br />
* Design and Implementation of Digital VLSI HW architecture (RTL) for Machine Learning Acceleration<br />
* Mapping of data, parameters and computations from a ML framework to the HW Accelerator.<br />
* Synthesis and Backend/Layout and gate-level power simulation<br />
* Scientific evaluation and potential publication.<br />
<br />
<br />
===Requirements - Your background===<br />
<br />
* You are currently enrolled in a Master’s degree or PhD in electrical engineering, compute engineering or computer science, or any related fields at a reputable university; or you graduated within the last six months<br />
* Solid Digital VLSI Design knowledge Front-end and preferably also Back-end (e.g., VLSI I-II)<br />
* You have worked on a VLSI project (e.g., semester/master thesis at IIS) and used industry-standard tools like Design Compiler, Innovus, Modelsim or similar.<br />
* Basic knowledge in computer arithmetics.<br />
* Basic knowledge in machine learning is an asset.<br />
* Strong coding and scripting skills (SystemVerilog/VHDL, Python, TCL, Bash etc.)<br />
* Excellent communication and writing skills in English<br />
<br />
Interested to develop with us the next generation of machine learning hardware, then apply [https://apply.workable.com/huawei-16/j/CE22FFA23B/ here]<br />
<br />
<br />
<!--===Useful Reading===<br />
Coming soon<br />
<br />
===Prerequisites===<br />
*General interest in Deep Learning and memory/system design<br />
*VLSI I and VLSI II (''recommended'')<br />
--><br />
<br />
==Available and On-Going Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics. <br />
We are open to discuss also other topics. We are also supervising master's and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us, we are happy to hear from you.<br />
<br />
{| class="wikitable" style="text-align: center;"<br />
|-<br />
! style="width: 2%;"|Year!! style="width: 5%;"|Type !! style="width: 20%"|Project !! style="width: 40%"|Description !! style="width: 5%"|Topic !! style="width: 15%"|Workload Type || Contact <br />
|-<br />
<br />
| 2021 || Semester Thesis || Digital VLSI Design (ML Acceleration)|| Winograd has been exploited for efficient calculation of convolutions which are typically used in ML applications (e.g., image classification), a novel algorithm shows nice properties to use complex Winograd to further reduce the computational complexity. In this project, we would like to evaluate the actual benefits in HW by designing an accelerator exploiting the new algorithm. || AI Acceleration || digital VLSI design || Renzo Andri, TBD PhD student at IIS<br />
|-<br />
| 2021 || Internship || Digital VLSI Design Intern (ML Acceleration)|| [https://apply.workable.com/huawei-16/j/CE22FFA23B/ Link to description] || AI Acceleration || digital VLSI design || Renzo Andri<br />
|-<br />
| 2021 || Internship || High-Performance Machine Learning Kernel Development || [https://apply.workable.com/huawei-16/j/E29D785D1A/ Link to description] || AI Acceleration || hardware-level SW development || Renzo Andri, Lukas Cavigelli<br />
|-<br />
|}<br />
<br />
==Contact==<br />
: Renzo Andri, firstname.lastname at huawei com<br />
: Lukas Cavigelli, firstname.lastname at huawei com<br />
<br />
==Detailed Information==</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Huawei_Research&diff=6975Huawei Research2021-09-23T07:41:43Z<p>Lukasc: /* On-going Projects */</p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]] <br />
[[Category:Available]] <br />
[[Category:2020]]<br />
[[Category:Hot]]<br />
[[File:Huawei.jpg]]<br />
<br />
===About the Huawei Future Computing Laboratory===<br />
<br />
With 18 sites across Europe and 1500 researchers, Huawei’s European Research Institute (ERI) oversees fundamental and applied technology research, academic research cooperation projects, and strategic technical planning across our network of European R&D facilities. Huawei’s ERI includes the new Zurich Research Center (ZRC), located in Zurich, Switzerland. A major element of ZRC is a new research laboratory focused on fundamental research in the area of future computing systems (new hardware, new software, new algorithms).<br />
<br />
The research work of the lab will be carried out not only by Huawei’s internal research staff but also by our academic research partners in universities across Europe. The lab will provide an “open research environment” where academics will be encouraged to visit and work on fundamental long-term research alongside Huawei staff in an environment that, like the best universities and research institutes, is open and conducive to such scientific work.<br />
<br />
==Internship Digital VLSI Design for ML Acceleration (Taken)==<br />
This internship has been taken, if you are interested in similar topics, get in contact with us.<br />
<br />
For the new ZRC Laboratory, we were looking for an outstanding Digital VLSI Design Intern. As a key member in our motivated and multicultural team, you will support to design and evaluate novel VLSI architectures for energy-efficient machine learning acceleration.<br />
<br />
===Your Responsibilities===<br />
<br />
* Design and Implementation of Digital VLSI HW architecture (RTL) for Machine Learning Acceleration<br />
* Mapping of data, parameters and computations from a ML framework to the HW Accelerator.<br />
* Synthesis and Backend/Layout and gate-level power simulation<br />
* Scientific evaluation and potential publication.<br />
<br />
<br />
===Requirements - Your background===<br />
<br />
* You are currently enrolled in a Master’s degree or PhD in electrical engineering, compute engineering or computer science, or any related fields at a reputable university; or you graduated within the last six months<br />
* Solid Digital VLSI Design knowledge Front-end and preferably also Back-end (e.g., VLSI I-II)<br />
* You have worked on a VLSI project (e.g., semester/master thesis at IIS) and used industry-standard tools like Design Compiler, Innovus, Modelsim or similar.<br />
* Basic knowledge in computer arithmetics.<br />
* Basic knowledge in machine learning is an asset.<br />
* Strong coding and scripting skills (SystemVerilog/VHDL, Python, TCL, Bash etc.)<br />
* Excellent communication and writing skills in English<br />
<br />
Interested to develop with us the next generation of machine learning hardware, then apply [https://apply.workable.com/huawei-16/j/CE22FFA23B/ here]<br />
<br />
<br />
<!--===Useful Reading===<br />
Coming soon<br />
<br />
===Prerequisites===<br />
*General interest in Deep Learning and memory/system design<br />
*VLSI I and VLSI II (''recommended'')<br />
--><br />
<br />
==Available Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics. <br />
We are open to discuss also other topics. We are also supervising master's and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us, we are happy to hear from you.<br />
<br />
<br />
==On-going Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics. <br />
We are open to discuss also other topics. We are also supervising master's and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us, we are happy to hear from you.<br />
<br />
{| class="wikitable" style="text-align: center;"<br />
|-<br />
! style="width: 2%;"|Year!! style="width: 5%;"|Type !! style="width: 20%"|Project !! style="width: 40%"|Description !! style="width: 5%"|Topic !! style="width: 15%"|Workload Type || Contact <br />
|-<br />
<br />
| 2021 || Semester Thesis || Digital VLSI Design (ML Acceleration)|| Winograd has been exploited for efficient calculation of convolutions which are typically used in ML applications (e.g., image classification), a novel algorithm shows nice properties to use complex Winograd to further reduce the computational complexity. In this project, we would like to evaluate the actual benefits in HW by designing an accelerator exploiting the new algorithm. || AI Acceleration || digital VLSI design || Renzo Andri, TBD PhD student at IIS<br />
|-<br />
| 2021 || Internship || Digital VLSI Design Intern (ML Acceleration)|| [https://apply.workable.com/huawei-16/j/CE22FFA23B/ Link to description] || AI Acceleration || digital VLSI design || Renzo Andri<br />
|-<br />
| 2021 || Internship || High-Performance Machine Learning Kernel Development || [https://apply.workable.com/huawei-16/j/E29D785D1A/ Link to description] || AI Acceleration || hardware-level SW development || Renzo Andri, Lukas Cavigelli<br />
|-<br />
|}<br />
<br />
==Contact==<br />
: Renzo Andri, firstname.lastname at huawei com<br />
: Lukas Cavigelli, firstname.lastname at huawei com</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Huawei_Research&diff=6974Huawei Research2021-09-23T07:39:58Z<p>Lukasc: /* Contact */</p>
<hr />
<div>[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]] <br />
[[Category:Available]] <br />
[[Category:2020]]<br />
[[Category:Hot]]<br />
[[File:Huawei.jpg]]<br />
<br />
===About the Huawei Future Computing Laboratory===<br />
<br />
With 18 sites across Europe and 1500 researchers, Huawei’s European Research Institute (ERI) oversees fundamental and applied technology research, academic research cooperation projects, and strategic technical planning across our network of European R&D facilities. Huawei’s ERI includes the new Zurich Research Center (ZRC), located in Zurich, Switzerland. A major element of ZRC is a new research laboratory focused on fundamental research in the area of future computing systems (new hardware, new software, new algorithms).<br />
<br />
The research work of the lab will be carried out not only by Huawei’s internal research staff but also by our academic research partners in universities across Europe. The lab will provide an “open research environment” where academics will be encouraged to visit and work on fundamental long-term research alongside Huawei staff in an environment that, like the best universities and research institutes, is open and conducive to such scientific work.<br />
<br />
==Internship Digital VLSI Design for ML Acceleration (Taken)==<br />
This internship has been taken, if you are interested in similar topics, get in contact with us.<br />
<br />
For the new ZRC Laboratory, we were looking for an outstanding Digital VLSI Design Intern. As a key member in our motivated and multicultural team, you will support to design and evaluate novel VLSI architectures for energy-efficient machine learning acceleration.<br />
<br />
===Your Responsibilities===<br />
<br />
* Design and Implementation of Digital VLSI HW architecture (RTL) for Machine Learning Acceleration<br />
* Mapping of data, parameters and computations from a ML framework to the HW Accelerator.<br />
* Synthesis and Backend/Layout and gate-level power simulation<br />
* Scientific evaluation and potential publication.<br />
<br />
<br />
===Requirements - Your background===<br />
<br />
* You are currently enrolled in a Master’s degree or PhD in electrical engineering, compute engineering or computer science, or any related fields at a reputable university; or you graduated within the last six months<br />
* Solid Digital VLSI Design knowledge Front-end and preferably also Back-end (e.g., VLSI I-II)<br />
* You have worked on a VLSI project (e.g., semester/master thesis at IIS) and used industry-standard tools like Design Compiler, Innovus, Modelsim or similar.<br />
* Basic knowledge in computer arithmetics.<br />
* Basic knowledge in machine learning is an asset.<br />
* Strong coding and scripting skills (SystemVerilog/VHDL, Python, TCL, Bash etc.)<br />
* Excellent communication and writing skills in English<br />
<br />
Interested to develop with us the next generation of machine learning hardware, then apply [https://apply.workable.com/huawei-16/j/CE22FFA23B/ here]<br />
<br />
<br />
<!--===Useful Reading===<br />
Coming soon<br />
<br />
===Prerequisites===<br />
*General interest in Deep Learning and memory/system design<br />
*VLSI I and VLSI II (''recommended'')<br />
--><br />
<br />
==Available Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics. <br />
We are open to discuss also other topics. We are also supervising master's and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us, we are happy to hear from you.<br />
<br />
<br />
==On-going Projects==<br />
We are inviting applications from students to conduct their master’s thesis work or an internship project at the Huawei Future Computing Lab in Zurich on these exciting new topics. <br />
We are open to discuss also other topics. We are also supervising master's and semester theses in collaboration with the Integrated Systems Laboratory. Feel free to contact us, we are happy to hear from you.<br />
<br />
{| class="wikitable" style="text-align: center;"<br />
|-<br />
! style="width: 2%;"|Year !! style="width: 5%;"|Type !! style="width: 20%"|Project !! style="width: 40%"|Description !! style="width: 5%"|Topic !! style="width: 15%"|Workload Type !! Contact <br />
|-<br />
<br />
| 2021 || Semester Thesis || Digital VLSI Design (ML Acceleration) || The Winograd algorithm has been exploited for the efficient calculation of convolutions, which are widely used in ML applications (e.g., image classification). A novel algorithm shows promising properties for using complex Winograd transforms to further reduce the computational complexity. In this project, we would like to evaluate the actual benefits in HW by designing an accelerator that exploits the new algorithm. || AI Acceleration || digital VLSI design || Dr. Andri, TBD PhD student at IIS<br />
|-<br />
| 2021 || Internship || Digital VLSI Design Intern (ML Acceleration)|| [https://apply.workable.com/huawei-16/j/CE22FFA23B/ Link to description] || AI Acceleration || digital VLSI design || Dr. Andri<br />
|-<br />
| 2021 || Internship || High-Performance Machine Learning Kernel Development || [https://apply.workable.com/huawei-16/j/E29D785D1A/ Link to description] || AI Acceleration || hardware-level SW development || Dr. Andri, Dr. Cavigelli<br />
|-<br />
|}<br />
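The complex-Winograd project above builds on Winograd's minimal filtering algorithms, which trade multiplications for cheap additions. As an illustrative sketch (the standard real-valued F(2,3) variant with its usual transform matrices, not the novel complex algorithm the project evaluates), the code below computes two outputs of a 3-tap convolution with 4 multiplications instead of 6:

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap sliding-window correlation
# from 4 inputs, using 4 elementwise multiplications instead of 6.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 output samples."""
    return AT @ ((G @ g) * (BT @ d))  # the elementwise product is the 4 multiplies

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, -1.0])
y = winograd_f23(d, g)
y_ref = np.array([d[0:3] @ g, d[1:4] @ g])  # direct sliding-window computation
assert np.allclose(y, y_ref)
```

In the 2D CNN setting the same idea is applied as F(2x2, 3x3), reducing the multiplications per output tile from 36 to 16; the complex variant aims to push this further.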
<br />
==Contact==<br />
: Renzo Andri, firstname.lastname at huawei com<br />
: Lukas Cavigelli, firstname.lastname at huawei com</div>
http://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=5168Deep Learning Projects2020-06-17T14:54:22Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We list a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please contact the people behind the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know, and together we can determine a useful way forward -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || RISC-V LSTM Accelerator || LSTMs are the state-of-the-art neural networks for time-series data (e.g., audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g., a microcontroller) is needed to control them. An alternative is a heterogeneous processor architecture, where a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis, we would look into the evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:Paulin|Gianna Paulin]] [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA || System-Level LSTM Acceleration || LSTMs are the state-of-the-art neural networks for time-series data (e.g., audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g., a microcontroller) is needed to control them. In this project, an accelerator for LSTMs is implemented as a coarse-grain coprocessor to a RISC-V processor to address this issue. The work will explore the datapath, internal storage needs, the control interface, and the memory bandwidth requirements into the L1 in an environment with one or more RISC-V processors. This means that the complete system (e.g., the memory bus) has to be analyzed and, if necessary, adapted. || ASIC || HW || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available || MA/SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1, 0, 1}. This way, we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we have created an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and the project type, this could lead to a chip tape-out. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || Ternary-Weights TCN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1, 0, 1}. This way, we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. Temporal convolutional networks (TCNs) have recently been proposed for sequence modeling tasks and achieve state-of-the-art performance on translation tasks. TCNs make use of 1D fully-convolutional networks and causal convolutions. In this work, a HW accelerator should be implemented with the ultimate goal of energy efficiency. Potentially, this work will make use of an existing ternary-weight convolution accelerator. || ASIC || HW (ASIC) || Georg Rutishauser, [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available || MA/SA || Ternary-Weights TCN Training || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1, 0, 1}. This way, we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. Temporal convolutional networks (TCNs) have recently been proposed for sequence modeling tasks and achieve state-of-the-art performance on translation tasks. In this project, you will explore how to train TCNs with ternary weights using various state-of-the-art training schemes. || Workstation || SW (algorithm evals) || Georg Rutishauser, [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available || SA || Parallel EBPC || A large part of the power consumption of neural network accelerators goes towards accessing feature maps stored in large central memories. Extended Bit-Plane Compression (EBPC) is a novel, hardware-friendly compression algorithm for DNN feature maps which makes it possible to reduce the transferred data volume and with it, power consumption. A baseline hardware implementation of EBPC which processes a single 8-bit stream of data has already been developed. The next step, and the goal of this project, is to transform it into a parallel architecture which can process multiple 8-bit words at a time while keeping the original architecture's energy efficiency intact (or improving it!). || ASIC/FPGA || HW || [[:User:georgr|Georg Rutishauser]], [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
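Several of the projects above rely on ternary weights {-1, 0, 1}. A minimal sketch of the idea (using the layer-wise threshold heuristic from ternary weight networks; the function names are ours, and real INQ training quantizes incrementally during training rather than in one shot):

```python
import numpy as np

def ternarize(w, thresh_factor=0.7):
    """Ternarize weights to {-1, 0, +1} plus one scaling factor per tensor.
    thresh_factor=0.7 follows the heuristic from the TWN paper."""
    delta = thresh_factor * np.mean(np.abs(w))
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    mask = t != 0
    # Scale that minimizes the L2 error: mean magnitude of the kept weights.
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return t, alpha

rng = np.random.default_rng(0)
w = rng.normal(size=64)   # full-precision weights
x = rng.normal(size=64)   # input activations
t, alpha = ternarize(w)
# With ternary weights the dot product needs no multiplications:
# add inputs where t == +1, subtract where t == -1, then scale once.
y = alpha * (x[t == 1].sum() - x[t == -1].sum())
assert np.isclose(y, np.dot(alpha * t, x))
```

The 2-bit encoding of each weight is also what makes on-chip weight storage so compact compared to 8- or 16-bit formats.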
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
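The TCN projects listed above build on causal, dilated 1D convolutions: each output sample may depend only on current and past inputs, and dilation grows the receptive field exponentially with network depth. A toy sketch (our own helper, ignoring channels and batching):

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal 1D convolution: y[t] = sum_j w[j] * x[t - j*dilation],
    so each output depends only on current and past inputs.
    Left-padding with (k-1)*dilation zeros keeps the output length
    equal to the input length, as in TCNs / WaveNet."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[j] * xp[pad + t - j * dilation] for j in range(k))
                     for t in range(len(x))])

x = [1.0, 2.0, 3.0, 4.0]
# Two-tap running sum of the current and previous sample:
assert np.allclose(causal_dilated_conv1d(x, [1.0, 1.0]), [1, 3, 5, 7])
# With dilation=2 the second tap reaches two steps into the past:
assert np.allclose(causal_dilated_conv1d(x, [1.0, 1.0], dilation=2), [1, 2, 4, 6])
```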
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken (M. Scherer) || MA || TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. Many efforts try to reduce the precision of arithmetic operations to 16, 12, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture that is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of, e.g., audio data and then trigger more energy-costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
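Quantized training, as in the RNN project above, typically inserts "fake quantization" (quantize-dequantize) nodes into the forward pass, while the backward pass treats the rounding as the identity (the straight-through estimator), so gradients update a full-precision master copy of the weights. A minimal sketch of the forward-pass operation (our own function names):

```python
import numpy as np

def fake_quantize(x, n_bits=8, x_max=1.0):
    """Uniform symmetric quantize-dequantize as used in quantization-aware
    training. Values are clipped to [-x_max, x_max] and rounded onto a grid
    of 2^(n_bits-1)-1 positive levels; the output stays in floating point so
    the rest of the network is unchanged."""
    n_levels = 2 ** (n_bits - 1) - 1
    step = x_max / n_levels
    return np.clip(np.round(x / step), -n_levels, n_levels) * step

x = np.array([0.1234, -0.9876, 0.5, 2.0])
xq = fake_quantize(x, n_bits=8)
# After clipping, the quantization error is bounded by half a step:
assert np.all(np.abs(np.clip(x, -1, 1) - xq) <= 0.5 / 127 + 1e-12)
```

During backpropagation, frameworks simply pass the gradient through this node unchanged, which is what makes training with non-differentiable rounding possible.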
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1, 0, 1}. This way, we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting the result (e.g., a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely spent on GPUs, or less frequently on specialized HW (e.g., Google's TPUs). Their energy efficiency and often performance are limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider, for example, face identification as a way of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project, you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
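The one-shot/few-shot project above uses Prototypical Networks: each class is represented by the mean of its support-set embeddings, and queries are classified by the nearest prototype. A toy sketch on raw 2D points instead of learned embeddings (function names are ours):

```python
import numpy as np

def prototypes(support_x, support_y, n_classes):
    """Class prototype = mean of that class's support embeddings."""
    return np.stack([support_x[support_y == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_x, protos):
    """Assign each query to the nearest prototype (squared Euclidean)."""
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Toy 2-class example: 5 support points per class, well-separated clusters.
rng = np.random.default_rng(1)
sx = np.concatenate([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
sy = np.array([0] * 5 + [1] * 5)
protos = prototypes(sx, sy, 2)
q = np.array([[0.05, -0.02], [2.9, 3.1]])
assert list(classify(q, protos)) == [0, 1]
```

In the real method, the embeddings come from a trained CNN, and adding a new identity only requires computing one new prototype -- no retraining -- which is what makes the approach attractive for the door-unlock scenario.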
==Where to find us==<br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch<br /><br />
Georg Rutishauser, ETZ J 68.2, georgr@iis.ee.ethz.ch<br /><br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /></div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4756Deep Learning Projects2019-08-30T08:18:00Z<p>Lukasc: /* Where to find us */</p>
<hr />
<div>We list a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please contact the people behind the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know, and together we can determine a useful way forward -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available || MA/SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1, 0, 1}. This way, we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we have created an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and the project type, this could lead to a chip tape-out. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. Many efforts try to reduce the precision of arithmetic operations to 16, 12, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture that is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of, e.g., audio data and then trigger more energy-costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Logic Synthesis with Graph CNNs || Logic synthesis and optimization as used for ASIC and FPGA implementation is computationally intensive, and finding optimal solutions (in terms of area and/or timing) is only possible for extremely tiny circuits. The logical operations linking the inputs and outputs of a hardware block can be described in graph form, on which equivalence transforms can be applied. Current algorithms use hand-designed heuristics to select which transforms should be applied to iteratively find a satisfactory solution. Instead, this project aims at using Graph CNNs combined with reinforcement learning to select which local graph transforms should be applied. Prerequisites: Know the basics of normal DNNs and PyTorch (or another DL framework). || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
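The ternary-weight idea behind the TWN/TNN projects above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions: the function name `ternarize` and the magnitude threshold are hypothetical, and real INQ additionally interleaves quantization with re-training of the remaining full-precision weights.

```python
import numpy as np

def ternarize(weights, threshold=0.05):
    """Quantize weights to {-1, 0, +1} by magnitude thresholding.

    Values below the threshold are pruned to zero; the rest keep only
    their sign, so inference needs no multiplications at all.
    """
    q = np.sign(weights)
    q[np.abs(weights) < threshold] = 0
    return q.astype(np.int8)

w = np.array([0.42, -0.03, -0.77, 0.01, 0.09])
print(ternarize(w).tolist())  # -> [1, 0, -1, 0, 1]
```

With weights restricted to {-1, 0, 1}, each weight fits in two bits and every multiply-accumulate degenerates to an add, subtract, or skip, which is what makes the dedicated accelerator datapath so cheap.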
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility and need a separate controller (e.g. a microcontroller). An alternative is a heterogeneous processor architecture, where a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
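The per-timestep state that makes LSTMs challenging for accelerators, as in the two projects above, can be seen in a minimal cell update. The following is a NumPy sketch with hypothetical shapes and naming, not any project's actual implementation: it shows the hidden/cell state (h, c) that must be stored and updated every step, and the four fused gate matmuls that dominate memory bandwidth.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step: returns the updated (h, c) state."""
    z = W @ x + U @ h + b                  # fused pre-activations for all 4 gates
    i, f, g, o = np.split(z, 4)            # input, forget, cell-candidate, output
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c_new = f * c + i * np.tanh(g)         # cell state: the recurrent memory
    h_new = o * np.tanh(c_new)             # hidden state: the per-step output
    return h_new, c_new

n, m = 4, 3                                # hidden size, input size (toy values)
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n, m))
U = rng.normal(size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(rng.normal(size=m), h, c, W, U, b)
```

Note that W and U must be re-read for every timestep, which is why weight quantization (as in the quantized-training project) directly reduces the accelerator's bandwidth requirements.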
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), allowing these storage requirements to be traded for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
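The change-based inference idea from the CBinfer project above can be sketched as follows. This is a deliberately simplified NumPy illustration; the function name `changed_mask` and the threshold `tau` are illustrative only, and the published method applies this per-layer to intermediate feature maps with custom GPU kernels rather than to raw frames.

```python
import numpy as np

def changed_mask(prev_frame, cur_frame, tau=0.1):
    """Mark locations whose value changed by more than tau between frames.

    Only the marked locations need their dependent outputs recomputed;
    everything else can be reused from the previous frame's results.
    """
    return np.abs(cur_frame - prev_frame) > tau

prev = np.zeros((4, 4))
cur = prev.copy()
cur[1, 2] = 0.5                      # a single large frame-to-frame change
mask = changed_mask(prev, cur)
print(int(mask.sum()))               # -> 1: recompute only one location
```

The same masking applies unchanged to spectrogram columns (MFCCs or STFT frames), which is what motivates carrying the technique over from video to continuous audio listening.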
==Where to find us==<br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch<br /><br />
Georg Rutishauser, ETZ J 68.2, georgr@iis.ee.ethz.ch<br /><br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /></div>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| MA/SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we have thus created an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and project type, this could lead to a chip tape-out. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. A lot of efforts try to reduce the precision of arithmetic operations to 16 bit, 12 bit, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture which is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then trigger more energy costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Logic Synthesis with Graph CNNs || Logic synthesis and optimization as used for ASIC and FPGA implementation is computationally intensive and finding optimal solutions (in terms of area and/or timing) is only possible for extremely tiny circuits. The logical operations linking the inputs and outputs of a hardware block can be described in a graph form on which equivalence transforms can be applied. Current algorithms use hand-designed heuristics to selected which transforms should be applied to iteratively find a satisfactory solution. Instead, this project aims at using Graph CNNs combined with reinforcement learning to select which local graph transforms should be applied. Prerequisites: Know basics of normal DNNs and PyTorch (or another DL framework). || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to void commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /></div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4754Deep Learning Projects2019-08-29T16:56:38Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| MA/SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we have thus created an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and project type, this could lead to a chip tape-out. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. A lot of efforts try to reduce the precision of arithmetic operations to 16 bit, 12 bit, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture which is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then trigger more energy costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Logic Synthesis with Graph CNNs || Logic synthesis and optimization as used for ASIC and FPGA implementation is computationally intensive and finding optimal solutions (in terms of area and/or timing) is only possible for extremely tiny circuits. The logical operations linking the inputs and outputs of a hardware block can be described in a graph form on which equivalence transforms can be applied. Current algorithms use hand-designed heuristics to selected which transforms should be applied to iteratively find a satisfactory solution. Instead, this project aims at using Graph CNNs combined with reinforcement learning to select which local graph transforms should be applied. Prerequisites: Know basics of normal DNNs and PyTorch (or another DL framework). || SW || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
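The Prototypical Networks approach mentioned in the One-shot/Few-shot Learning row above boils down to a simple idea: embed support examples, average them per class to form a "prototype", and classify a query by its nearest prototype. A minimal sketch of that classification step (the embedding network itself is omitted; function names and the toy data are illustrative, not from the project code):

```python
import numpy as np

def prototypes(embeddings, labels):
    """Mean embedding per class -- the class 'prototype'."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(x, protos):
    """Assign an embedded query x to the class with the nearest prototype."""
    return min(protos, key=lambda c: np.linalg.norm(x - protos[c]))

# Toy example: two classes, two support embeddings each
emb = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
lab = np.array([0, 0, 1, 1])
p = prototypes(emb, lab)
```

Adding a new identity (e.g. a new face) then only requires computing one more prototype, with no retraining -- which is what makes the method attractive for the single-enrollment-picture scenario described above.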
==Where to find us==<br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /></div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4753Deep Learning Projects2019-08-29T16:56:18Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ..., just let us know and we can determine together what is a useful way to go -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| MA/SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we have thus created an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and project type, this could lead to a chip tape-out. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. A lot of efforts try to reduce the precision of arithmetic operations to 16 bit, 12 bit, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture which is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then trigger more energy costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Logic Synthesis with Graph CNNs || Logic synthesis and optimization as used for ASIC and FPGA implementation is computationally intensive, and finding optimal solutions (in terms of area and/or timing) is only possible for extremely tiny circuits. The logical operations linking the inputs and outputs of a hardware block can be described in a graph form to which equivalence transforms can be applied. Current algorithms use hand-designed heuristics to select which transforms should be applied to iteratively find a satisfactory solution. Instead, this project aims at using Graph CNNs combined with reinforcement learning to select which local graph transforms should be applied. Prerequisites: Know basics of DNNs and PyTorch (or another DL framework). || SW || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
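The two TWN/TNN rows above both build on weights quantized to {-1, 0, 1}: all multiplications disappear (only additions, subtractions, and skips remain) and each weight fits in 2 bits of on-chip storage. A minimal sketch of such a ternarization step -- the threshold rule shown here (a fraction of the maximum absolute weight) is an illustrative assumption, not the INQ procedure used in the projects:

```python
import numpy as np

def ternarize(w, delta_frac=0.05):
    """Map weights to {-1, 0, +1} using a magnitude threshold.

    delta_frac is a hypothetical choice; INQ instead quantizes
    incrementally to levels {+-2^n, 0} with retraining in between.
    """
    delta = delta_frac * np.abs(w).max()  # threshold below which weights snap to 0
    q = np.zeros_like(w)
    q[w > delta] = 1.0
    q[w < -delta] = -1.0
    return q

w = np.array([0.9, -0.8, 0.01, -0.02, 0.3])
q = ternarize(w)
```

With weights restricted this way, a dot product reduces to summing activations with a sign flip where the weight is -1 and skipping them where it is 0 -- exactly the property the hardware accelerators above exploit.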
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g. a microcontroller) is needed to control them. An alternative is a heterogeneous processor architecture, where a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP was developed. In this thesis, we would look into the evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
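The quantized-training row above relies on the standard quantization-aware-training building block: during the forward pass, weights and activations are passed through a quantize-dequantize ("fake quantization") step so the network learns to tolerate the reduced precision. A minimal symmetric uniform sketch (the scaling rule is a common illustrative choice, not the specific method used in the project):

```python
import numpy as np

def fake_quantize(x, bits=8):
    """Symmetric uniform quantize-dequantize as used in quantization-aware training.

    The values stay in floating point, but are snapped onto a grid of
    2^bits - 1 levels spanning [-max|x|, +max|x|].
    """
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)  # step size of the grid
    return np.round(x / scale) * scale

y = fake_quantize(np.array([-1.0, 0.3, 1.0]), bits=8)
```

In training, the rounding is typically combined with a straight-through estimator so gradients flow through it unchanged; the per-element error is bounded by half a quantization step.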
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. The operating cost and energy consumption of such systems are often dominated by communication, so data compression becomes crucial. The strongest compression is usually achieved by performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely expended on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it into on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), allowing these storage requirements to be traded for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a way of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
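The core idea behind the CBinfer row above is change-based inference: between consecutive frames (or spectrogram windows), only the positions whose input changed by more than a threshold are recomputed, and cached outputs are reused everywhere else. A minimal element-wise sketch of that mechanism -- the published method applies this per convolutional layer, and the function names and threshold here are illustrative assumptions:

```python
import numpy as np

def changed_mask(prev, cur, tau=0.1):
    """True where the new input differs enough to warrant recomputation."""
    return np.abs(cur - prev) > tau

def incremental_apply(f, prev_in, cur_in, cached_out, tau=0.1):
    """Recompute f only at changed positions; reuse cached outputs elsewhere."""
    m = changed_mask(prev_in, cur_in, tau)
    out = cached_out.copy()
    out[m] = f(cur_in[m])  # element-wise f for illustration; CBinfer handles conv layers
    return out

prev = np.zeros(4)
cur = np.array([0.0, 0.05, 0.5, -0.3])
cached = np.zeros(4)            # f(prev) for f(x) = 2x
out = incremental_apply(lambda v: 2 * v, prev, cur, cached)
```

The savings scale with how static the input stream is, which is why the same trick looks promising for neighboring MFCC/STFT windows in continuous audio.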
==Where to find us==<br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /></div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4752Deep Learning Projects2019-08-29T16:43:28Z<p>Lukasc: /* Where to find us */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| MA/SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we have thus created an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and project type, this could lead to a chip tape-out. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA|| TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. A lot of efforts try to reduce the precision of arithmetic operations to 16 bit, 12 bit, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture which is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then trigger more energy costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to void commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /></div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4751Deep Learning Projects2019-08-29T16:43:09Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| MA/SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we have thus created an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and project type, this could lead to a chip tape-out. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA|| TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. A lot of efforts try to reduce the precision of arithmetic operations to 16 bit, 12 bit, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture which is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then trigger more energy costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
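The state that the two LSTM projects above revolve around can be seen in a minimal cell update. This scalar sketch (hidden size 1, made-up weights) only illustrates which quantities (c, h, and the per-gate weights) an accelerator must store and update at every time step; it is not the Muntaniala or PULP design:

```python
import math

# Minimal scalar LSTM step, showing the internal state (c, h) that must
# be stored and regularly updated -- the memory/bandwidth challenge
# mentioned above. All weight values are made up for illustration.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM time step; W holds (w_x, w_h, b) per gate i, f, o, g."""
    i = sigmoid(W['i'][0] * x + W['i'][1] * h + W['i'][2])    # input gate
    f = sigmoid(W['f'][0] * x + W['f'][1] * h + W['f'][2])    # forget gate
    o = sigmoid(W['o'][0] * x + W['o'][1] * h + W['o'][2])    # output gate
    g = math.tanh(W['g'][0] * x + W['g'][1] * h + W['g'][2])  # candidate
    c = f * c + i * g          # new cell state
    h = o * math.tanh(c)       # new hidden state
    return h, c

W = {k: (0.5, 0.25, 0.0) for k in 'ifog'}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:     # process a short time series
    h, c = lstm_step(x, h, c, W)
print(round(h, 4))
```

For a real hidden size n, each gate needs an n x n recurrent weight matrix, which is exactly the footprint that quantization targets.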
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely spent on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient-descent step of typical DNNs, there is no way to fit it in on-chip SRAM, even across multiple very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), which allow trading these storage requirements for some additional compute effort, a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a way of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with a camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
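The storage-for-compute trade mentioned in the "DNN Training Accelerator" row is easy to see in isolation: an invertible residual block lets the backward pass reconstruct its inputs exactly from its outputs instead of keeping them in memory. A toy sketch with placeholder sub-networks F and G (in the real network these are full convolutional layers):

```python
# Sketch of the invertible-ResNet idea: activations need not be stored
# for the backward pass, because the block's inputs can be recomputed
# from its outputs. F and G stand in for arbitrary sub-networks.

def F(x): return 0.5 * x + 1.0   # placeholder sub-network
def G(x): return x * x           # placeholder sub-network

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    x2 = y2 - G(y1)              # recompute instead of store
    x1 = y1 - F(x2)
    return x1, x2

y1, y2 = forward(3.0, 2.0)
print(inverse(y1, y2))  # recovers (3.0, 2.0) exactly
```

The cost is one extra evaluation of F and G per block during the backward pass, which is the trade an architecture exploration would quantify.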
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>

Deep Learning Projects (Lukasc, 2019-08-29)
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may quickly become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict general requirements, as they depend heavily on the exact project steps. The projects will be adapted to the skills and interests of the student(s), so just come talk to us! If you don't know about GPU programming or CNNs or ..., just let us know and we can determine together what is a useful way to go. After all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available || MA/SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. Many efforts aim to reduce the precision of arithmetic operations to 16, 12, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then trigger more energy-costly, high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
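The weight levels {+-2^n, 0} in the TWN row above mean every multiplication becomes a bit-shift (or is skipped entirely). A minimal nearest-level quantizer, shown here as an illustration only; the actual INQ procedure quantizes the weights incrementally during retraining rather than in one shot:

```python
# Hedged sketch of the INQ weight levels described above: each weight
# becomes 0 or a signed power of two. The nearest-level rounding rule
# and the exponent range are made-up simplifications.

def quantize_pow2(w, n_min=-2, n_max=0):
    """Snap w to the nearest value in {0} U {+-2^n : n_min <= n <= n_max}."""
    levels = [0.0] + [s * 2.0 ** n for s in (1, -1)
                      for n in range(n_min, n_max + 1)]
    return min(levels, key=lambda q: abs(w - q))

print([quantize_pow2(w) for w in [0.9, -0.6, 0.1, 0.3]])
# [1.0, -0.5, 0.0, 0.25]
```

With n_max = n_min = 0 this collapses to exactly the {-1, 0, 1} case the row highlights.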
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time-series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint, and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]], [[:User:lukasc|Lukas Cavigelli]], [[:User:fconti|Francesco Conti]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g. a microcontroller) is needed to control them. An alternative is a heterogeneous processor architecture, where a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP was developed. In this thesis, we will evaluate and optimize this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
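One common way to realize the quantized training mentioned in the first row above is the straight-through estimator: the forward pass uses quantized weights, while the gradient updates a full-precision copy as if quantization were the identity. A single-weight toy example (this is one standard technique, not necessarily the method chosen in the project; the quantization step and learning rate are made up):

```python
# Straight-through-estimator sketch: quantize in the forward pass,
# but apply the gradient to the full-precision "shadow" weight.

def quantize(w, step=0.25):
    return round(w / step) * step

w = 1.0                      # full-precision shadow weight
x, target = 2.0, 1.0
for _ in range(20):
    wq = quantize(w)                 # forward with quantized weight
    y = wq * x
    grad_y = 2 * (y - target)        # d/dy of the loss (y - target)^2
    grad_w = grad_y * x              # STE: gradient passes straight through
    w -= 0.01 * grad_w
print(quantize(w))           # settles on 0.5 = target / x
```

The full-precision shadow weight matters: it accumulates small gradient steps that the quantizer alone would discard, which is what lets training make progress between quantization levels.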
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely spent on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient-descent step of typical DNNs, there is no way to fit it in on-chip SRAM, even across multiple very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), which allow trading these storage requirements for some additional compute effort, a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a way of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with a camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
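The CBinfer row above relies on one observation: between consecutive frames or spectrogram windows, most values barely change, so cached outputs can be reused and only the changed positions recomputed. A pointwise Python sketch of that idea (the published method handles convolutions and their receptive fields; the threshold and the "expensive" op here are made up):

```python
# Change-based inference sketch: recompute an expensive operation only
# where the input changed by more than a threshold since the last frame.

def changed_indices(prev, cur, threshold=0.1):
    """Indices where the new frame differs noticeably from the old one."""
    return [i for i, (p, c) in enumerate(zip(prev, cur)) if abs(c - p) > threshold]

def update_outputs(prev_in, cur_in, prev_out, expensive_op, threshold=0.1):
    out = list(prev_out)                     # start from cached results
    for i in changed_indices(prev_in, cur_in, threshold):
        out[i] = expensive_op(cur_in[i])     # recompute only changed cells
    return out

op = lambda x: x * x                         # stand-in for a costly kernel
frame0 = [1.0, 2.0, 3.0, 4.0]
frame1 = [1.0, 2.5, 3.0, 4.05]               # only index 1 changed notably
out0 = [op(x) for x in frame0]
print(update_outputs(frame0, frame1, out0, op))  # [1.0, 6.25, 9.0, 16.0]
```

The savings are an approximation: sub-threshold changes are silently dropped, which is why the threshold must be tuned against accuracy, as in the video case.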
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| MA/SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA|| TNN HW Accel. || Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. Many efforts try to reduce the precision of arithmetic operations to 16 bit, 12 bit, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, the networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture which is fully targeted at minimizing the energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then trigger more energy-costly high-precision DNN inference with more classes on another device upon detecting an interesting signal. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
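To make the TWN/TNN idea above concrete, here is a minimal sketch of weight ternarization and a multiplication-free dot product. The threshold <code>t</code> and the function names are illustrative only -- this is not the INQ training procedure or the actual accelerator datapath.

```python
import numpy as np

def ternarize(w, t=0.05):
    """Quantize weights to {-1, 0, +1}: zero out small magnitudes, keep signs.
    The threshold t is an illustrative choice, not the INQ training rule."""
    q = np.zeros_like(w, dtype=np.int8)
    q[w > t] = 1
    q[w < -t] = -1
    return q

def ternary_dot(q, x):
    """Multiplication-free dot product: add inputs where q=+1,
    subtract where q=-1, skip where q=0 -- exactly why ternary
    weights are attractive for HW acceleration."""
    return x[q == 1].sum() - x[q == -1].sum()

w = np.array([0.8, -0.02, -0.6, 0.01, 0.3])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
q = ternarize(w)          # -> [ 1,  0, -1,  0,  1]
print(ternary_dot(q, x))  # 1.0 - 3.0 + 5.0 = 3.0
```

Storing each weight in 2 bits instead of 32 also gives the on-chip memory savings mentioned in the project description.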
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g. a microcontroller) is needed to control them. An alternative is a heterogeneous processor architecture, where a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis, we would look into the evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
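The quantized-training projects above rest on one core operation: mapping weights onto a coarse uniform grid in the forward pass while gradients flow through unchanged (the straight-through estimator). A minimal quantize-dequantize sketch, with an illustrative symmetric scaling scheme:

```python
import numpy as np

def fake_quantize(w, n_bits=8):
    """Quantize-dequantize: snap w onto a uniform symmetric grid with
    2^(n_bits-1)-1 positive levels. In quantized training this runs in
    the forward pass; the backward pass treats it as identity."""
    w_max = np.abs(w).max()
    scale = w_max / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

w = np.array([0.7, -0.31, 0.02])
print(fake_quantize(w, n_bits=4))  # approx. [ 0.7, -0.3, 0.0]
```

For an LSTM, the same function would be applied to the gate weight matrices (and, more delicately, to the recurrent state), which is where the accuracy cost discussed in the project description comes from.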
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as an MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it in on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), allowing these storage requirements to be traded for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how binary- or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with a camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
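The change-based computation idea behind the CBinfer project can be sketched in a few lines: compare consecutive frames (or spectrogram windows), and only mark locations whose value changed by more than a threshold for recomputation. The threshold <code>tau</code> and the function name are illustrative, not the published CBinfer algorithm.

```python
import numpy as np

def changed_mask(prev, cur, tau=0.1):
    """Change detection between consecutive frames: mark locations whose
    value changed by more than tau; only those need recomputation in the
    next layer, the rest can reuse cached results."""
    return np.abs(cur - prev) > tau

prev = np.zeros((4, 4))
cur = prev.copy()
cur[1, 2] = 0.5            # one small local change between frames
m = changed_mask(prev, cur)
print(m.sum())             # 1 -> a single location to recompute
```

On near-static video or slowly varying spectrograms, the mask is mostly empty, which is where the large compute savings come from.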
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || Georg Rutishauser, [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to void commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4746Deep Learning Projects2019-08-29T16:22:52Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available|| 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to void commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4745Deep Learning Projects2019-08-29T16:22:39Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| available || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way, all multiplications are eliminated and the weights can be stored on-chip much more compactly, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|}<br />
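The {-1,0,1} weight quantization behind the TWN project above can be sketched in a few lines. This is an illustrative sketch only: the function name and the threshold heuristic are our assumptions, and INQ itself quantizes the weights incrementally while retraining rather than in one shot. <br />

```python
import numpy as np

def ternarize(weights, threshold_factor=0.7):
    # Map real-valued weights to {-1, 0, +1}: values with small magnitude
    # become 0, the rest keep only their sign. The threshold is set
    # proportionally to the mean absolute weight (a common heuristic).
    t = threshold_factor * np.mean(np.abs(weights))
    return (np.sign(weights) * (np.abs(weights) > t)).astype(np.int8)

w = np.array([0.9, -0.05, 0.4, -0.8, 0.02])
print(ternarize(w))  # [ 1  0  1 -1  0]
```

With weights restricted to {-1,0,1}, every multiply-accumulate degenerates into an add, a subtract, or a skip -- which is what makes a dedicated HW accelerator attractive. <br />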
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g. a microcontroller) is needed to control them. An alternative is a heterogeneous processor architecture, where a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP was developed. In this thesis, we will look into the evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
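As a reminder of why LSTMs pose the memory and bandwidth challenges mentioned in the projects above: each time step reads four stacked weight matrices and updates a persistent cell state. A minimal sketch of one step using the standard LSTM equations (variable names are generic, not taken from any of the accelerators above): <br />

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One time step of a plain LSTM. The cell state c (and hidden state h)
    # is the internal state that must be stored and updated every step --
    # together with the four gate matrices, this drives the memory
    # footprint and bandwidth requirements.
    n = h.shape[0]
    z = W @ x + U @ h + b                      # four gate pre-activations, stacked
    i, f, g, o = z[:n], z[n:2*n], z[2*n:3*n], z[3*n:]
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# toy sizes: input dim 3, hidden dim 4 -> W is (16, 3), U is (16, 4)
rng = np.random.default_rng(0)
h, c = np.zeros(4), np.zeros(4)
W, U, b = rng.standard_normal((16, 3)), rng.standard_normal((16, 4)), np.zeros(16)
for x in rng.standard_normal((5, 3)):          # run five time steps
    h, c = lstm_step(x, h, c, W, U, b)
```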
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way, all multiplications are eliminated and the weights can be stored on-chip much more compactly, which is great for HW acceleration. To keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Georg Rutishauser<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely carried out on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it into on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), allowing these storage requirements to be traded for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening to voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
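The change-based idea behind CBinfer in the last row can be illustrated with a short sketch (the function name is hypothetical; the published work implements this as optimized GPU kernels over convolution-layer tiles, not as the element-wise toy below): <br />

```python
import numpy as np

def changed_mask(prev, cur, threshold):
    # Mark the positions whose value changed by more than a threshold
    # between two consecutive frames; only those positions (and the
    # outputs they influence) need to be recomputed.
    return np.abs(cur - prev) > threshold

prev = np.array([0.2, 0.5, 0.9, 0.1])
cur  = np.array([0.2, 0.8, 0.9, 0.1])
mask = changed_mask(prev, cur, threshold=0.1)  # only index 1 needs recomputation
```

The same thresholding would carry over to audio: if neighboring MFCC or spectrogram windows barely change, the corresponding network activations could be reused instead of recomputed. <br />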
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4744Deep Learning Projects2019-08-29T16:16:12Z<p>Lukasc: /* On-Going Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (algorithm evals) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| taken || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to void commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4743Deep Learning Projects2019-08-29T16:15:30Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (GPU) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| taken || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukasc
http://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4742 Deep Learning Projects 2019-08-29T16:15:16Z
<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may quickly become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs yet, just let us know and we can determine a useful way forward together -- after all, you are here not only to do project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| taken (J. MacPherson) || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (GPU) || [[:User:paulin|Gianna Paulin]]<br />
|}<br />
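As background for the quantized-training project above: a common approach is to keep a full-precision copy of the weights and round them onto a small set of levels in the forward pass, while gradients update the full-precision copy (straight-through estimator). A minimal sketch of uniform symmetric fake quantization -- illustrative only, the project may well use a different scheme:

```python
def fake_quantize(weights, num_bits=4):
    """Round each weight onto one of 2^num_bits - 1 evenly spaced levels.

    In quantized training, the rounded values feed the forward pass while
    gradients update the full-precision copy (straight-through estimator).
    Illustrative sketch, not the method used in the actual project.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax    # per-tensor scale
    if scale == 0.0:
        return list(weights)
    # clip to the representable range, round to the nearest level
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]

w = [0.82, -0.11, 0.40, -0.73]
wq = fake_quantize(w, num_bits=4)   # every entry is an integer multiple of 0.82/7
```

For an LSTM, the same function would be applied to each weight matrix (and possibly the activations and internal state) at every training step.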
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g. a microcontroller) is needed to control them. An alternative is a heterogeneous processor architecture, where a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluating and optimizing this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| taken || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been shown to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1, 0, 1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|}<br />
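To illustrate the property the TWN accelerator exploits: with weights restricted to {-1, 0, 1}, a dot product needs no multiplier at all, only one addition or subtraction per non-zero weight. A toy sketch (the dead-zone threshold here is a made-up illustration; INQ derives the quantization levels during training):

```python
def ternarize(weights, threshold=0.05):
    """Map full-precision weights to {-1, 0, +1} with a dead-zone threshold
    (hypothetical threshold, for illustration only)."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]

def ternary_dot(tw, x):
    """Dot product with ternary weights: only additions and subtractions,
    no multiplications -- the property a TWN accelerator exploits in HW."""
    acc = 0.0
    for w, xi in zip(tw, x):
        if w == 1:
            acc += xi
        elif w == -1:
            acc -= xi
    return acc

weights = [0.7, -0.02, -0.9, 0.3]
tw = ternarize(weights)                       # [1, 0, -1, 1]
y = ternary_dot(tw, [1.0, 2.0, 3.0, 4.0])     # 1.0 - 3.0 + 4.0 = 2.0
```

In hardware this means the MAC array shrinks to adder/subtractor trees, and each weight costs only 2 bits of on-chip storage.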
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it into on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a way of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled samples. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
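The CBinfer idea referenced in the last row of the table can be sketched in a few lines: compare consecutive inputs and re-compute only where the change exceeds a threshold. This 1-D toy version just builds the update mask (tau is an illustrative value; the published method operates on conv-layer feature maps):

```python
def changed_mask(prev, cur, tau=0.1):
    """Per-element mask of inputs whose change since the previous frame
    exceeds tau; change-based inference re-computes a layer only at these
    locations. Simplified 1-D sketch of the idea, not the real pipeline."""
    return [abs(c - p) > tau for p, c in zip(prev, cur)]

prev = [0.50, 0.20, 0.90, 0.10]   # e.g. one MFCC band over time
cur  = [0.52, 0.80, 0.90, 0.05]
mask = changed_mask(prev, cur)    # [False, True, False, False]
frac = sum(mask) / len(mask)      # fraction of positions actually re-computed
```

For speech, `prev` and `cur` would be neighboring spectral frames (MFCC or STFT columns), and the savings come from the limited frame-to-frame spectral change the project description points out.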
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4741Deep Learning Projects2019-08-29T16:12:27Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| reserved || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (GPU) || [[:User:paulin|Gianna Paulin]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| taken || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS19 || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| completed FS19 || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to void commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4740Deep Learning Projects2019-08-29T16:10:58Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA/SA || Low-Power Systolic LSTM Demonstrator || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. We are currently building up a complete speech recognition demonstrator based on a systolic grid of in-house designed LSTM accelerators called Muntaniala. The goal of the project is to build an "overall low power" Muntaniala systolic system demonstrator using a low power microcontroller (e.g. PULP) or a low power FPGA. || ASIC/FPGA || HW (ASIC) / HW (FPGA) / SW (microcontr.) || [[:User:paulin|Gianna Paulin]]<br />
|-<br />
| reserved || MA/SA || Quantized Training of Recurrent Neural Networks || Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time series analysis such as speech recognition. RNNs come with additional challenges such as an internal state that needs to be stored and regularly updated, a very large memory footprint and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized with a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. || GPU || SW (GPU) || [[:User:paulin|Gianna Paulin]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken || MA/SA || RISC-V LSTM Accelerator || LSTM are the state of the art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack in flexibility and a separate controller (e.g. a microcontroller) is needed to control it. An alternative a heteogeneous processor architecture, where a general purpose processor is extended with special-purpose accelarators. In a previous semester project, a first LSTM accelerator attached to PULP has been developed. In this thesis we would look into evaluation and optimization of this accelerator. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| taken || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| taken || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|-<br />
| taken || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
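To give a feel for the TWN project above: ternary quantization maps each weight to {-1, 0, +1} plus a per-tensor scaling factor, so multiplications reduce to additions/subtractions. A minimal sketch in NumPy (the magnitude-threshold heuristic and per-tensor scale are illustrative assumptions, not the exact INQ training procedure):

```python
import numpy as np

def ternarize(weights, sparsity_threshold=0.05):
    """Map float weights to {-1, 0, +1} plus a per-tensor scale.

    The magnitude threshold below is a simple illustrative heuristic,
    not the exact INQ procedure from the paper.
    """
    scale = np.abs(weights).mean()                  # per-tensor scaling factor
    t = sparsity_threshold * np.abs(weights).max()  # near-zero weights -> 0
    q = np.zeros_like(weights, dtype=np.int8)
    q[weights > t] = 1
    q[weights < -t] = -1
    return q, scale

w = np.array([0.8, -0.02, -0.5, 0.01, 0.3])
q, s = ternarize(w)   # q = [1, 0, -1, 0, 1]
```

Besides removing the multipliers, this needs only 2 bits per weight instead of 32, which is why the weights fit into on-chip memory.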
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it into on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), which allow trading these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a way of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
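The memory/compute trade-off exploited in the DNN Training Accelerator project above rests on invertible (RevNet-style) residual couplings: because a block's inputs can be recomputed from its outputs, activations need not be kept in DRAM for the backward pass. A toy NumPy sketch with arbitrary illustrative residual functions:

```python
import numpy as np

# Toy residual functions: any functions work here; they themselves
# need not be invertible for the coupling to be invertible.
F = lambda x: np.tanh(x)
G = lambda x: 0.5 * x**2

def forward(x1, x2):
    # RevNet-style coupling: the outputs alone suffice to recover the
    # inputs, so intermediate activations need not be stored off-chip.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Recompute the inputs during the backward pass instead of storing them.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = np.array([1.0, -2.0]), np.array([0.5, 3.0])
r1, r2 = inverse(*forward(x1, x2))
assert np.allclose(r1, x1) and np.allclose(r2, x2)  # exact reconstruction
```

The reconstruction involves only subtractions of re-evaluated residual functions, which is exactly the extra compute effort traded for the saved activation storage.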
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch<br /><br />
[[:User:paulin|Gianna Paulin]], ETZ J 76.2, pauling@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4627Deep Learning Projects2019-03-13T11:49:30Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can determine together what a useful way to go is -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to run them on ubiquitous IoT devices as well. To achieve that, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches which can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
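As a rough intuition for the HW Data Compressor project above: post-ReLU feature maps are mostly zeros, so even a trivial scheme that stores a 1-bit non-zero mask plus the packed non-zero values compresses well. The sketch below is a much-simplified stand-in for the scheme proposed in the linked paper, just to illustrate the idea:

```python
import numpy as np

def compress(acts):
    """Zero-collapsing compression: 1-bit non-zero mask + packed values.

    A deliberately simplified illustration, not the compression scheme
    from the paper.
    """
    mask = acts != 0
    return np.packbits(mask), acts[mask], acts.shape

def decompress(packed_mask, values, shape):
    n = int(np.prod(shape))
    mask = np.unpackbits(packed_mask, count=n).astype(bool)
    out = np.zeros(n, dtype=values.dtype)
    out[mask] = values          # scatter non-zeros back into place
    return out.reshape(shape)

a = np.array([[0.0, 1.5, 0.0, 0.0],
              [2.0, 0.0, 0.0, 3.0]], dtype=np.float32)
m, v, s = compress(a)
assert np.array_equal(decompress(m, v, s), a)  # lossless round trip
```

For a feature map with, say, 70% zeros, this already cuts the DRAM traffic to roughly a third plus the mask overhead; a hardware encoder/decoder would stream this format between the accelerator and external memory.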
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that the drone is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, the network needs to be trained on the node itself and periodically synchronized/merged with other nodes. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| taken || 2x SA|| TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and much more compactly store the weights on-chip, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|-<br />
| taken || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| complete HS18|| 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a form of opening you apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [[https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to void commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4625Deep Learning Projects2019-03-13T11:48:03Z<p>Lukasc: /* On-Going Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can determine together what a useful way to go is -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to run them on ubiquitous IoT devices as well. To achieve that, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches which can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that the drone is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, the network needs to be trained on the node itself and periodically synchronized/merged with other nodes. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! Supervisors<br />
|-<br />
| taken || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]], Andres Gomez, Naomi Stricker (TIK)<br />
|-<br />
| taken || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1, 0, 1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]], Fabian Schuiki<br />
|-<br />
| taken || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
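The TWN HW Accel. entry above hinges on one property: with weights in {-1,0,1}, a dot product needs no multiplications, only additions and subtractions. A minimal NumPy sketch of that idea (the magnitude-threshold rule is illustrative only, not the INQ training procedure):

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Map weights to {-1, 0, +1} by magnitude thresholding.
    (Illustrative only; INQ itself quantizes incrementally during training.)"""
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold] = 1
    q[w < -threshold] = -1
    return q

def ternary_dot(q, x):
    """Dot product with ternary weights: additions/subtractions only --
    the property that makes the HW accelerator cheap."""
    return x[q == 1].sum() - x[q == -1].sum()

w = np.array([0.8, -0.3, 0.01, 0.6, -0.9])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
q = ternarize(w)                      # -> [1, -1, 0, 1, -1]
assert ternary_dot(q, x) == float(np.dot(q, x))
```

The ternary codes also need only 2 bits per weight, which is why they can be stored on-chip much more compactly than float weights.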
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely carried out on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to keep it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
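The DNN Training Accelerator entry above relies on reversible residual blocks being exactly invertible, so intermediate activations can be recomputed from the outputs instead of being stored for backprop. A sketch of the additive-coupling idea from the cited paper (`F` and `G` are arbitrary stand-in residual functions):

```python
import numpy as np

def F(x):  # stand-in residual function; any deterministic function works
    return np.tanh(x)

def G(x):
    return 0.5 * x

def rev_forward(x1, x2):
    """Reversible-residual additive coupling:
    y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    """Recompute the inputs from the outputs -- activations need not be
    stored for the backward pass, trading memory for extra compute."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

For a training accelerator, this means the forward activations never have to leave the chip: the backward pass regenerates them on the fly at the cost of re-evaluating F and G.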
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4624Deep Learning Projects2019-03-13T11:46:20Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them also on ubiquitous IoT devices. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
|<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that the drone is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks in a periodic scheme. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || 1x SA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| taken || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 2x SA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
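The Data Bottlenecks project in the table above compresses an intermediate DNN representation instead of the raw input. Its actual quantization method is not yet published, so the sketch below uses plain uniform quantization purely to illustrate the bandwidth trade-off at such a bottleneck (the function names and the 4-bit choice are assumptions for illustration):

```python
import numpy as np

def quantize_uniform(a, n_bits=4):
    """Uniformly quantize a tensor to 2**n_bits levels (a generic stand-in;
    the project's own quantization method is not public)."""
    lo, hi = float(a.min()), float(a.max())
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((a - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

# intermediate feature map at the bottleneck (shape is arbitrary)
feat = np.random.randn(64, 16, 16).astype(np.float32)
codes, lo, scale = quantize_uniform(feat, n_bits=4)

raw_bits = feat.size * 32            # float32 activations
bottleneck_bits = feat.size * 4      # 4-bit codes
err = float(np.abs(dequantize(codes, lo, scale) - feat).max())
print(f"{raw_bits // bottleneck_bits}x less traffic, max abs error {err:.3f}")
```

The research question is then where in the network to place such a bottleneck and how to train through it so the downstream classification quality survives the compression.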
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed HS18 || 1x MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely carried out on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to keep it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || 1x MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || 1x SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || 1x SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
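The One-shot/Few-shot Learning entry in the table above uses Prototypical Networks, whose core fits in a few lines: embed the support examples, average them per class into prototypes, and classify queries by the nearest prototype. The sketch below uses toy 2-D "embeddings" instead of a learned encoder:

```python
import numpy as np

def prototypes(support_emb, support_lbl):
    """Class prototype = mean embedding of that class's support examples
    (the core of Prototypical Networks, arXiv:1703.05175)."""
    classes = np.unique(support_lbl)
    return classes, np.stack([support_emb[support_lbl == c].mean(axis=0)
                              for c in classes])

def classify(query_emb, classes, protos):
    """Assign each query to the nearest prototype (squared Euclidean)."""
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]

# toy setup: one 'enrollment' shot per identity, as in the face-ID example
support = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 1])
classes, protos = prototypes(support, labels)
queries = np.array([[0.5, -0.2], [9.0, 11.0]])
print(classify(queries, classes, protos))   # [0 1]
```

Enrolling a new face then only means adding one more prototype; no retraining of the embedding network is needed, which is what makes the method attractive on embedded targets.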
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4623Deep Learning Projects2019-03-13T11:45:13Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them also on ubiquitous IoT devices. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
|<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that the drone is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks in a periodic scheme. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || MA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed HS18 || MA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely carried out on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to keep it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1x SA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
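The CBinfer entry in the table above is built on change-based inference: recompute an expensive operator only at positions where the input changed beyond a threshold since the previous frame, and reuse cached results elsewhere. A toy sketch of that idea (element-wise squaring stands in for a real layer; all names are illustrative):

```python
import numpy as np

def changed_mask(prev, cur, tau=0.1):
    """Change detection in the CBinfer spirit: only positions whose input
    changed by more than tau need to be recomputed (cf. arXiv:1704.04313)."""
    return np.abs(cur - prev) > tau

prev = np.random.rand(128)           # e.g. the previous MFCC frame
cur = prev.copy()
cur[:8] += 1.0                       # small spectral change between windows

mask = changed_mask(prev, cur)
out_prev = np.square(prev)           # cached output of the expensive layer
out = out_prev.copy()
out[mask] = np.square(cur[mask])     # recompute only the changed positions

assert np.allclose(out, np.square(cur))
print(f"recomputed {mask.sum()} of {mask.size} positions")
```

For audio, the bet is that consecutive MFCC or STFT windows overlap heavily, so the changed fraction stays small and most of the per-frame compute can be skipped.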
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4622Deep Learning Projects2019-03-13T11:44:12Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them also on ubiquitous IoT devices. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
|<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that the drone is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks in a periodic scheme. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || MA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights on-chip much more compactly, which is great for HW acceleration. In order to keep the flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and is largely carried out on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to keep it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
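The ternary-weight idea behind the TWN HW Accel. project above can be sketched in a few lines: once weights are quantized to {-1, 0, 1}, a dot product reduces to additions and subtractions. A minimal NumPy sketch follows; the fixed `threshold` rule is an illustrative assumption, not the actual INQ procedure.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    # Map weights to {-1, 0, +1}; the fixed threshold is an
    # illustrative choice, not the INQ quantization schedule.
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold] = 1
    q[w < -threshold] = -1
    return q

def ternary_dot(x, q):
    # With ternary weights, no multiplications are needed:
    # add inputs where q == +1, subtract where q == -1.
    return x[q == 1].sum() - x[q == -1].sum()

rng = np.random.default_rng(0)
x = rng.normal(size=16)
q = ternarize(rng.normal(scale=0.1, size=16))
print(np.isclose(ternary_dot(x, q), x @ q))  # True: both compute the same dot product
```

In hardware, the same observation turns each MAC into a sign-select and an add, and each weight into a 2-bit code.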
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
| completed HS18 || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| completed HS18 || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
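The change-based idea behind the CBinfer project above can be illustrated for a pointwise operation. The real method propagates change regions through convolutional layers; this NumPy sketch only shows the core trick of reusing the previous output wherever the input barely changed.

```python
import numpy as np

def incremental_apply(prev_in, curr_in, prev_out, f, tau=0.1):
    # Recompute f only where the input changed by more than tau;
    # elsewhere, reuse the previous output. Pointwise simplification
    # of CBinfer (which handles convolutions over changed regions).
    mask = np.abs(curr_in - prev_in) > tau
    out = prev_out.copy()
    out[mask] = f(curr_in[mask])
    return out, mask.mean()  # output + fraction of values recomputed

relu = lambda v: np.maximum(v, 0.0)
prev = np.zeros((4, 4))
curr = prev.copy()
curr[0, 0] = 1.0  # only one "pixel" changes frame-to-frame
out, frac = incremental_apply(prev, curr, relu(prev), relu)
print(frac)  # 0.0625 -> only 1 of 16 values recomputed
```

The same thresholding applies to spectrogram columns for the speech case: neighboring time windows share most of their spectrum, so only the changed bins need reprocessing.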
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4621Deep Learning Projects2019-03-12T13:05:37Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this very rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had great success (e.g. Google's AlphaGo beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with an array of sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and synchronize/merge the networks in a periodic scheme. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited in energy efficiency and throughput by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
<!--| taken || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|- --><br />
| completed || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This eliminates all multiplications and lets the weights be stored much more compactly on-chip, which is great for HW acceleration. To keep the flexibility and ease of use of an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous, and training is largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 1x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
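The memory saving exploited in the DNN Training Accelerator project above comes from the additive coupling of reversible residual blocks (cf. the linked RevNets paper): layer inputs can be reconstructed exactly from the outputs during the backward pass instead of being stored. A minimal NumPy sketch with toy residual branches `f` and `g` (illustrative stand-ins for the actual sub-networks):

```python
import numpy as np

def rev_forward(x1, x2, f, g):
    # Reversible residual block with additive coupling.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_inverse(y1, y2, f, g):
    # Inputs are recomputed from the outputs, so activations need not
    # be stored for backprop -- compute effort traded for memory.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

f = np.tanh                    # toy residual branches (assumptions)
g = lambda v: 0.5 * v
x1, x2 = np.random.default_rng(2).normal(size=(2, 8))
x1r, x2r = rev_inverse(*rev_forward(x1, x2, f, g), f, g)
print(np.allclose(x1, x1r) and np.allclose(x2, x2r))  # True
```

An architecture exploration then quantifies this trade: each block's activations cost two extra evaluations of `f` and `g` on the backward pass but zero bytes of activation storage.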
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4620Deep Learning Projects2019-03-12T13:04:44Z<p>Lukasc: </p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this very rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had great success (e.g. Google's AlphaGo beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with an array of sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and synchronize/merge the networks in a periodic scheme. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited in energy efficiency and throughput by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
<!--| taken || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary weight network and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|- --><br />
| completed || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA || TWN HW Accel. || INQ (incremental network quantization) is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of {±2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This eliminates all multiplications and lets the weights be stored much more compactly on-chip, which is great for HW acceleration. To keep the flexibility and ease of use of an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. || ASIC || HW (ASIC) & SW || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous, and training is largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Main_Page&diff=4619Main Page2019-03-12T12:55:27Z<p>Lukasc: /* Digital Circuits and Systems Group (Prof. Benini) */</p>
<hr />
<div>__NOTOC__<br />
<CENTER><H1> Welcome to IIS-Projects</H1></CENTER><br />
In this page you will find student and research projects at the [http://www.iis.ee.ethz.ch Integrated Systems Laboratory] of the [http://www.ethz.ch ETH Zurich].<br />
<br />
==Institute Organization==<br />
The IIS Consists of 4 main research groups<br />
* [[Analog| Analog and Mixed Signal Design]]<br />
* [[AnalogInt| Analog and Mixed Signal Interfaces]]<br />
* [[Digital| Digital Circuits and Systems]]<br />
* [[:Category:Nano-TCAD|Nano-TCAD]]<br />
<br />
===[[Analog|Analog and Mixed Signal Design Group (Prof. Huang)]]===<br />
* [[Analog IC Design]]<br />
* [[Biomedical System on Chips]]<br />
* [[RF SoCs for the Internet of Things]]<br />
* [[High-Performance & V2X Cellular Communications]]<br />
<br />
===[[AnalogInt| Analog and Mixed Signal Interfaces Group (Prof. Jang)]]===<br />
<DynamicPageList><br />
category = AnalogInt<br />
category = Available<br />
category = Hot<br />
</DynamicPageList><br />
<br />
===[[Digital|Digital Circuits and Systems Group (Prof. Benini)]]===<br />
* [[Computer Architecture]]<br />
* [[Acceleration and Transprecision]]<br />
* [[Heterogeneous Acceleration Systems]]<br />
* [[Event-Driven Computing]]<br />
* [[Predictable Execution]]<br />
* [[Low Power Embedded Systems and Wireless Sensors Networks]]<br />
* [[Embedded Artificial Intelligence:Systems And Applications]]<br />
* [[Students' International Competitions: F1(AMZ), Swissloop, Educational Rockets]]<br />
* [[Transient Computing]]<br />
* [[RF SoCs for the Internet of Things]]<br />
* [[Energy Efficient Autonomous UAVs]]<br />
* [[Biomedical System on Chips]]<br />
* [[Digital Medical Ultrasound Imaging]]<br />
* [[Cryptography|Cryptographic Hardware]]<br />
* [[Deep Learning Projects|Deep Learning Acceleration]]<br />
* [[Human Intranet]]<br />
<br />
===[[:Category:Nano-TCAD|Nano-TCAD Group (Prof. Luisier)]]===<br />
<DynamicPageList><br />
category = Nano-TCAD<br />
category = Available<br />
category = Hot<br />
</DynamicPageList><br />
<br />
===[[:Category:Collaboration|Collaborations with other groups/departments]]===<br />
<DynamicPageList><br />
category = Collaboration<br />
category = Available<br />
</DynamicPageList><br />
<br />
==Selected Projects in Progress==<br />
''For a complete list, see [[:Category:In progress|Projects in Progress]].''<br />
<DynamicPageList><br />
count = 5<br />
category = In progress<br />
</DynamicPageList><br />
<br />
==Selected Completed Projects==<br />
''For a complete list, see [[:Category:Completed|Completed Projects]].''<br />
<DynamicPageList><br />
count = 5<br />
category = Completed<br />
</DynamicPageList><br />
<br />
==Selected Research Projects==<br />
''For a complete list, see [[:Category:Research|Research Projects]].''<br />
<DynamicPageList><br />
count = 5<br />
category = Completed<br />
</DynamicPageList><br />
<br />
==Links to Other IIS Webpages==<br />
; [http://www.iis.ee.ethz.ch http://www.iis.ee.ethz.ch] <br />
: Integrated Systems Laboratory Main homepage<br />
; [http://www.nano-tcad.ethz.ch http://www.nano-tcad.ethz.ch] <br />
:Nano-TCAD group homepage<br />
; [http://www.dz.ee.ethz.ch http://www.dz.ee.ethz.ch]<br />
: Microelectronics Design Center<br />
; [http://asic.ethz.ch/cg http://asic.ethz.ch/cg]<br />
: The IIS-ASIC Chip Gallery<br />
; [http://eda.ee.ethz.ch http://eda.ee.ethz.ch]<br />
: EDA Wiki (''ETH Zurich internal access only!'')</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4583Deep Learning Projects2019-02-14T16:08:33Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this very rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of ±2^n. As multiplications with powers of two can be done by just shifting bits, it is perfect for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| taken || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited in energy efficiency and throughput by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| taken || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
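The compression project above exploits the fact that CNN feature maps are highly sparse after ReLU. As an illustration only -- the actual scheme in the linked paper is more elaborate than this -- a plain zero run-length encoder already captures the idea:

```python
def rle_encode(values):
    """Encode a stream of integers as (value, run_length) pairs.

    Post-ReLU CNN feature maps are dominated by zeros, so even this
    naive run-length scheme shrinks them considerably.
    """
    if not values:
        return []
    encoded = []
    current, run = values[0], 1
    for v in values[1:]:
        if v == current:
            run += 1
        else:
            encoded.append((current, run))
            current, run = v, 1
    encoded.append((current, run))
    return encoded

def rle_decode(pairs):
    # Expand (value, run_length) pairs back into the original stream.
    out = []
    for value, run in pairs:
        out.extend([value] * run)
    return out

feature_map = [0, 0, 0, 7, 0, 0, 3, 3, 0, 0, 0, 0]
packed = rle_encode(feature_map)
assert rle_decode(packed) == feature_map
```

In hardware, the encoder would sit on the write path to external DRAM and the decoder on the read path, reducing the bandwidth that limits the accelerator.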
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks periodically. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. All the data required for the gradient descent step of typical DNNs cannot possibly be stored in on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), which allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
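The invertible-ResNet idea behind the DNN Training Accelerator project can be sketched in a few lines: a reversible block lets the backward pass reconstruct its inputs from its outputs, so activations need not be stored. A minimal sketch with scalar stand-ins for the learned residual functions (the real blocks operate on tensors):

```python
def rev_forward(x1, x2, F, G):
    # Forward pass of one reversible block (additive coupling).
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2, F, G):
    # Recover the block's inputs from its outputs alone -- this is
    # the recomputation that trades storage for extra compute.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

F = lambda v: 3 * v   # stand-ins for the learned residual functions
G = lambda v: v * v
y1, y2 = rev_forward(2.0, 5.0, F, G)
assert rev_inverse(y1, y2, F, G) == (2.0, 5.0)
```

Because every block is invertible, an accelerator only has to keep the final outputs and can regenerate intermediate activations on the fly during the gradient step.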
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
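Once the embedding network of the one-shot learning project is trained, the Prototypical Networks rule itself is simple: average the support embeddings of each class into a prototype, then classify a query by its nearest prototype. A toy sketch with made-up 2-D embeddings (real face embeddings would come from the trained DNN):

```python
def prototype(embeddings):
    # Class prototype = mean of the support embeddings of that class.
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]

def classify(query, prototypes):
    # Assign the query to the class whose prototype is nearest
    # (squared Euclidean distance, as in the paper).
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))

support = {
    "alice": [[0.9, 0.1], [1.1, -0.1]],   # hypothetical face embeddings
    "bob":   [[-1.0, 0.8], [-0.8, 1.2]],
}
protos = {label: prototype(embs) for label, embs in support.items()}
assert classify([1.0, 0.0], protos) == "alice"
```

Enrolling a new face only requires computing one more prototype -- no retraining -- which is what makes the method attractive for an embedded door-opening system.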
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4480Deep Learning Projects2018-12-12T12:05:50Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may quickly become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of ±2^n. As multiplications with powers of two can be done by simply shifting bits, it is perfect for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute and resource intensive and are usually run on power-intensive GPU clusters, but we would like to exploit them also on ubiquitous IoT devices. To achieve that, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder on an ASIC and/or FPGA, such that we can use them and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio above that of embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as an MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
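The shift-instead-of-multiply trick that motivates the INQ Accelerator project can be illustrated directly. Assuming integer activations, with the sign and exponent together encoding an INQ weight ±2^n:

```python
def shift_mul(activation, weight_log2, weight_sign):
    # An INQ weight of the form ±2^n multiplies an activation with a
    # pure bit shift: no hardware multiplier is needed. Negative
    # exponents (fractional weights in fixed point) shift right.
    if weight_log2 >= 0:
        shifted = activation << weight_log2
    else:
        shifted = activation >> -weight_log2
    return -shifted if weight_sign < 0 else shifted

assert shift_mul(5, 3, +1) == 5 * 8      # weight = +2^3
assert shift_mul(5, 3, -1) == -(5 * 8)   # weight = -2^3
```

On an ASIC this replaces each MAC's multiplier with a barrel shifter (or hard-wired shift), which is where the area and energy savings come from.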
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks periodically. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. All the data required for the gradient descent step of typical DNNs cannot possibly be stored in on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), which allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
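The core of the CBinfer approach listed above is a change-detection step: compare the current input against the previous one and recompute only where the difference exceeds a threshold, reusing the cached result everywhere else. A minimal sketch on flat arrays (the published method operates on convolutional feature maps on the GPU):

```python
def changed_indices(prev_frame, cur_frame, threshold):
    # Change-based inference: return the positions whose value moved
    # by more than the threshold since the previous frame; only these
    # need to be propagated through the network again.
    return [i for i, (p, c) in enumerate(zip(prev_frame, cur_frame))
            if abs(c - p) > threshold]

prev = [0.10, 0.50, 0.52, 0.90]
cur  = [0.11, 0.50, 0.70, 0.20]
assert changed_indices(prev, cur, threshold=0.05) == [2, 3]
```

For audio, the same test applied to neighboring MFCC or short-term-Fourier windows would mark the few spectral bins that actually changed between time steps.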
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4479Deep Learning Projects2018-12-12T12:04:52Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may quickly become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of ±2^n. As multiplications with powers of two can be done by simply shifting bits, it is perfect for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute and resource intensive and are usually run on power-intensive GPU clusters, but we would like to exploit them also on ubiquitous IoT devices. To achieve that, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for an ASIC and/or FPGA, such that we can use them and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio above that of embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as an MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
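For the Data Bottlenecks in DNNs project above, the essential ingredient is a low-bit quantizer placed inside the network: whatever crosses the sensor-to-server link must pass through it. The project uses a not-yet-published method, so purely as a generic stand-in, a uniform quantizer over a clipped range looks like this:

```python
def quantize(x, n_bits, x_max):
    # Uniform quantizer for an activation clipped to [-x_max, x_max]:
    # returns the integer code that would actually cross the link.
    levels = 2 ** n_bits - 1
    clipped = max(-x_max, min(x_max, x))
    return round((clipped + x_max) / (2 * x_max) * levels)

def dequantize(code, n_bits, x_max):
    # Server-side reconstruction of the transmitted activation.
    levels = 2 ** n_bits - 1
    return code / levels * 2 * x_max - x_max

code = quantize(0.37, n_bits=4, x_max=1.0)
assert abs(dequantize(code, 4, 1.0) - 0.37) <= 1.0 / (2 ** 4 - 1)
```

Training the DNN with this bottleneck in the forward pass lets the layers before it learn representations that survive the coarse quantization, which is the crux of the project.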
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with many sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks periodically. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance are limited by DRAM accesses. All the data required for the gradient descent step of typical DNNs cannot possibly be stored in on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), which allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to look at expanding this further to share information between multiple nodes/cameras and learn to re-identify faces also as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4478Deep Learning Projects2018-12-12T11:35:27Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may quickly become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of ±2^n. As multiplications with powers of two can be done by simply shifting bits, it is perfect for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute and resource intensive and are usually run on power-intensive GPU clusters, but we would like to exploit them also on ubiquitous IoT devices. To achieve that, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for an ASIC and/or FPGA, such that we can use them and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio above that of embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, where you will be using a not-yet-published quantization method. If taken as an MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled samples. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely carried out on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often performance is limited by DRAM accesses. All the data required for the gradient descent step of typical DNNs cannot be stored in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), allowing these storage requirements to be traded for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider, for example, face identification to unlock your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4462Deep Learning Projects2018-12-07T11:40:00Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique that has been proven to work very well for neural networks. The weights are quantized to levels of +-2^n. As multiplications by powers of two can be implemented as simple bit shifts, the scheme is well suited for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To reach that goal, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary and ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio beyond that of embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, so data compression becomes crucial. The strongest compression is usually achieved by performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
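The shift trick behind the INQ Accelerator entry above can be made concrete with a small sketch (hypothetical code, not the actual INQ implementation; the function names and exponent range are our own choices): a weight is rounded to the nearest signed power of two in the log domain, after which multiplying an integer activation by it reduces to a single bit shift.

```python
import math

def quantize_pow2(w, n_min=-6, n_max=0):
    """Round a weight to the nearest signed power of two (in the log domain)."""
    if w == 0.0:
        return 0.0, None
    sign = 1.0 if w > 0 else -1.0
    n = int(round(math.log2(abs(w))))
    n = max(n_min, min(n_max, n))  # clamp the exponent to the codebook range
    return sign * 2.0 ** n, n

def mul_by_pow2(x_int, sign, n):
    """Multiply an integer activation by sign * 2^n using only a shift."""
    shifted = x_int << n if n >= 0 else x_int >> -n
    return sign * shifted
```

For example, `quantize_pow2(0.3)` yields `0.25` with exponent `-2`, so a multiplication by that weight becomes a right shift by two bits.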
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled samples. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely carried out on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often performance is limited by DRAM accesses. All the data required for the gradient descent step of typical DNNs cannot be stored in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]), allowing these storage requirements to be traded for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
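The storage/compute trade-off behind the DNN Training Accelerator entry can be illustrated with a toy reversible block in the style of the Invertible ResNets paper (a hypothetical sketch with stand-in functions, not the project's architecture): the block's inputs need not be kept for the backward pass, because they can be recomputed exactly from its outputs.

```python
def f(x):
    # Stand-in for the first residual sub-network of the block
    return 0.5 * x + 1.0

def g(x):
    # Stand-in for the second residual sub-network
    return 0.25 * x

def forward(x1, x2):
    """Reversible block: (x1, x2) -> (y1, y2)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    """Recover the inputs from the outputs, so they need not be stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2
```

Round-trip check: `inverse(*forward(x1, x2))` returns `(x1, x2)` exactly, which is what lets an accelerator trade activation storage for recomputation.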
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider, for example, face identification to unlock your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
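The change-based idea from the CBinfer entry above can be sketched in a few lines (a hypothetical, heavily simplified pointwise variant, not the published GPU implementation): compare the current input to the previous one and recompute only where the change exceeds a threshold, reusing the cached outputs elsewhere.

```python
import numpy as np

def incremental_apply(prev_in, prev_out, cur_in, op, tau=0.1):
    """Recompute a pointwise op only where the input changed by more than tau."""
    mask = np.abs(cur_in - prev_in) > tau   # change-detection step
    out = prev_out.copy()
    out[mask] = op(cur_in[mask])            # update only the changed locations
    return out
```

The real method applies this per spatial location of convolutional layers and propagates changed regions through the network; the sketch only captures the thresholded update.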
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailled description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]] Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of +-2^n. As multiplcations with power's of two can be done by just shifting the bits, it is perfect for HW acceleration. In this thesis you will design an ASIC performing INQ quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural Networks are compute and resource intensive and are usually run on power-intensive GPU clusters, but we would like to exploit them also on the everywhere IoT devices. To reach that, we need to develop new hardware architecture optimized for this application. This also include to check new algorithmic approach, which can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC/FPGA || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner we are evaluating how combining binary or ternary-weight CNN can be employed on FPGA to push the throughput/cost ratio higher than embedded GPUs. In this project, you will implement a hardware accelerator and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA/Zynq || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA ||Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption is often dominated by communication, such that data compression becomes crucial. The strongest compression is usually archieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the network, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but to explore various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very few labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy effiency and often performance is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to store it in on-chip SRAM--even across multiple, very large chips. Recently, Invertible ResNets has been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allows to trade these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
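The storage/compute trade-off behind the DNN Training Accelerator project above can be illustrated with a toy reversible block in the spirit of invertible residual networks. This is our own illustrative sketch, not code from the cited paper; `F` and `G` are simple stand-ins for trained residual sub-networks.

```python
# Toy sketch of why reversible blocks cut activation storage: the block
# input can be recomputed exactly from its output during the backward
# pass, so intermediate activations need not be kept in memory.
# F and G stand in for arbitrary residual sub-networks.

def F(x):
    return 0.5 * x + 1.0

def G(x):
    return 0.25 * x - 2.0

def rev_forward(x1, x2):
    """Forward pass of a reversible block: (x1, x2) -> (y1, y2)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    """Exact reconstruction of the inputs -- no stored activations needed."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

Because inversion only re-evaluates `F` and `G`, memory for activations is traded for roughly one extra forward computation per block.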
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and to learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
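The classification rule used by Prototypical Networks, as in the One-shot/Few-shot Learning project above, is simple enough to sketch. This is a non-authoritative illustration of the idea only: a trained embedding network is assumed to have already produced the vectors, and all names below are ours.

```python
# Sketch of the Prototypical Networks rule: each class prototype is the
# mean of its support embeddings; a query embedding is assigned to the
# nearest prototype (squared Euclidean distance).

def prototype(embeddings):
    """Component-wise mean of a list of equal-length vectors."""
    k = len(embeddings)
    return [sum(v[d] for v in embeddings) / k for d in range(len(embeddings[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, prototypes):
    """Label of the nearest class prototype."""
    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))
```

With a single support image per identity, the prototype degenerates to that one embedding, which is exactly the one-shot setting described above.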
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>
Lukasc
http://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4459
Deep Learning Projects
2018-12-07T11:36:13Z
<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this very rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ..., just let us know, and together we can determine a useful way to go -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of +-2^n. As multiplications by powers of two can be done by just shifting bits, the scheme is perfect for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme ([https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be employed on FPGAs to push the throughput/cost ratio beyond that of embedded GPUs. In this project, you will implement a hardware accelerator and integrate it into a fairly complete FPGA/Zynq-based system with a camera etc. for real-time pose detection. || FPGA || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the network, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
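The shift-based arithmetic that makes the INQ Accelerator project above attractive for hardware can be sketched in a few lines. This is a hedged illustration of the principle only: the exponent range and function names below are our own choices, not taken from the INQ paper.

```python
import math

# Sketch of the INQ idea: weights are rounded to signed powers of two,
# so a multiplication becomes a bit shift. The exponent clamp range
# [n_min, n_max] is an illustrative assumption.

def inq_quantize(w, n_min=-6, n_max=0):
    """Return (quantized value, sign, exponent); zero stays zero."""
    if w == 0:
        return 0, 0, 0
    sign = 1 if w > 0 else -1
    n = int(round(math.log2(abs(w))))      # nearest power-of-two exponent
    n = max(n_min, min(n_max, n))          # clamp to the allowed range
    return sign * 2.0 ** n, sign, n

def shift_mac(acc, x, sign, n):
    """Multiply-accumulate with integer activation x using shifts only."""
    if sign == 0:
        return acc
    shifted = x << n if n >= 0 else x >> -n  # scaling by 2^n as a bit shift
    return acc + sign * shifted
```

In hardware, the `shift_mac` step replaces a full multiplier with a barrel shifter and an adder, which is the source of the area and energy savings.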
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a multitude of sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks periodically. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it into on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]); they allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and to learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
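The change-based inference idea behind the CBinfer project above applies equally to video frames and neighboring spectrogram windows. The minimal sketch below is our own illustration of the principle, not code from the CBinfer release; all names are assumptions.

```python
# Sketch of change-based inference: recompute an expensive per-element
# function only where the input changed by more than a threshold tau
# since the previous frame; reuse cached outputs everywhere else.

def cb_update(prev_in, prev_out, curr_in, f, tau=0.1):
    """Cached per-element evaluation of f over the current input."""
    out = []
    for p, o, c in zip(prev_in, prev_out, curr_in):
        out.append(f(c) if abs(c - p) > tau else o)  # recompute only on change
    return out
```

With slowly varying inputs, most elements fall below `tau`, so the expensive function `f` (a convolution layer in the CNN case) runs on only a small fraction of positions per frame.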
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>
Lukasc
http://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4458
Deep Learning Projects
2018-12-07T11:34:59Z
<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this very rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ..., just let us know, and together we can determine a useful way to go -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of +-2^n. As multiplications by powers of two can be done by just shifting bits, the scheme is perfect for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme (cf. [https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be employed on FPGAs to push the throughput/cost ratio beyond that of embedded GPUs. In this project, you will implement a hardware accelerator and integrate it into a fairly complete FPGA/Zynq-based system with a camera etc. for real-time pose detection. || FPGA || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved when performing the whole analysis on the sensor node and transmitting the result (e.g. a label instead of an image), but the sensor node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the network, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a multitude of sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks periodically. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it into on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]); they allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and to learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>
Lukasc
http://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4457
Deep Learning Projects
2018-12-07T11:25:42Z
<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches may become obsolete in this very rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ..., just let us know, and together we can determine a useful way to go -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of +-2^n. As multiplications by powers of two can be done by just shifting bits, the scheme is perfect for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To achieve this, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited (energy efficiency, throughput) by the bandwidth to external DRAM. We have recently proposed a novel compression scheme (cf. [https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be employed on FPGAs to push the throughput/cost ratio beyond that of embedded GPUs. In this project, you will implement a hardware accelerator and integrate it into a fairly complete FPGA/Zynq-based system with a camera etc. for real-time pose detection. || FPGA || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, such that data compression becomes crucial. The strongest compression is usually achieved by completing the data analysis on the node and transmitting the result (e.g. a label instead of an image), but the node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the network, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous driving is a hot topic nowadays, but self-learning approaches (i.e. reinforcement learning) have also had big successes (e.g. AlphaGo from Google beat the world champion in Go). We want a drone to learn from its environment such that it is able to solve a task independently. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often differs significantly from what you get using real-world data. One main aspect is that lighting conditions are often normalized. Your target is to train a CNN to normalize weather conditions, going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approached with unsupervised or semi-supervised methods. || Workstation || SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a multitude of sensors, it is not feasible to always stream all the data back to a server. Therefore, there is a need to learn on the node itself and to synchronize/merge the networks periodically. || Embedded GPU || SW (algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous and largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it into on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]); they allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a means of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and to learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
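The core of Prototypical Networks, used in the one-shot learning project above, is simple enough to sketch: each class is represented by the mean of its embedded support examples, and a query is assigned to the nearest prototype. The 2-D "embeddings" below are made up for illustration; in practice they come from a trained CNN:

```python
import numpy as np

def prototypes(support, labels):
    """Per-class prototype = mean of that class's embedded support points."""
    classes = sorted(set(labels))
    protos = np.stack([
        np.mean([s for s, l in zip(support, labels) if l == c], axis=0)
        for c in classes])
    return classes, protos

def classify(query, classes, protos):
    """Assign the query to the nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(protos - query, axis=1)
    return classes[int(np.argmin(d))]

# Hypothetical embedded face images for two enrolled users.
emb = np.array([[0., 0.], [0., 2.], [4., 0.], [4., 2.]])
labels = ["alice", "alice", "bob", "bob"]
classes, protos = prototypes(emb, labels)
print(classify(np.array([0.5, 1.0]), classes, protos))
```

Enrolling a new face only requires computing one more prototype -- no retraining -- which is what makes the method attractive for the door-unlocking scenario.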
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>Lukaschttp://iis-projects.ee.ethz.ch/index.php?title=Deep_Learning_Projects&diff=4456Deep Learning Projects2018-12-07T11:25:26Z<p>Lukasc: /* Available Projects */</p>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and some approaches become obsolete in this rapidly advancing research area. Please just contact the people of the project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can determine together what is a useful way to go -- after all, you are here not only to learn about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of ±2^n. As multiplications with powers of two can be done by just shifting bits, the scheme is perfect for HW acceleration. In this thesis you will design an ASIC that runs INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To get there, we need to develop new hardware architectures optimized for this application. This also includes evaluating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited in energy efficiency and throughput by the bandwidth to external DRAM. We have recently proposed a novel compression scheme (cf. [https://arxiv.org/pdf/1810.03979.pdf paper]) that would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for ASIC and/or FPGA, so that we can use it and verify that our claim of hardware suitability truly holds. || ASIC || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA or 1x MA || Ternary-Weight FPGA System || Together with an external partner, we are evaluating how binary- or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio beyond that of embedded GPUs. In this project, you will implement a hardware accelerator and integrate it into a fairly complete FPGA/Zynq-based system with camera etc. for real-time pose detection. || FPGA || HW & SW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || 1-2x SA || Data Bottlenecks in DNNs || In many systems, we have a combination of remote sensing nodes and centralized analysis. Such systems' operating cost and energy consumption are often dominated by communication, so data compression becomes crucial. The strongest compression is usually achieved by completing the data analysis on the node and transmitting only the result (e.g. a label instead of an image), but the node might not have enough processing power available, or the data of multiple sensor nodes has to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the network, where you will be using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. || Workstation || SW (algorithm evals) || [[:User:lukasc|Lukas Cavigelli]], Matteo Spallanzani<br />
|}<br />
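As a sketch of the quantization step behind the INQ Accelerator project (only the weight mapping, not the incremental retraining procedure of INQ itself), rounding weights to signed powers of two can look as follows; the exponent range and the zero threshold below are hypothetical choices, not values from the INQ paper:

```python
import numpy as np

def inq_quantize(w, n_min=-6, n_max=0):
    """Round each weight to the nearest value in {±2^n : n_min <= n <= n_max},
    snapping very small weights to zero. A multiply by ±2^n then reduces to
    a bit shift (plus sign handling) in hardware."""
    sign = np.sign(w)
    mag = np.abs(w)
    # nearest exponent, chosen in the log2 domain
    n = np.clip(np.round(np.log2(np.maximum(mag, 2.0 ** (n_min - 1)))),
                n_min, n_max)
    q = sign * 2.0 ** n
    q[mag < 2.0 ** n_min / np.sqrt(2)] = 0.0  # below smallest level -> zero
    return q

print(inq_quantize(np.array([0.8, -0.3, 0.02, -1.1])))
```

Since every surviving weight is ±2^n, a hardware MAC unit only needs a barrel shifter and an adder instead of a full multiplier -- the property the accelerator would exploit.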
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation|| SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous; training is largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency and often their performance is limited by DRAM accesses. The data required for the gradient descent step of typical DNNs is far too large to store in on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]) and allow trading these storage requirements for some additional compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuous listening to voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider, for example, face identification as a way of unlocking your apartment's door, where the user provides a single picture (not hundreds) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
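The change-based idea behind the CBinfer project listed above can be illustrated with a minimal sketch: compare the current frame (or spectrogram column) against the previous one, and recompute only where the change exceeds a threshold, reusing cached results elsewhere. The threshold value and the per-element operation are toy stand-ins, not the paper's actual CNN kernels:

```python
import numpy as np

def changed_mask(prev, curr, tau):
    """Mark positions whose value moved by more than tau since the
    previous frame -- only these need to be recomputed."""
    return np.abs(curr - prev) > tau

# Toy 1-D "spectrogram column" with a hypothetical threshold tau.
prev = np.array([0.10, 0.50, 0.90, 0.20])
curr = np.array([0.11, 0.80, 0.91, 0.20])
mask = changed_mask(prev, curr, tau=0.05)

cached = prev ** 2                      # result of the previous frame
out = np.where(mask, curr ** 2, cached) # recompute only changed entries
print(mask)
```

The savings grow with the fraction of unchanged entries -- high for static video backgrounds, and plausibly also for neighboring STFT/MFCC windows in continuous listening.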
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>
<hr />
<div>We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas and maybe some other approaches become obsolete in the very rapidly advancing research area. Please just contact the people of a project most similar to what you would like to do, and '''come talk to us'''. <br />
<br />
==Prerequisites==<br />
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can together determine what is a useful way to go -- after all you are here to learn not only about project work, but also to develop your technical skills. <br />
<br />
Only hard requirements: <br />
* Excitement for deep learning <br />
* For VLSI projects: VLSI 1 or equivalent<br />
<br />
<!--- <span style="color:red">We are currently out of working spaces at IIS until around Easter 2018. Please contact us 1-2 months before the desired project start!</span> ---><br />
<br />
==Available Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| available || SA/MA || Stand-Alone Edge Computing with GAP8 || Detailed description: [[Stand-Alone_Edge_Computing_with_GAP8]] || Embedded || SW/HW (PCB-level) || [[:User:andrire|Renzo Andri]], Andres Gomez<br />
|-<br />
| available || MA || INQ Accelerator || INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of ±2^n. As multiplications by powers of two can be done by just shifting the bits, this is a perfect fit for HW acceleration. In this thesis you will design an ASIC executing INQ-quantized networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || MA/SA || On-chip Learning || Neural networks are compute- and resource-intensive and are usually run on power-hungry GPU clusters, but we would like to exploit them on ubiquitous IoT devices as well. To get there, we need to develop new hardware architectures optimized for this application. This also includes investigating new algorithmic approaches that can reduce the compute or memory footprint of these networks. || ASIC || HW (ASIC) || [[:User:andrire|Renzo Andri]]<br />
|-<br />
| available || 1-2x SA || HW Data Compressor for CNNs || The most commonly used hardware accelerators for CNNs are largely limited in energy efficiency and throughput by the bandwidth to external DRAM. We have recently proposed a novel compression scheme (cf. [https://arxiv.org/pdf/1810.03979.pdf paper]) which would be a very good fit for a hardware implementation. In this project, you will implement the encoder and decoder for ASIC and/or FPGA, such that we can use it and verify that our claim of hardware suitability truly holds. || ASIC || HW (ASIC) / HW (FPGA) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
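The shift-based multiplication that makes INQ attractive for hardware can be sketched in a few lines. This is only an illustrative model (the function name and signature are our own, not from the INQ paper; INQ additionally allows exact-zero weights, which simply skip the operation):

```python
def inq_mul(x: int, sign: int, exp: int) -> int:
    """Multiply an integer activation x by an INQ weight sign * 2**exp
    using only a bit shift and a negation -- no hardware multiplier needed."""
    shifted = x << exp if exp >= 0 else x >> -exp  # arithmetic shift
    return shifted if sign >= 0 else -shifted

# 13 * 8 = 104, computed as a left shift by 3
assert inq_mul(13, +1, 3) == 104
# 13 * (-2) = -26, a shift by 1 plus a sign flip
assert inq_mul(13, -1, 1) == -26
```

In an ASIC datapath this reduces each MAC to a barrel shifter and an add/subtract, which is the main source of the area and energy savings the project description refers to.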
<br />
<!--NOTES LUKAS: MRA-based CNNs, finding a lighting/season independent image representation, sensor-fusion, Action Understanding in Video Data --><br />
<br />
Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)<br />
<br />
<br />
<!--<br />
|-<br />
| available || MA/SA || Self-Learning Drone || Autonomous Driving is a hot topic nowadays, but also self-learning approaches (i.e. re-inforcement learning) have had a big success (e.g. AlphaGo from Google beat the world champion in Go. We want a drone to learn from its environment such that the drone is able to solve a task independantly. || ML frameworks (e.g. Torch)/GPU, Drone Simulation (ROS/Gazebo) || SW (Training) || [[:User:andrire|Renzo Andri]], [[:User:dpalossi|Daniele Palossi]]<br />
|-<br />
| available || SA/MA || Weather Invariant Representations || When running computer vision applications, their performance under lab conditions often significantly differs from what you using real-world data. One main aspect is that often lighting conditions are normalized. Your target is to train a CNN to normalize weather conditions and going through the entire flow of collecting a dataset, training a CNN, and evaluating it. This type of problem can likely be approach with unsupervised or semi-supervised methods. || Workstation|| SW (algo evals, data acq.) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| available || MA || Distributed/Federated Learning || With the increasing number of IoT devices equipped with a bunch of sensor, it is not feasible to always stream all the data back to a server. Therefore, there is the need to learn on the node itself and synchronize/merge the network in a periodic scheme. || Embedded GPU || SW(algo, evals) || [[:User:andrire|Renzo Andri]], [[:User:lukasc|Lukas Cavigelli]]<br />
<br />
--><br />
<br />
==On-Going Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| taken || SA || SAR Data Analysis || We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. [https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7827114 paper]) or semi-/unsupervised learning to segment these images using very little labeled data. || Workstation || SW (algo evals) || [[:User:xiaywang|Xiaying Wang]], [[:User:lukasc|Lukas Cavigelli]], [[:User:magnom|Michele Magno]]<br />
|-<br />
| taken || MA/2x SA || DNN Training Accelerator || The compute effort to train state-of-the-art CNNs is tremendous; training is largely done on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it in on-chip SRAM--even across multiple, very large chips. Recently, invertible ResNets have been presented (cf. [https://arxiv.org/pdf/1707.04585.pdf paper]); they allow trading these storage requirements for some more compute effort--a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. || ASIC || HW (ASIC) || [[:User:lukasc|Lukas Cavigelli]]<br />
|}<br />
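The memory saving behind invertible residual networks comes from recomputing activations from layer outputs during the backward pass instead of storing them. A minimal numerical sketch of the two-way coupling, with F and G standing in for arbitrary residual functions (all names here are illustrative, not from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)
Wf = rng.standard_normal((4, 4))
Wg = rng.standard_normal((4, 4))
F = lambda x: np.tanh(x @ Wf)   # placeholder residual branches
G = lambda x: np.tanh(x @ Wg)

def forward(x1, x2):
    """Invertible coupling: each half is updated from the other."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    """Recover the inputs exactly -- no activations need to be stored."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.standard_normal((2, 4))
r1, r2 = inverse(*forward(x1, x2))
assert np.allclose(r1, x1) and np.allclose(r2, x2)
```

Because the inverse only re-evaluates F and G, an accelerator can trade the DRAM traffic for storing activations against extra on-chip compute, which is exactly the trade-off the project proposes to explore.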
<br />
==Completed Projects==<br />
{| class="wikitable"<br />
|-<br />
! Status !! Type !! Project Name !! Description !! Platform !! Workload Type !! First Contact(s)<br />
|-<br />
| completed FS18 || SA || CBinfer for Speech Recognition || We have recently published an approach to dramatically reduce the computation effort when performing object detection on video streams with limited frame-to-frame changes (cf. [https://arxiv.org/pdf/1704.04313.pdf paper]). We think this approach could also be applied to audio signals for continuously listening for voice commands: when looking at MFCCs or the short-term Fourier transform, changes in the spectrum between neighboring time windows are also limited. || Embedded GPU (Tegra X2) || SW (GPU, algo evals) || [[:User:lukasc|Lukas Cavigelli]]<br />
|-<br />
| completed HS18 || MA || One-shot/Few-shot Learning || One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification as a way of opening your apartment's door, where the user provides a single picture (not 100s) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [https://arxiv.org/abs/1703.05175 paper], [https://github.com/jakesnell/prototypical-networks code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run it in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. || Embedded GPU or Microcontroller || SW (algo, uC) || [[:User:lukasc|Lukas Cavigelli]], [[:User:andrire|Renzo Andri]]<br />
|-<br />
|}<br />
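The inference side of Prototypical Networks is simple enough to sketch: each class prototype is the mean of that class's support embeddings, and a query is assigned to the nearest prototype. A toy illustration with 2-D embeddings (in real use, the embeddings would come from a trained CNN; all names here are ours):

```python
import numpy as np

def make_prototypes(support_emb, support_labels):
    """One prototype per class: the mean of that class's support embeddings."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0)
                       for c in classes])
    return classes, protos

def classify(query_emb, classes, protos):
    """Assign the query to the class with the nearest (Euclidean) prototype."""
    dists = np.linalg.norm(protos - query_emb, axis=1)
    return classes[np.argmin(dists)]

# toy embeddings: class 0 clusters near (0, 0), class 1 near (5, 5)
emb = np.array([[0.1, 0.0], [-0.1, 0.2], [5.0, 4.9], [5.1, 5.2]])
labels = np.array([0, 0, 1, 1])
classes, protos = make_prototypes(emb, labels)
assert classify(np.array([4.8, 5.0]), classes, protos) == 1
```

This is what makes the one-shot setting work: adding a new face only requires computing one more prototype from a single embedded picture, with no retraining of the network.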
==Where to find us==<br />
[[:User:andrire|Renzo Andri]], ETZ J 76.2, andrire@iis.ee.ethz.ch<br /><br />
[[:User:lukasc|Lukas Cavigelli]], ETZ J 76.2, cavigelli@iis.ee.ethz.ch</div>