http://iis-projects.ee.ethz.ch/api.php?action=feedcontributions&user=Smach&feedformat=atomiis-projects - User contributions [en]2024-03-28T22:46:56ZUser contributionsMediaWiki 1.28.0http://iis-projects.ee.ethz.ch/index.php?title=Hardware_Acceleration&diff=6085Hardware Acceleration2020-11-16T15:29:06Z<p>Smach: /* Computational Units */</p>
<hr />
<div>[[File:NVIDIA Tesla V100.jpg|thumb|right|An NVIDIA Tesla V100 GP-GPU. This cutting-edge accelerator provides huge computational power on a [https://arstechnica.com/gadgets/2017/05/nvidia-tesla-v100-gpu-details/ massive 800 mm² die].]]<br />
[[File:Google Cloud TPU.png|thumb|right|Google's Cloud TPU (Tensor Processing Unit). This machine learning accelerator can do one thing extremely well: multiply-accumulate operations.]]<br />
<br />
Accelerators are the backbone of big data and scientific computing. While general-purpose processor architectures such as Intel's x86 provide good performance across a wide variety of applications, it is only since the advent of general-purpose GPUs that many computationally demanding tasks have become feasible. Since these GPUs support a much narrower set of operations, it is easier to optimize the architecture to make them more efficient. Such accelerators are not limited to the high-performance sector alone. In low power computing, they allow complex tasks such as computer vision or cryptography to be performed under a very tight power budget. Without a dedicated accelerator, these tasks would not be feasible.<br />
<br />
==General-Purpose Computing==<br />
TBA<br />
<br />
===[[:User:Nwistoff | Nils Wistoff]]===<br />
* '''e-mail''': [mailto:nwistoff@iis.ee.ethz.ch nwistoff@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 06 75<br />
* '''office''': ETZ J85<br />
<br />
===[[:User:Paulsc | Paul Scheffler]]===<br />
* '''e-mail''': [mailto:paulsc@iis.ee.ethz.ch paulsc@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 09 15<br />
* '''office''': ETZ J85<br />
<br />
===Manuel Eggimann===<br />
* [mailto:meggiman@iis.ee.ethz.ch meggiman@iis.ee.ethz.ch]<br />
* ETZ J68<br />
<br />
===Fabian Schuiki===<br />
* [mailto:fschuiki@iis.ee.ethz.ch fschuiki@iis.ee.ethz.ch]<br />
* ETZ J89<br />
<br />
==Computational Units==<br />
The last decade has seen explosive growth in the quest for energy-efficient architectures and systems. An era of exponentially improving computing efficiency - driven mostly by CMOS technology scaling - is coming to an end as Moore’s law falters. The obstacle of the so-called thermal or power wall is fueling a push towards computing paradigms that hold energy efficiency as the ultimate figure of merit for any hardware design.<br />
<br />
The broad term "computational units" covers a wide range of hardware accelerators for a multitude of different systems, such as floating-point units (FPUs) for processors, or dedicated accelerators for cryptography, signal processing, etc. Such computational units are housed within full systems which usually command stringent requirements in terms of performance, size, and efficiency.<br />
<br />
Key topics of interest are energy-efficient accelerators at various extremes of the design space, covering high-performance, ultra-low-power, or minimum-area implementations, as well as the exploration of novel paradigms in computing, arithmetic, and processor architectures.<br />
<br />
<br />
====Luca Bertaccini====<br />
* [mailto:lbertaccini@iis.ee.ethz.ch lbertaccini@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====Matteo Perotti====<br />
* [mailto:mperotti@iis.ee.ethz.ch mperotti@iis.ee.ethz.ch]<br />
* ETZ J78<br />
<br />
====Stefan Mach====<br />
* [mailto:smach@iis.ee.ethz.ch smach@iis.ee.ethz.ch]<br />
* ETZ J89<br />
<br />
==Hardware Acceleration of DNNs and QNNs==<br />
Deep Learning (DL) and Artificial Intelligence (AI) are quickly becoming dominant paradigms for all kinds of analytics, complementing or replacing traditional data science methods. Successful at-scale deployment of these algorithms requires deploying them directly at the data source, i.e. in the IoT end-nodes collecting data. However, due to the extreme constraints of these devices (in terms of power, memory footprint, area cost), performing full DL inference in-situ in low-power end-nodes requires a breakthrough in computational performance and efficiency.<br />
It is widely known that the numerical representation typically used when developing DL algorithms (single-precision floating-point) encodes a higher precision than what is actually required to achieve high quality-of-results in inference (Courbariaux et al. 2016); this fact can be exploited in the design of energy-efficient hardware for DL.<br />
For example, by using ternary weights, i.e. quantizing all network weights to {-1,0,1}, we can design the fundamental compute units in hardware without an expensive multiplication unit. Additionally, ternary weights can be stored much more compactly on-chip.<br />
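The multiplier-free property of ternary weights can be illustrated with a small software model. The following is only a sketch under stated assumptions: the function names and the 2-bit encoding are hypothetical choices made for illustration and do not describe any particular accelerator.

```python
from typing import Sequence


def ternary_dot(weights: Sequence[int], activations: Sequence[int]) -> int:
    """Dot product with weights in {-1, 0, 1} using only add and subtract."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x          # +1: add the activation
        elif w == -1:
            acc -= x          # -1: subtract the activation
        elif w != 0:
            raise ValueError("weights must be -1, 0 or 1")
        # 0: the activation is skipped entirely
    return acc


def pack_ternary(weights: Sequence[int]) -> int:
    """Pack weights into one integer at 2 bits each (hypothetical encoding)."""
    code = {0: 0b00, 1: 0b01, -1: 0b10}
    packed = 0
    for i, w in enumerate(weights):
        packed |= code[w] << (2 * i)
    return packed
```

With this encoding each weight occupies 2 bits instead of the 32 bits of a single-precision value. Denser encodings are possible, since a ternary digit carries less than 2 bits of information.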
<br />
{|<br />
| style="padding: 10px" | [[File:gianna.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Paulin| Gianna Paulin]]===<br />
* '''e-mail''': [mailto:pauling@iis.ee.ethz.ch pauling@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 60 80<br />
* '''office''': ETZ J76.2<br />
|}<br />
{|<br />
| style="padding: 64px" |<br />
|<br />
===Georg Rutishauser===<br />
* '''e-mail''': [mailto:georgr@iis.ee.ethz.ch georgr@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 54 97<br />
* '''office''': ETZ J68.2<br />
|}<br />
{|<br />
| style="padding: 10px" | [[File:Moritz_scherer.jpg|frameless|left|96px]]<br />
|<br />
===[[:User:Scheremo| Moritz Scherer]]===<br />
* '''e-mail''': [mailto:scheremo@iis.ee.ethz.ch scheremo@iis.ee.ethz.ch]<br />
* '''phone''': +41 44 632 77 86<br />
* '''office''': ETZ J69.2<br />
|}<br />
<br />
<br />
==Projects Overview==<br />
===Available Projects===<br />
<DynamicPageList><br />
category = Available<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Projects In Progress===<br />
<DynamicPageList><br />
category = In progress<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList><br />
<br />
===Completed Projects===<br />
[[File:Selene.jpg|thumb|right|The Logarithmic Number Unit chip [http://asic.ethz.ch/2014/Selene.html Selene].]]<br />
<DynamicPageList><br />
category = Completed<br />
category = Digital<br />
category = Acceleration and Transprecision<br />
suppresserrors=true<br />
</DynamicPageList></div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Extend_the_RI5CY_core_with_priviledge_extensions&diff=5453Extend the RI5CY core with priviledge extensions2020-09-15T13:21:17Z<p>Smach: /* Professor */</p>
<hr />
<div>==Introduction==<br />
<br />
The RI5CY core is one of the best-known open-source microprocessors that implement the RISC-V ISA. It has been designed for small embedded system platforms mostly used in IoT devices. It implements RISC-V's RV32IMFC plus custom instructions designed to efficiently handle the digital signal processing applications typical of near-sensor systems.<br />
<br />
Recently, RI5CY as well as the PULP platforms have been chosen and/or evaluated by big companies like Google, IBM, Micron, NXP, Dolphin Integration, GreenWaves Technologies, etc.<br />
<br />
In particular, Google has evaluated the RI5CY core for integration into the Pixel Visual Core and has shown how it verified the core during this evaluation: <br />
<br />
https://content.riscv.org/wp-content/uploads/2018/05/13.15-13.30-matt-Cockrell.pdf<br />
<br />
To be cost- and area-efficient, RI5CY implements only a subset of the privileged ISA. This subset is sufficient to run low-complexity operating systems such as FreeRTOS on PULP platforms. However, the ever-increasing demand for security on embedded systems makes it necessary to run more complex operating systems, which typically require more than one privilege level.<br />
<br />
For instance, tiny operating systems with minimal security need at least machine (M) and user (U) mode to be implemented in the core, whereas more complex operating systems like Linux require at least three levels: M, U, and supervisor (S) mode. These systems also need an MMU or MPU to filter out illegal memory requests according to the current privilege level.<br />
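To make the idea of filtering memory requests by privilege level concrete, here is a minimal behavioral sketch in Python. It is a simplified model for illustration only: the `Region` fields and the `access_allowed` helper are hypothetical and do not reproduce the RISC-V PMP specification or the RI5CY implementation. The numeric values of the modes follow the standard RISC-V privilege encoding (U=0, S=1, M=3).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class Priv(IntEnum):
    U = 0  # user mode
    S = 1  # supervisor mode
    M = 3  # machine mode


@dataclass(frozen=True)
class Region:
    base: int        # first byte address of the region
    size: int        # region length in bytes
    min_priv: Priv   # lowest privilege level allowed to touch the region
    readable: bool
    writable: bool


def access_allowed(regions: Iterable[Region], addr: int,
                   priv: Priv, is_write: bool) -> bool:
    """Return True if the request passes the (simplified) protection check."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            if priv < region.min_priv:
                return False  # privilege too low: would raise an access fault
            return region.writable if is_write else region.readable
    # Assumption of this sketch: unmatched addresses are reachable only from M mode.
    return priv == Priv.M
```

In hardware, such a check sits on the path of every load, store, and instruction fetch, and a failed check raises an exception instead of returning `False`.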
<br />
==Project description==<br />
<br />
Furthermore, the core has to be able to handle interrupt requests at different privilege levels. The student willing to join the PULP team and work with our core is required to:<br />
<br />
1. Become familiar with the current core architecture, understanding the pipeline of the core and its functionality. This is achieved by studying its code, fixing small problems to get comfortable with our git environment, and building a testbench to test the core in isolation from the rest of the system. (~3-4 weeks)<br />
<br />
2. Change the pipeline of the core to support the U privilege mode. Parts of it are already in place. The student will focus especially on memory exceptions and on the design of an MMU. The student is required to extend the testbench to emulate the security context, different interrupt requests, etc. <br />
'''Verification will be one of the most important parts of the thesis''' (~6-9 weeks)<br />
<br />
3. Extend the PULPissimo platform and the current interrupt controller to support multilevel (2 levels) interrupt requests. This part may change slightly and has to be discussed during the thesis. (~3 weeks)<br />
<br />
4. Evaluate the speed, area and power overhead of the PULPissimo platform with security support compared to the version without privilege support in 65nm. (~3 weeks) If the student is fast and wants to have extra fun, an FPGA implementation of the PULPissimo platform can be built to show a small demo of the work's functionality.<br />
This thesis can also be taken by 2 students as a semester thesis.<br />
<br />
'''If the outcome of the thesis is valid and well executed, the student will have the possibility to attend the RISC-V Workshop and/or present their work there!<br />
'''<br />
<br />
==Required Skills==<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended<br />
* to have prior knowledge of hardware design and computer architecture - having followed the Advanced System-on-Chip Design course is recommended<br />
* to be strongly motivated for a difficult but super-cool project<br />
* to be able to work in a team<br />
<br />
===Status: Available ===<br />
<!-- Sharan Kumaar Ganesan (KTH Stockholm) --><br />
: Supervision: [[:User:Pschiavo | Pasquale Davide Schiavone]]<br />
<!-- : Date: 5/2018 --><br />
[[Category:Digital]] [[Category:PULP]] [[Category:ASIC]] [[Category:Available]] [[Category:Semester Thesis]] [[Category:Master Thesis]] [[Category:Hot]] [[Category:2018]]<br />
<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Pschiavo]]<br />
[[Category:ASIC]]<br />
[[Category:PULP]]<br />
[[Category:Processor]]<br />
[[Category:Computer Architecture]]<br />
<br />
Supervisors: Davide Schiavone</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Extend_the_RI5CY_core_with_priviledge_extensions&diff=5452Extend the RI5CY core with priviledge extensions2020-09-15T13:17:16Z<p>Smach: </p>
<hr />
<div>==Introduction==<br />
<br />
The RI5CY core is one of the best-known open-source microprocessors that implement the RISC-V ISA. It has been designed for small embedded system platforms mostly used in IoT devices. It implements RISC-V's RV32IMFC plus custom instructions designed to efficiently handle the digital signal processing applications typical of near-sensor systems.<br />
<br />
Recently, RI5CY as well as the PULP platforms have been chosen and/or evaluated by big companies like Google, IBM, Micron, NXP, Dolphin Integration, GreenWaves Technologies, etc.<br />
<br />
In particular, Google has evaluated the RI5CY core for integration into the Pixel Visual Core and has shown how it verified the core during this evaluation: <br />
<br />
https://content.riscv.org/wp-content/uploads/2018/05/13.15-13.30-matt-Cockrell.pdf<br />
<br />
To be cost- and area-efficient, RI5CY implements only a subset of the privileged ISA. This subset is sufficient to run low-complexity operating systems such as FreeRTOS on PULP platforms. However, the ever-increasing demand for security on embedded systems makes it necessary to run more complex operating systems, which typically require more than one privilege level.<br />
<br />
For instance, tiny operating systems with minimal security need at least machine (M) and user (U) mode to be implemented in the core, whereas more complex operating systems like Linux require at least three levels: M, U, and supervisor (S) mode. These systems also need an MMU or MPU to filter out illegal memory requests according to the current privilege level.<br />
<br />
==Project description==<br />
<br />
Furthermore, the core has to be able to handle interrupt requests at different privilege levels. The student willing to join the PULP team and work with our core is required to:<br />
<br />
1. Become familiar with the current core architecture, understanding the pipeline of the core and its functionality. This is achieved by studying its code, fixing small problems to get comfortable with our git environment, and building a testbench to test the core in isolation from the rest of the system. (~3-4 weeks)<br />
<br />
2. Change the pipeline of the core to support the U privilege mode. Parts of it are already in place. The student will focus especially on memory exceptions and on the design of an MMU. The student is required to extend the testbench to emulate the security context, different interrupt requests, etc. <br />
'''Verification will be one of the most important parts of the thesis''' (~6-9 weeks)<br />
<br />
3. Extend the PULPissimo platform and the current interrupt controller to support multilevel (2 levels) interrupt requests. This part may change slightly and has to be discussed during the thesis. (~3 weeks)<br />
<br />
4. Evaluate the speed, area and power overhead of the PULPissimo platform with security support compared to the version without privilege support in 65nm. (~3 weeks) If the student is fast and wants to have extra fun, an FPGA implementation of the PULPissimo platform can be built to show a small demo of the work's functionality.<br />
This thesis can also be taken by 2 students as a semester thesis.<br />
<br />
'''If the outcome of the thesis is valid and well executed, the student will have the possibility to attend the RISC-V Workshop and/or present their work there!<br />
'''<br />
<br />
==Required Skills==<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended<br />
* to have prior knowledge of hardware design and computer architecture - having followed the Advanced System-on-Chip Design course is recommended<br />
* to be strongly motivated for a difficult but super-cool project<br />
* to be able to work in a team<br />
<br />
===Status: Available ===<br />
<!-- Sharan Kumaar Ganesan (KTH Stockholm) --><br />
: Supervision: [[:User:Pschiavo | Pasquale Davide Schiavone]]<br />
<!-- : Date: 5/2018 --><br />
[[Category:Digital]] [[Category:PULP]] [[Category:ASIC]] [[Category:Available]] [[Category:Semester Thesis]] [[Category:Master Thesis]] [[Category:Hot]] [[Category:2018]]<br />
<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Pschiavo]]<br />
[[Category:ASIC]]<br />
[[Category:PULP]]<br />
[[Category:Processor]]<br />
[[Category:Computer Architecture]]<br />
<br />
Supervisors: Davide Schiavone, Stefan Mach</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Cerebellum:_Design_of_a_Programmable_Smart-Peripheral_for_the_Ariane_Core&diff=5451Cerebellum: Design of a Programmable Smart-Peripheral for the Ariane Core2020-09-15T13:16:38Z<p>Smach: </p>
<hr />
<div>==Introduction==<br />
As smart sensors become part of everyday life, data-driven applications are gaining more and more relevance in the consumer electronics market. Smartwatches for fitness tracking, cameras for security and multimedia entertainment, as well as wearable biomedical devices such as ECG and EEG monitors for health care applications are just a few examples.<br />
Typically, the data streams coming from sensors are processed on servers in the cloud.<br />
This requires the data to be sensed by a physical device driven by a microcontroller, possibly pre-processed, and eventually sent wirelessly to the network (for example using Bluetooth or low-power WiFi radios). There, the packets travel through routers and switches until they finally arrive at the server in the cloud, which processes them and possibly gives feedback to the users or to the microcontroller for closed-loop applications.<br />
As these smart sensors are usually battery-powered, they are designed to be energy efficient. Most of the power is spent transmitting the data over the radio to the server. Minimizing the bandwidth transmitted to the servers therefore not only reduces traffic and congestion, but also extends the battery life of the smart sensors.<br />
<br />
[[File:Pulp_processing_iot.png|thumb|400px]]<br />
<br />
Classification and data compression are data processing techniques that can be used to cope with this challenge. For example, one can imagine a face recognition application built as follows: an ultra-low-power camera continuously acquires images, and the microcontroller compresses each image and sends fewer bytes to the server, which simply decompresses the data and runs a convolutional neural network to classify the acquired face.<br />
A smarter example, still built on face recognition, is the following: the microcontroller performs a pre-classification on the image to determine whether the picture shows a face or not. This requires only a small part of the whole face recognition process. Only if the picture shows a face is the image sent to the cloud, which saves on-node power thanks to reduced use of the radio as well as server resources, as the servers now execute face recognition algorithms only on certain events.<br />
<br />
[[File:Face recognition.png|thumb|400px]]<br />
<br />
The event-driven execution paradigm can also be applied at the microscopic level by shutting down parts of the microcontroller that are not used during pre-processing and powering them up only for detected events.<br />
<br />
==Project description==<br />
<br />
<br />
'''Do you want to leave your contribution to the open-source community?<br />
<br />
In this thesis we propose to build the next OPEN-SOURCE RISC-V programmable smart-peripheral system for the Ariane Core.<br />
The event-based microcontroller, built on our open-source IPs such as the RISC-V RV64GC Ariane core and the PULPissimo platform, will then be released open-source together with its older siblings PULP, PULPissimo, PULPino, Ariane, BigPULP, etc.<br />
'''<br />
<br />
https://github.com/pulp-platform/ariane<br />
<br />
https://github.com/pulp-platform/pulpissimo<br />
<br />
As many applications are built using high-level languages such as Python or Lua, having a Linux-capable microcontroller such as the Raspberry Pi or the Ariane core makes software portability and reuse easy. <br />
<br />
<br />
However, these microcontrollers usually run at high speed (above 1GHz) and consume non-negligible power for constrained applications. We therefore propose to connect the PULPissimo microcontroller, based on our RISC-V RV32IMFC RI5CY core, to the Ariane coreplex for the heavy-processing and acquisition part of the application.<br />
PULPissimo has an autonomous and efficient I/O subsystem and a rich set of peripherals, and is optimized for energy efficiency. The RI5CY core has been extended with custom instructions to target high energy efficiency when running digital signal processing functions. It can be attached to the Ariane subsystem via an AXI plug and mapped into the Linux physical address space as a normal memory-mapped peripheral.<br />
<br />
The student's tasks can be summarized as follows:<br />
* Physically connect the PULPissimo microcontroller to the Ariane coreplex and map the whole system to the FPGA. Note that Ariane has already been mapped to the FPGA and is able to boot Linux, so the student can start from the existing work and extend it with PULPissimo.<br />
* Write the Linux kernel driver that maps a physical region of memory to control PULPissimo, which will be seen by the OS as a smart peripheral programmed by writing special words to special addresses.<br />
* Write or adapt common benchmarks like face detection for the whole system under the Linux environment.<br />
<br />
The third task will be implemented by exploiting the event-based paradigm as follows:<br />
The application running on Linux on the Ariane core will call a function like “acquire_and_detect” that uses the PULPissimo smart peripheral. The RI5CY core will then use the I/O subsystem to acquire the picture and run a face detection algorithm on it. If the image does contain a face, the function returns true, otherwise false. Meanwhile, Ariane waits in sleep mode (saving power) for PULPissimo to complete the task; if the acquired image does contain a face, Ariane then runs the whole face recognition task using advanced libraries like TensorFlow Lite [1].<br />
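The control flow described above can be sketched in a few lines of Python. This is a host-side illustration only, assuming hypothetical callables (`acquire_image`, `detect_face`, `recognize_face`) that stand in for the camera interface, the detection kernel running on PULPissimo, and the recognition network running on Ariane. Unlike the description above, the sketch also returns the image alongside the boolean so that the host can reuse it.

```python
from typing import Callable, Optional, Tuple


def acquire_and_detect(acquire_image: Callable[[], bytes],
                       detect_face: Callable[[bytes], bool]) -> Tuple[bool, bytes]:
    """Stand-in for the work offloaded to the smart peripheral."""
    image = acquire_image()            # I/O subsystem grabs a frame
    return detect_face(image), image   # cheap pre-classification


def process_event(acquire_image: Callable[[], bytes],
                  detect_face: Callable[[bytes], bool],
                  recognize_face: Callable[[bytes], str]) -> Optional[str]:
    """Run the expensive recognition only when a face was detected."""
    is_face, image = acquire_and_detect(acquire_image, detect_face)
    if not is_face:
        return None                    # host can go back to sleep
    return recognize_face(image)       # heavy task on the application core
```

The point of the structure is that `recognize_face` is never invoked for frames rejected by the detector, which is where the power saving of the event-based approach comes from.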
<br />
<br />
[1] https://www.tensorflow.org/lite/<br />
<br />
==Required Skills==<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended<br />
* to have prior knowledge of hardware design, computer architecture, and FPGA physical design<br />
Other skills that you might find useful include:<br />
* familiarity with a scripting language for numerical simulation (Python or Matlab or Lua…)<br />
* to be strongly motivated for a difficult but super-cool project<br />
If you want to work on this project but think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill the gap.<br />
<br />
===Status: Available ===<br />
<!-- Sharan Kumaar Ganesan (KTH Stockholm) --><br />
: Supervision: [[:User:Pschiavo | Pasquale Davide Schiavone]]<br />
<!-- : Date: 12/2018 --><br />
[[Category:Digital]] [[Category:PULP]] [[Category:ASIC]] [[Category:Available]] [[Category:Semester Thesis]] [[Category:Master Thesis]] [[Category:Hot]] [[Category:2017]]<br />
<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Pschiavo]]<br />
[[Category:ASIC]]<br />
[[Category:PULP]]<br />
[[Category:Processor]]<br />
[[Category:Computer Architecture]]</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Implementation_of_the_RISC-V_Bit_Manipulation_(RVB)_extensions_for_our_RISC-V_core&diff=5450Implementation of the RISC-V Bit Manipulation (RVB) extensions for our RISC-V core2020-09-15T13:15:43Z<p>Smach: </p>
<hr />
<div>==Introduction==<br />
RISC-V is an open-source Instruction Set Architecture (ISA) governed by the non-profit RISC-V organization (https://riscv.org/).<br />
Thanks to its simplicity, efficiency, and free-to-use nature, it has in recent years been widely adopted by industry for products and evaluation boards, as well as by universities as a vehicle for research projects. Interesting examples are products from companies like NXP with the Vega board (https://hackaday.com/2019/02/04/openisa-launches-free-risc-v-vegaboard/), GAP8 from GreenWaves (https://greenwaves-technologies.com/ai_processor_gap8/), SiFive core IPs (https://www.sifive.com/risc-v-core-ip), Dolphin Integration with the Tornado board (https://www.design-reuse.com/news/44159/dolphin-integration-risc-v-subsystem.html), and many more. ETH, and in particular our Digital Circuits and Systems group at IIS, contributed to the open-source RISC-V community by providing three cores, RI5CY, zero-riscy, and Ariane, which have also been used in the aforementioned products. Moreover, they have recently matured from academic-level to industry-level IP quality with the support of big companies like Google, SiLabs, NXP, etc.<br />
One of the keys to RISC-V's success, alongside its free and open architecture, is its extensibility. <br />
Besides the official instruction extensions, part of the ISA encoding space is reserved for custom instructions that target optimizations varying from context to context. Under the PULP project here in our group (https://pulp-platform.org/), RI5CY has been extended with custom instructions to support signal processing, bit manipulation tasks, hardware loops, and so on. <br />
Some extensions are also discussed within RISC-V in task groups among members of the foundation. For instance, there are task groups to specify the Vector extension (RVV), the Packed SIMD extension (RVP), the Bit Manipulation extension (RVB), etc. <br />
Recently, the RISC-V community has put forward a fairly stable proposal for the RVB instructions: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-draft.pdf. <br />
Some of those are very similar to the ones developed for the PULP project, while others differ, and some have no counterpart in the PULP extensions at all.<br />
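As an illustration of the kind of operations a bit manipulation extension typically covers, the following Python sketch models a few of them for a 32-bit register. The function names are illustrative choices; the authoritative instruction list, mnemonics, and corner-case semantics should be taken from the RVB draft linked above, not from this sketch.

```python
XLEN = 32                    # register width assumed for this sketch
MASK = (1 << XLEN) - 1


def clz(x: int) -> int:
    """Count leading zero bits; returns XLEN for x == 0."""
    return XLEN - (x & MASK).bit_length()


def ctz(x: int) -> int:
    """Count trailing zero bits; returns XLEN for x == 0."""
    x &= MASK
    if x == 0:
        return XLEN
    return (x & -x).bit_length() - 1   # isolate the lowest set bit


def popcount(x: int) -> int:
    """Count the number of set bits."""
    return bin(x & MASK).count("1")


def rol(x: int, shamt: int) -> int:
    """Rotate left by shamt (taken modulo XLEN)."""
    x &= MASK
    shamt %= XLEN
    return ((x << shamt) | (x >> (XLEN - shamt))) & MASK
```

Without dedicated instructions, each of these operations expands into a loop or a sequence of shifts and masks in software, which is the overhead such an extension aims to remove.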
<br />
==Project description==<br />
<br />
We propose a semester thesis to implement the proposed RVB extensions in the RI5CY core. This task requires you to:<br />
<br />
** Replace the existing PULP instructions that are similar or equal to RVB instructions<br />
** Implement the missing ones<br />
** Evaluate the impact on area and timing in a detailed report<br />
** Evaluate the impact on performance/execution time for a given set of benchmarks <br />
<br />
In the case of a master thesis, this work will be extended to the 64-bit RISC-V core Ariane, along with enhancements to the RI5CY verification strategy and execution trace.<br />
<br />
==Required Skills==<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 class or equivalent is mandatory, VLSI2 course recommended<br />
* to have prior knowledge of assembly/C programming<br />
<br />
Other skills that you might find useful include:<br />
* to be strongly motivated for a difficult but super-cool project<br />
If you want to work on this project but think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill the gap.<br />
<br />
===Status: Available ===<br />
<!-- Sharan Kumaar Ganesan (KTH Stockholm) --><br />
: Supervision: [[:User:Pschiavo | Pasquale Davide Schiavone]], [[:User:Balasar | Robert Balas]]<br />
<!-- : Date: 12/2018 --><br />
[[Category:Digital]] [[Category:PULP]] [[Category:ASIC]] [[Category:Available]] [[Category:Semester Thesis]] [[Category:Master Thesis]] [[Category:Hot]] [[Category:2017]]<br />
<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
<br />
[[Category:Digital]]<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:Available]]<br />
[[Category:Pschiavo]]<br />
[[Category:ASIC]]<br />
[[Category:PULP]]<br />
[[Category:Processor]]<br />
[[Category:Computer Architecture]]</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Floating-Point_Divide_%26_Square_Root_Unit_for_Transprecision&diff=5329Floating-Point Divide & Square Root Unit for Transprecision2020-07-29T15:09:45Z<p>Smach: </p>
<hr />
<div>==Introduction==<br />
Traditional computing systems operate in a precise, "every bit correct" manner. When it comes to arithmetic workloads, for example, most programmers tend to carry out all computations at the maximum available precision - double precision in the case of floating-point (FP) workloads. This happens regardless of the actual numerical requirements of the application at hand. <br />
Transprecision computing aims to do away with this rigid way of computing by adding more 'knobs' in hardware and software that can be used to adjust computing precision on the fly. In contrast to approximate computing, where the precision of the entire system is reduced - often incurring a loss in result quality - transprecision computing entails dynamically providing the precision needed for correct execution.<br />
In floating-point arithmetic, gains in energy efficiency and speed can be achieved by using custom-precision formats that use fewer bits than the standard 'single' and 'double' precision, leading to smaller and more efficient hardware. Reduced-precision floating-point arithmetic is interesting for classic video and audio processing, but also for machine learning and scientific computing workloads.<br />
<br />
At the Integrated Systems Laboratory (IIS) we have been working for several years on ultra-low-power processor cores in the context of the ''PULP'' (Parallel Ultra-Low Power) project. PULP cores implement the open-source RISC-V instruction set architecture (ISA), which includes FP instructions as optional ISA extensions.<br />
RISC-V allows for custom ISA extensions which were used to define transprecision floating-point extensions for the use in PULP cores. Our extensions define 16-bit and 8-bit FP formats, and various operations on these formats, including single-instruction-multiple-data (SIMD) vectors.<br />
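The precision cost of a narrower format can be explored in software before committing to hardware. The sketch below uses Python's standard `struct` module to round a value to IEEE 754 binary16, binary32, and binary64 and report the relative rounding error. It covers only the standard IEEE formats; the custom PULP 16-bit and 8-bit formats mentioned above are not modeled here, and the helper names are illustrative.

```python
import struct

# struct format codes for little-endian IEEE 754 binary16/32/64
FORMATS = {"binary16": "<e", "binary32": "<f", "binary64": "<d"}


def round_trip(value: float, fmt: str) -> float:
    """Round value to the given storage format and convert it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def relative_error(value: float, fmt: str) -> float:
    """Relative rounding error introduced by storing value in fmt."""
    return abs(round_trip(value, fmt) - value) / abs(value)


if __name__ == "__main__":
    for name, fmt in FORMATS.items():
        print(f"{name}: relative error of 0.1 = {relative_error(0.1, fmt):.3e}")
```

For normalized values, the relative error of round-to-nearest is bounded by 2^-11 for binary16 and 2^-24 for binary32, so an application that tolerates an error of that size is a candidate for the narrower format.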
<br />
==Project description==<br />
<!--[[File:ergo_archi.png|thumb|400px]]--><br />
<br />
For FP arithmetic to be fast and energy-efficient in a processor core, a dedicated floating-point unit (FPU) in hardware is needed. RISC-V defines basic arithmetic floating-point operations such as addition, multiplication and division. While addition and multiplication in hardware are quite straightforward, implementing a division unit is trickier. There are several architectural options, each with different trade-offs in terms of area, power and latency.<br />
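One family of such architectural options is digit recurrence, where one result bit is produced per iteration using only shifts, comparisons, and subtractions. The following Python sketch models radix-2 restoring division and square root on integers, as they would be applied to significands. It is a behavioral illustration chosen here as an example, not the algorithm used in the existing PULP unit, and it ignores exponent handling, normalization, and rounding.

```python
def restoring_divide(dividend: int, divisor: int, frac_bits: int = 0) -> int:
    """Return floor((dividend << frac_bits) / divisor), one bit per step."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    numerator = dividend << frac_bits
    quotient = 0
    remainder = 0
    for i in range(numerator.bit_length() - 1, -1, -1):
        remainder = (remainder << 1) | ((numerator >> i) & 1)  # bring down a bit
        quotient <<= 1
        if remainder >= divisor:       # trial subtraction succeeds
            remainder -= divisor
            quotient |= 1
    return quotient


def restoring_sqrt(radicand: int) -> int:
    """Return floor(sqrt(radicand)), one result bit per step."""
    if radicand < 0:
        raise ValueError("expects radicand >= 0")
    root = 0
    remainder = 0
    pairs = (radicand.bit_length() + 1) // 2
    for i in range(pairs - 1, -1, -1):
        remainder = (remainder << 2) | ((radicand >> (2 * i)) & 0b11)
        trial = (root << 2) | 1        # (2*root + 1)^2 - (2*root)^2
        root <<= 1
        if remainder >= trial:
            remainder -= trial
            root |= 1
    return root
```

Both loops share the same shift, compare, and subtract structure, which is one reason division and square root are often merged into a single hardware unit. The latency grows linearly with the number of result bits, so narrower formats finish in fewer iterations.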
<br />
The basic arithmetic operations are also needed for our custom transprecision formats in PULP. Today, there is a transprecision-enabled FPU in PULP that offers support for nearly all operations needed to comply with the RISC-V specifications as well as our own extensions. However, the division and square-root (DIV/SQRT) unit is currently non-parametrizable and not fully compliant with the IEEE standard for FP arithmetic.<br />
To enable diverse PULP-based systems with different features, such as 32/64-bit cores with and without transprecision capabilities, a flexible and parametric FP DIV/SQRT unit with multi-format capabilities is needed.<br />
<br />
In this project, you will evaluate different algorithms for a DIV/SQRT unit in hardware, and implement the most promising one into a transprecision-capable unit. Furthermore, you will take your design through most of the steps necessary for manufacturing it on an actual IC to obtain accurate energy-efficiency and performance metrics. Also, you will be able to test your unit within one of our cores. This new unit will be used inside future versions of our cores to leverage its transprecision capabilities, as well as standard operations.<br />
<br />
==Outcomes and Acquired Expertise==<br />
With this project you will work in a field of active research to help develop a transprecision-enabled platform for ASIC and FPGA targets. You will learn:<br />
* about computer arithmetic first hand by diving into algorithms;<br />
* how to design a hardware module for integrating it within a more complex platform, using EDA tools for verification and RTL synthesis to evaluate results;<br />
* how to take a synthesized design through the back-end EDA flow to prepare it for manufacturing and obtaining power simulation measurements.<br />
<br />
==Required Skills==<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed (or actively following during the project) the VLSI1 / VLSI2 courses is strongly recommended<br />
* to have some prior knowledge of hardware design and architectures<br />
Other skills that you might find useful include:<br />
* familiarity with computer arithmetic<br />
<br />
If you want to work on this project, but you think that you do not match some of the required skills, we can provide you with some preliminary exercises to help you fill in the gaps.<br />
<br />
===Status: Completed===<br />
<!--Paul Scheffler, Luca Colagrande--><br />
: Supervision: [[:User:Smach | Stefan Mach]], [[:User:Paulin | Gianna Paulin]]<br />
<!--: Date: Spring 2019--><br />
[[Category:Digital]] [[Category:PULP]] [[Category:ASIC]] [[Category:Completed]] [[Category:Semester Thesis]] [[Category:Master Thesis]] [[Category:2019]] [[Category:Acceleration and Transprecision]][[Category:Paulin]][[Category:Smach]] <br />
<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
==Practical Details==<br />
<br />
===Meetings & Presentations===<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details, see [http://eda.ee.ethz.ch/index.php/Design_review].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.<br />
<br />
* '''[[Project Plan]]'''<br />
* '''[[Project Meetings]]'''<br />
* '''[[Design Review]]'''<br />
* '''[[Coding Guidelines]]'''<br />
* '''[[Final Report]]'''<br />
* '''[[Final Presentation]]'''<br />
<br />
<!--<br />
==Literature==<br />
* [Andri2017] R. Andri et al., YodaNN: an Architecture for Ultra-Low Power Binary-Weight CNN Acceleration, [https://arxiv.org/pdf/1606.05487.pdf]<br />
* [Conti2017] F. Conti et al., An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics, [https://arxiv.org/pdf/1612.05974.pdf]<br />
* [Intel2017] A. Zhou et al., Incremental Network Quantization: Towards Lossless CNNs with low-precision weights, [https://arxiv.org/pdf/1702.03044.pdf]<br />
* [Krizhevsky2012] A. Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, [http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]<br />
--><br />
<br />
==Links==<br />
* The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [http://eda.ee.ethz.ch/]<br />
* The IIS/DZ coding guidelines [http://eda.ee.ethz.ch/index.php/Naming_Conventions]<br />
<br />
<br />
[[#top|↑ top]]<br />
<br />
<!--<br />
<br />
COPY PASTE FROM THE LIST BELOW TO ADD TO CATEGORIES<br />
<br />
GROUP<br />
[[Category:Digital]]<br />
[[Category:Analog]]<br />
[[Category:Nano-TCAD]]<br />
[[Category:Nano Electronics]]<br />
<br />
STATUS<br />
[[Category:Available]]<br />
[[Category:In progress]]<br />
[[Category:Completed]]<br />
[[Category:Hot]]<br />
<br />
TYPE OF WORK<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:PhD Thesis]]<br />
[[Category:Research]]<br />
<br />
NAMES OF EU/CTI/NT PROJECTS<br />
[[Category:UltrasoundToGo]]<br />
[[Category:IcySoC]]<br />
[[Category:PSocrates]]<br />
[[Category:UlpSoC]]<br />
[[Category:Qcrypt]]<br />
<br />
YEAR (IF FINISHED)<br />
[[Category:2010]]<br />
[[Category:2011]]<br />
[[Category:2012]]<br />
[[Category:2013]]<br />
[[Category:2014]]<br />
<br />
---></div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Stefan_Mach&diff=5328Stefan Mach2020-07-29T15:06:13Z<p>Smach: /* Projects in Progress */</p>
<hr />
<div>__NOTOC__<br />
Stefan Mach received his M.Sc. degree from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, where he is currently pursuing a Ph.D. degree. Since 2017, he has been a research assistant with the Integrated Systems Laboratory at ETHZ. <br />
<br />
==Interests==<br />
* Computer Architecture and Microprocessors<br />
* Digital ASIC Design<br />
* Transprecision/Approximate Computing<br />
* Hardware Support for Energy-Efficient Transprecision Computing<br />
* Embedded Systems<br />
* PCB/Systems Design<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Projects in Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = In_progress<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Past Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Completed<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 89<br />
* '''e-mail''': [mailto:smach@iis.ee.ethz.ch smach@iis.ee.ethz.ch]<br />
* '''phone''': (+41 44 63) 254 33<br />
<br />
[[Category:Supervisors]]<br />
[[Category:Digital]]</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Final_Presentation&diff=5124Final Presentation2020-03-22T16:53:11Z<p>Smach: </p>
<hr />
<div>There will be a presentation (15 min (GS/SA) / 20 min (MA) presentation and 5 min Q&A) at the end of this project in order to present your results to a wider audience. The exact date will be determined towards the end of the work. It is important to make a good presentation, as most of the members of the IIS will learn about your project through your presentation. <br />
<br />
==Tips and Tricks==<br />
* Plan for about one slide per minute on average. <br />
* It is easier to plan your slides using pen and paper first. Draw a grid of 10-20 slides and start placing the mandatory slides first (title, contents, the final slide), and fill in the rest.<br />
* You need to make sure that the audience understands your motivation for the project before you go into details. What are you proposing, and why is it important?<br />
* As your target audience, assume someone at the level you were at before the project.<br />
<br />
==Templates==<br />
* Word<br />
* [[Media:beamer_template.tar.gz|LaTeX template]]<br />
<br />
==Links==<br />
* [http://www.inf.ethz.ch/personal/markusp/teaching/guides/guide-presentations.pdf Small Guide to Giving Presentations] by Markus Püschel.<br />
<br />
[[Category:Information]]</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Floating-Point_Divide_%26_Square_Root_Unit_for_Transprecision&diff=5040Floating-Point Divide & Square Root Unit for Transprecision2019-11-18T16:20:22Z<p>Smach: </p>
<hr />
<div>==Introduction==<br />
Traditional computing systems operate in a precise, "every bit correct" manner. When it comes to computing workloads using arithmetic, for example, most programmers tend to carry out all computations at the maximum available precision - double precision in the case of floating-point (FP) workloads. This happens regardless of the actual numerical requirements of the application at hand. <br />
Transprecision computing aims at doing away with this rigid way of computing by adding more 'knobs' in hardware and software that can be used to adjust computing precision on the fly. In contrast to approximate computing where the precision of the entire system is reduced - often incurring loss in result quality - transprecision computing entails dynamically providing the precision needed for a correct execution.<br />
In floating-point arithmetic, gains in energy efficiency and speed can be achieved by using custom-precision formats that use fewer bits than the standard 'single' and 'double' precision, leading to smaller and more efficient hardware. Using reduced-precision floating-point arithmetic is interesting for classic video and audio processing, but also for machine learning and scientific computing workloads.<br />
<br />
At the Integrated Systems Laboratory (IIS) we have been working for several years on ultra-low-power processor cores in the context of the ''PULP'' (Parallel Ultra-Low Power) project. PULP cores implement the open-source RISC-V instruction set architecture (ISA), which includes FP instructions as optional ISA extensions.<br />
RISC-V allows for custom ISA extensions which were used to define transprecision floating-point extensions for the use in PULP cores. Our extensions define 16-bit and 8-bit FP formats, and various operations on these formats, including single-instruction-multiple-data (SIMD) vectors.<br />
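<br />
To give a feeling for what a narrower format costs in accuracy, the following minimal Python sketch rounds values to the standard IEEE 754 binary16 format and back. It uses binary16 only as a stand-in: the exact encodings of the 16-bit and 8-bit PULP formats are not specified on this page, and the helper names <code>round_to_half</code> and <code>half_bits</code> are chosen here for illustration.<br />

```python
import struct


def round_to_half(x: float) -> float:
    """Round a Python float (binary64) to IEEE 754 binary16 and convert it back."""
    # The "e" format code of the struct module packs a binary16 value.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def half_bits(x: float) -> str:
    """Return the 16-bit binary16 encoding of x as a string of 0s and 1s."""
    (raw,) = struct.unpack("<H", struct.pack("<e", x))
    return f"{raw:016b}"


if __name__ == "__main__":
    for value in (1.0, 0.1, 3.14159265):
        rounded = round_to_half(value)
        print(f"{value!r:>12} -> {rounded!r:<22} bits={half_bits(value)} "
              f"abs. error={abs(rounded - value):.3e}")
```

Values such as 1.0 survive the round trip unchanged, while 0.1 picks up a small rounding error. Whether such an error is acceptable depends on the numerical requirements of the application, which is exactly the degree of freedom transprecision computing tries to exploit.<br />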
<br />
==Project description==<br />
<!--[[File:ergo_archi.png|thumb|400px]]--><br />
<br />
For FP arithmetic to be fast and energy-efficient in a processor core, a dedicated hardware floating-point unit (FPU) is needed. RISC-V defines basic arithmetic floating-point operations such as addition, multiplication and division. While addition and multiplication in hardware are quite straightforward, implementing a division unit is trickier. There are several architectural options, each with different trade-offs in terms of area, power and latency.<br />
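<br />
As an illustration of one such option, the Python sketch below models a restoring (shift-and-subtract) division of two integer significands, producing one quotient bit per iteration. It is a software model under simplifying assumptions, not the algorithm selected in this project: the function name <code>restoring_divide</code> and the parameter <code>frac_bits</code> are hypothetical, and exponent handling, normalization and rounding are left out.<br />

```python
def restoring_divide(dividend: int, divisor: int, frac_bits: int) -> tuple:
    """Divide two non-negative integers bit by bit.

    Returns (quotient, remainder) such that
    (dividend << frac_bits) == quotient * divisor + remainder,
    i.e. the quotient carries frac_bits fractional bits.
    """
    if divisor <= 0 or dividend < 0 or frac_bits < 0:
        raise ValueError("expected dividend >= 0, divisor > 0, frac_bits >= 0")

    scaled = dividend << frac_bits
    quotient = 0
    remainder = 0
    # One iteration per bit of the scaled dividend, most significant bit first.
    for bit_index in range(scaled.bit_length() - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((scaled >> bit_index) & 1)
        quotient <<= 1
        # Keep the subtraction only if it does not go negative;
        # otherwise the partial remainder is left as it was ("restored").
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    # 3 / 2 with 10 fractional bits: 1.5 * 2**10
    print(restoring_divide(3, 2, 10))
```

The number of iterations grows with the number of result bits, which hints at why latency, area and power can be traded against each other, and why a narrower format needs fewer iterations than a wider one in a parametric unit.<br />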
<br />
The basic arithmetic operations are also needed for our custom transprecision formats in PULP. Today, there is a transprecision-enabled FPU in PULP that offers support for nearly all operations needed to comply with the RISC-V specifications as well as our own extensions. However, the division and square-root (DIV/SQRT) unit is currently non-parametrizable and not fully compliant with the IEEE standard for FP arithmetic.<br />
To enable diverse PULP-based systems with different features, such as 32/64-bit cores with and without transprecision capabilities, a flexible and parametric FP DIV/SQRT unit with multi-format capabilities is needed.<br />
<br />
In this project, you will evaluate different algorithms for a DIV/SQRT unit in hardware, and implement the most promising one into a transprecision-capable unit. Furthermore, you will take your design through most of the steps necessary for manufacturing it on an actual IC to obtain accurate energy-efficiency and performance metrics. Also, you will be able to test your unit within one of our cores. This new unit will be used inside future versions of our cores to leverage its transprecision capabilities, as well as standard operations.<br />
<br />
==Outcomes and Acquired Expertise==<br />
With this project you will work in a field of active research to help develop a transprecision-enabled platform for ASIC and FPGA targets. You will learn:<br />
* about computer arithmetic first hand by diving into algorithms;<br />
* how to design a hardware module for integrating it within a more complex platform, using EDA tools for verification and RTL synthesis to evaluate results;<br />
* how to take a synthesized design through the back-end EDA flow to prepare it for manufacturing and obtaining power simulation measurements.<br />
<br />
==Required Skills==<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed (or actively following during the project) the VLSI1 / VLSI2 courses is strongly recommended<br />
* to have some prior knowledge of hardware design and architectures<br />
Other skills that you might find useful include:<br />
* familiarity with computer arithmetic<br />
<br />
If you want to work on this project, but you think that you do not match some of the required skills, we can provide you with some preliminary exercises to help you fill in the gaps.<br />
<br />
===Status: In Progress===<br />
<!--Paul Scheffler, Luca Colagrande--><br />
: Supervision: [[:User:Smach | Stefan Mach]], [[:User:Paulin | Gianna Paulin]]<br />
<!--: Date: Spring 2019--><br />
[[Category:Digital]] [[Category:PULP]] [[Category:ASIC]] [[Category:In progress]] [[Category:Semester Thesis]] [[Category:Master Thesis]] [[Category:2019]] [[Category:Acceleration and Transprecision]][[Category:Paulin]] <br />
<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
==Practical Details==<br />
<br />
===Meetings & Presentations===<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details, see [http://eda.ee.ethz.ch/index.php/Design_review].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.<br />
<br />
* '''[[Project Plan]]'''<br />
* '''[[Project Meetings]]'''<br />
* '''[[Design Review]]'''<br />
* '''[[Coding Guidelines]]'''<br />
* '''[[Final Report]]'''<br />
* '''[[Final Presentation]]'''<br />
<br />
<!--<br />
==Literature==<br />
* [Andri2017] R. Andri et al., YodaNN: an Architecture for Ultra-Low Power Binary-Weight CNN Acceleration, [https://arxiv.org/pdf/1606.05487.pdf]<br />
* [Conti2017] F. Conti et al., An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics, [https://arxiv.org/pdf/1612.05974.pdf]<br />
* [Intel2017] A. Zhou et al., Incremental Network Quantization: Towards Lossless CNNs with low-precision weights, [https://arxiv.org/pdf/1702.03044.pdf]<br />
* [Krizhevsky2012] A. Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, [http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]<br />
--><br />
<br />
==Links==<br />
* The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [http://eda.ee.ethz.ch/]<br />
* The IIS/DZ coding guidelines [http://eda.ee.ethz.ch/index.php/Naming_Conventions]<br />
<br />
<br />
[[#top|↑ top]]<br />
<br />
<!--<br />
<br />
COPY PASTE FROM THE LIST BELOW TO ADD TO CATEGORIES<br />
<br />
GROUP<br />
[[Category:Digital]]<br />
[[Category:Analog]]<br />
[[Category:Nano-TCAD]]<br />
[[Category:Nano Electronics]]<br />
<br />
STATUS<br />
[[Category:Available]]<br />
[[Category:In progress]]<br />
[[Category:Completed]]<br />
[[Category:Hot]]<br />
<br />
TYPE OF WORK<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:PhD Thesis]]<br />
[[Category:Research]]<br />
<br />
NAMES OF EU/CTI/NT PROJECTS<br />
[[Category:UltrasoundToGo]]<br />
[[Category:IcySoC]]<br />
[[Category:PSocrates]]<br />
[[Category:UlpSoC]]<br />
[[Category:Qcrypt]]<br />
<br />
YEAR (IF FINISHED)<br />
[[Category:2010]]<br />
[[Category:2011]]<br />
[[Category:2012]]<br />
[[Category:2013]]<br />
[[Category:2014]]<br />
<br />
---></div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Floating-Point_Divide_%26_Square_Root_Unit_for_Transprecision&diff=4599Floating-Point Divide & Square Root Unit for Transprecision2019-02-19T11:22:41Z<p>Smach: Created page with "==Introduction== Traditional computing systems operate in a precise, "every bit correct" manner. When it comes to computing workloads using arithmetic for example, most progra..."</p>
<hr />
<div>==Introduction==<br />
Traditional computing systems operate in a precise, "every bit correct" manner. When it comes to computing workloads using arithmetic, for example, most programmers tend to carry out all computations at the maximum available precision - double precision in the case of floating-point (FP) workloads. This happens regardless of the actual numerical requirements of the application at hand. <br />
Transprecision computing aims at doing away with this rigid way of computing by adding more 'knobs' in hardware and software that can be used to adjust computing precision on the fly. In contrast to approximate computing where the precision of the entire system is reduced - often incurring loss in result quality - transprecision computing entails dynamically providing the precision needed for a correct execution.<br />
In floating-point arithmetic, gains in energy efficiency and speed can be achieved by using custom-precision formats that use fewer bits than the standard 'single' and 'double' precision, leading to smaller and more efficient hardware. Using reduced-precision floating-point arithmetic is interesting for classic video and audio processing, but also for machine learning and scientific computing workloads.<br />
<br />
At the Integrated Systems Laboratory (IIS) we have been working for several years on ultra-low-power processor cores in the context of the ''PULP'' (Parallel Ultra-Low Power) project. PULP cores implement the open-source RISC-V instruction set architecture (ISA), which includes FP instructions as optional ISA extensions.<br />
RISC-V allows for custom ISA extensions which were used to define transprecision floating-point extensions for the use in PULP cores. Our extensions define 16-bit and 8-bit FP formats, and various operations on these formats, including single-instruction-multiple-data (SIMD) vectors.<br />
<br />
==Project description==<br />
<!--[[File:ergo_archi.png|thumb|400px]]--><br />
<br />
For FP arithmetic to be fast and energy-efficient in a processor core, a dedicated hardware floating-point unit (FPU) is needed. RISC-V defines basic arithmetic floating-point operations such as addition, multiplication and division. While addition and multiplication in hardware are quite straightforward, implementing a division unit is trickier. There are several architectural options, each with different trade-offs in terms of area, power and latency.<br />
<br />
The basic arithmetic operations are also needed for our custom transprecision formats in PULP. Today, there is a transprecision-enabled FPU in PULP that offers support for nearly all operations needed to comply with the RISC-V specifications as well as our own extensions. However, the division and square-root (DIV/SQRT) unit is currently non-parametrizable and not fully compliant with the IEEE standard for FP arithmetic.<br />
To enable diverse PULP-based systems with different features, such as 32/64-bit cores with and without transprecision capabilities, a flexible and parametric FP DIV/SQRT unit with multi-format capabilities is needed.<br />
<br />
In this project, you will evaluate different algorithms for a DIV/SQRT unit in hardware, and implement the most promising one into a transprecision-capable unit. Furthermore, you will take your design through most of the steps necessary for manufacturing it on an actual IC to obtain accurate energy-efficiency and performance metrics. Also, you will be able to test your unit within one of our cores. This new unit will be used inside future versions of our cores to leverage its transprecision capabilities, as well as standard operations.<br />
<br />
==Outcomes and Acquired Expertise==<br />
With this project you will work in a field of active research to help develop a transprecision-enabled platform for ASIC and FPGA targets. You will learn:<br />
* about computer arithmetic first hand by diving into algorithms;<br />
* how to design a hardware module for integrating it within a more complex platform, using EDA tools for verification and RTL synthesis to evaluate results;<br />
* how to take a synthesized design through the back-end EDA flow to prepare it for manufacturing and obtaining power simulation measurements.<br />
<br />
==Required Skills==<br />
To work on this project, you will need:<br />
* to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed (or actively following during the project) the VLSI1 / VLSI2 courses is strongly recommended<br />
* to have some prior knowledge of hardware design and architectures<br />
Other skills that you might find useful include:<br />
* familiarity with computer arithmetic<br />
<br />
If you want to work on this project, but you think that you do not match some of the required skills, we can provide you with some preliminary exercises to help you fill in the gaps.<br />
<br />
===Status: Available ===<br />
<!--Paul Scheffler, Luca Colagrande--><br />
: Supervision: [[:User:Smach | Stefan Mach]], [[:User:Pauling | Gianna Paulin]]<br />
<!--: Date: Spring 2019--><br />
[[Category:Digital]] [[Category:PULP]] [[Category:ASIC]] [[Category:Available]] [[Category:Semester Thesis]] [[Category:Master Thesis]] [[Category:2019]] [[Category:Acceleration and Transprecision]]<br />
<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
[[#top|↑ top]]<br />
<br />
==Practical Details==<br />
<br />
===Meetings & Presentations===<br />
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.<br />
<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details, see [http://eda.ee.ethz.ch/index.php/Design_review].<br />
<br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.<br />
<br />
* '''[[Project Plan]]'''<br />
* '''[[Project Meetings]]'''<br />
* '''[[Design Review]]'''<br />
* '''[[Coding Guidelines]]'''<br />
* '''[[Final Report]]'''<br />
* '''[[Final Presentation]]'''<br />
<br />
<!--<br />
==Literature==<br />
* [Andri2017] R. Andri et al., YodaNN: an Architecture for Ultra-Low Power Binary-Weight CNN Acceleration, [https://arxiv.org/pdf/1606.05487.pdf]<br />
* [Conti2017] F. Conti et al., An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics, [https://arxiv.org/pdf/1612.05974.pdf]<br />
* [Intel2017] A. Zhou et al., Incremental Network Quantization: Towards Lossless CNNs with low-precision weights, [https://arxiv.org/pdf/1702.03044.pdf]<br />
* [Krizhevsky2012] A. Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks, [http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf]<br />
--><br />
<br />
==Links==<br />
* The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [http://eda.ee.ethz.ch/]<br />
* The IIS/DZ coding guidelines [http://eda.ee.ethz.ch/index.php/Naming_Conventions]<br />
<br />
<br />
[[#top|↑ top]]<br />
<br />
<!--<br />
<br />
COPY PASTE FROM THE LIST BELOW TO ADD TO CATEGORIES<br />
<br />
GROUP<br />
[[Category:Digital]]<br />
[[Category:Analog]]<br />
[[Category:Nano-TCAD]]<br />
[[Category:Nano Electronics]]<br />
<br />
STATUS<br />
[[Category:Available]]<br />
[[Category:In progress]]<br />
[[Category:Completed]]<br />
[[Category:Hot]]<br />
<br />
TYPE OF WORK<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:PhD Thesis]]<br />
[[Category:Research]]<br />
<br />
NAMES OF EU/CTI/NT PROJECTS<br />
[[Category:UltrasoundToGo]]<br />
[[Category:IcySoC]]<br />
[[Category:PSocrates]]<br />
[[Category:UlpSoC]]<br />
[[Category:Qcrypt]]<br />
<br />
YEAR (IF FINISHED)<br />
[[Category:2010]]<br />
[[Category:2011]]<br />
[[Category:2012]]<br />
[[Category:2013]]<br />
[[Category:2014]]<br />
<br />
---></div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=PULP-Shield_for_Autonomous_UAV&diff=3546PULP-Shield for Autonomous UAV2018-02-23T09:06:36Z<p>Smach: </p>
<hr />
<div>[[Category:Energy Efficient Autonomous UAVs]] [[Category:Software]] [[Category:Digital]] [[Category:PULP]] [[Category:Completed]] [[Category:Semester Thesis]] [[Category:Dpalossi]] [[Category:Smach]]<br />
<br />
[[File:PULP-Shield.png|600px|thumb]]<br />
<br />
==Description==<br />
Unmanned Aerial Vehicles (UAVs) are increasingly being used for practical applications such as the inspection of industrial facilities or cultivated fields, assistance in natural disaster or hazardous areas, etc.<br />
<br />
As in many other fields, miniaturization is one of the major trends in robotics. Commercial quadrotors have already reached the nano-scale, measuring only a few centimeters in diameter and weighing a few tens of grams.<br />
<br />
In this context, the Parallel Ultra Low-Power Platform [1] developed here at IIS is the key computational unit for bringing state-of-the-art complex vision algorithms for autonomous navigation to the nano-scale class of vehicles.<br />
<br />
The goal of this thesis is to design, develop and test the first PULP-shield, a compact PCB pluggable into our target nano-size quadrotor [2] (open-source and open-hardware). The PCB design will include the development of 2 SPI interfaces, one directly connected to the Himax ULP camera [7] and a second connecting the accelerator to the existing MCU.<br />
<br />
Thus, the PULP-shield will be coupled with the STM32F405 MCU [5] on board the UAV, extending the computational capability of the drone and paving the way for the next generation of autonomous UAVs.<br />
<br />
In detail the goals of the project can be summarized as follows:<br />
* 1. HW design: starting from the drone's schematics, design and develop the PULP-shield with the two required SPI interfaces<br />
* 2. SW middleware: integration of the existing drivers in order to have: the full support of the interfaces and the offload mechanism to run the kernel on PULP<br />
* 3. SW application: test and extend the Visual Odometry pipeline proposed in [8]<br />
<br />
<br />
<br />
===Status: Completed ===<br />
: Semester Thesis by Hanna Mueller<br />
: Supervision: [[:User:Dpalossi | Daniele Palossi]], [[:User:Smach | Stefan Mach]], [[:User:Gomeza | Andres Gomez]]<br />
<br />
===Prerequisites===<br />
* Familiarity with embedded system programming in C.<br />
* Knowledge of PCB design would be an asset (e.g. the Altium tool [3]).<br />
* Basic knowledge of Free RTOS [4] and STM32F4 MCU family [5] is favorable.<br />
<br />
===Character===<br />
: 25% Theory<br />
: 25% PCB design<br />
: 30% C embedded programming<br />
: 20% Verification and experimental evaluation<br />
===Professor===<br />
: [http://www.iis.ee.ethz.ch/portrait/staff/lbenini.en.html Luca Benini]<br />
<br />
==Detailed Task Description==<br />
<br />
===Meetings & Presentations===<br />
The student(s) and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues. <!--<br />
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details confer to [http://eda.ee.ethz.ch/index.php/Design_review]. <br />
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium (as required for any semester or master thesis at D-ITET).<br />
<br />
===Deliverables===<br />
* description of the most promising architectures, and argumentation on the decision taken (as part of the report)<br />
* synthesizable, verified VHDL code<br />
* generated test vector files<br />
* synthesis scripts & relevant software models developed for verification<br />
* synthesis results and final chip layout (GDS II data), bonding diagram<br />
* datasheet (part of report)<br />
* presentation slides<br />
* project report (in digital form; a hard copy also welcome, but not necessary)<br />
===Timeline===<br />
To give some idea on how the time can be split up, we provide some possible partitioning: <br />
* Literature survey, building a basic understanding of the problem at hand, catch up on related work <br />
* Development of a working software-based implementation running on the Zynq's ARM core<br />
* Piece-by-piece off-loading of relevant tasks to the programmable logic<br />
* Implementation of data interfaces (software or hardware)<br />
* Report and presentation <br />
--><br />
<!-- 13.5 weeks total here --><br />
<br />
===Literature===<br />
[1] PULP Project http://iis-projects.ee.ethz.ch/index.php/PULP<br />
<br />
[2] Crazyflie2.0 https://www.bitcraze.io/crazyflie-2/<br />
<br />
[3] Altium Design System http://www.altium.com/<br />
<br />
[4] Free RTOS http://www.freertos.org/<br />
<br />
[5] STM32F405/7 http://www.st.com/resource/en/datasheet/stm32f405og.pdf<br />
<br />
[6] F. Conti, D. Palossi, A. Marongiu, D. Rossi and L. Benini, "Enabling the heterogeneous accelerator model on ultra-low power microcontroller platforms," 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, 2016, pp. 1201-1206.<br />
<br />
[7] Himax ULP Image Sensor http://www.himax.com.tw/products/cmos-image-sensor/image-sensors/hm01b0/<br />
<br />
[8] D. Palossi, A. Marongiu, and L. Benini, "Ultra Low-Power Visual Odometry for Nano-Scale Unmanned Aerial Vehicle", 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, 2017 - to be published <br />
<br />
===Practical Details===<br />
* '''[[Project Plan]]'''<br />
* '''[[Project Meetings]]'''<br />
* '''[[Final Report]]'''<br />
* '''[[Final Presentation]]'''<br />
<br />
[[#top|↑ top]]<br />
<br />
<!-- <br />
<br />
COPY PASTE FROM THE LIST BELOW TO ADD TO CATEGORIES<br />
<br />
GROUP<br />
[[Category:Digital]]<br />
[[Category:Analog]]<br />
[[Category:Nano-TCAD]]<br />
[[Category:Nano Electronics]]<br />
<br />
STATUS<br />
[[Category:Available]]<br />
[[Category:In progress]]<br />
[[Category:Completed]]<br />
[[Category:Hot]]<br />
<br />
TYPE OF WORK<br />
[[Category:Semester Thesis]]<br />
[[Category:Master Thesis]]<br />
[[Category:PhD Thesis]]<br />
[[Category:Research]]<br />
<br />
NAMES OF EU/CTI/NT PROJECTS<br />
[[Category:UltrasoundToGo]]<br />
[[Category:IcySoC]]<br />
[[Category:PSocrates]]<br />
[[Category:UlpSoC]]<br />
[[Category:Qcrypt]]<br />
<br />
YEAR (IF FINISHED)<br />
[[Category:2010]]<br />
[[Category:2011]]<br />
[[Category:2012]]<br />
[[Category:2013]]<br />
[[Category:2014]]<br />
<br />
---></div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Stefan_Mach&diff=3543Stefan Mach2018-02-23T09:05:51Z<p>Smach: </p>
<hr />
<div>__NOTOC__<br />
Stefan Mach received his M.Sc. degree from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, where he is currently pursuing a Ph.D. degree. Since 2017, he has been a research assistant with the Integrated Systems Laboratory at ETHZ. <br />
<br />
==Interests==<br />
* Computer Architecture and Microprocessors<br />
* Digital ASIC Design<br />
* Transprecision/Approximate Computing<br />
* Hardware Support for Energy-Efficient Transprecision Computing<br />
* Embedded Systems<br />
* PCB/Systems Design<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Projects in Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = In progress<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Past Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Completed<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 89<br />
* '''e-mail''': [mailto:smach@iis.ee.ethz.ch smach@iis.ee.ethz.ch]<br />
* '''phone''': (+41 44 63) 254 33<br />
<br />
[[Category:Supervisors]]<br />
[[Category:Digital]]</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Stefan_Mach&diff=3460Stefan Mach2018-02-20T14:09:56Z<p>Smach: </p>
<hr />
<div>__NOTOC__<br />
Stefan Mach received his M.Sc. degree from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, where he is currently pursuing a Ph.D. degree. Since 2017, he has been a research assistant with the Integrated Systems Laboratory at ETHZ. <br />
<br />
==Interests==<br />
* Computer Architecture and Microprocessors<br />
* Digital ASIC Design<br />
* Transprecision/Approximate Computing<br />
* Hardware Support for Energy-Efficient Transprecision Computing<br />
* Embedded Systems<br />
* PCB/Systems Design<br />
<br />
<!--<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Projects in Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = In progress<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Past Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Completed<br />
category = Smach<br />
</DynamicPageList><br />
--><br />
==Contact Information==<br />
* '''Office''': ETZ J 89<br />
* '''e-mail''': [mailto:smach@iis.ee.ethz.ch smach@iis.ee.ethz.ch]<br />
* '''phone''': (+41 44 63) 254 33<br />
<br />
[[Category:Supervisors]]<br />
[[Category:Digital]]</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=Stefan_Mach&diff=3458Stefan Mach2018-02-20T14:05:43Z<p>Smach: Created page with "__NOTOC__ Stefan Mach received his M.Sc. degree from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, where he is currently pursuing a Ph.D. degree. Since..."</p>
<hr />
<div>__NOTOC__<br />
Stefan Mach received his M.Sc. degree from the Swiss Federal Institute of Technology Zurich (ETHZ), Switzerland, where he is currently pursuing a Ph.D. degree. Since 2017, he has been a research assistant with the Integrated Systems Laboratory at ETHZ. <br />
<br />
==Interests==<br />
* Computer Architecture and Microprocessors<br />
* Digital ASIC Design<br />
* Transprecision/Approximate Computing<br />
* Hardware Support for Energy-Efficient Transprecision Computing<br />
* Embedded Systems<br />
* PCB/Systems Design<br />
<br />
==Available Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Available<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Projects in Progress==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = In progress<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Past Projects==<br />
<DynamicPageList><br />
supresserrors = true<br />
category = Completed<br />
category = Smach<br />
</DynamicPageList><br />
<br />
==Contact Information==<br />
* '''Office''': ETZ J 89<br />
* '''e-mail''': [mailto:smach@iis.ee.ethz.ch smach@iis.ee.ethz.ch]<br />
* '''phone''': (+41 44 63) 254 33<br />
<br />
[[Category:Supervisors]]<br />
[[Category:Digital]]</div>Smachhttp://iis-projects.ee.ethz.ch/index.php?title=User:Smach&diff=3450User:Smach2018-02-20T13:38:16Z<p>Smach: Redirected page to Stefan Mach</p>
<hr />
<div>#REDIRECT [[Stefan_Mach]]</div>Smach