
Deep Learning Projects

From iis-projects

Revision as of 16:33, 9 September 2020 by Andrire (talk | contribs) (Where to find us)

We are listing a few projects below to give you an idea of what we do. However, we constantly have new project ideas, and in such a rapidly advancing research area some approaches quickly become obsolete. Please contact the people working on the project most similar to what you would like to do, and come talk to us.

Prerequisites

We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s) -- just come talk to us! If you don't know about GPU programming or CNNs or ..., just let us know, and together we can determine a useful way forward -- after all, you are here to learn not only about project work, but also to develop your technical skills.

The only hard requirements are:

  • Excitement for deep learning
  • For VLSI projects: VLSI 1 or equivalent


Available Projects

Status Type Project Name Description Platform Workload Type First Contact(s)
available MA/SA RISC-V LSTM Accelerator LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g. a microcontroller) is needed to control them. An alternative is a heterogeneous processor architecture, in which a general-purpose processor is extended with special-purpose accelerators. In a previous semester project, a first LSTM accelerator attached to PULP was developed. In this thesis, we would look into evaluating and optimizing this accelerator. ASIC HW (ASIC) Gianna Paulin
available MA System-Level LSTM Acceleration LSTMs are the state-of-the-art neural networks for time-series data (e.g. audio). Full-custom HW accelerators have been presented, but they usually lack flexibility, and a separate controller (e.g. a microcontroller) is needed to control them. In this project, an LSTM accelerator is implemented as a coarse-grained coprocessor to a RISC-V processor to address this issue. The work will explore the datapath, internal storage needs, control interface, and memory bandwidth requirements into the L1 memory in an environment with one or more RISC-V processors. This means that the complete system (e.g. the memory bus) has to be analyzed and, if necessary, adapted. ASIC HW Gianna Paulin
available MA/SA TWN HW Accel. INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. To keep flexibility and ease of use in an actual system, we have created such an accelerator and integrated it into a PULP(issimo) processor system. In this project, you will further improve this accelerator and/or its software. Depending on the number of students and the project type, this could lead to a chip tape-out. ASIC HW (ASIC) & SW Georg Rutishauser
available MA/SA Ternary-Weights TCN HW Accel. INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. Temporal convolutional networks (TCNs) have recently been proposed for sequence modelling tasks and achieve state-of-the-art performance on translation tasks. TCNs make use of 1D fully-convolutional networks and causal convolutions. In this work, a HW accelerator should be implemented with the ultimate goal of energy efficiency. This work will potentially build on an existing ternary-weight convolution accelerator. ASIC HW (ASIC) Georg Rutishauser, Gianna Paulin
available MA/SA Ternary-Weights TCN Training INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. Temporal convolutional networks (TCNs) have recently been proposed for sequence modelling tasks and achieve state-of-the-art performance on translation tasks. In this project, you will explore how to train TCNs with ternary weights using various state-of-the-art training schemes. Workstation SW (algorithm evals) Georg Rutishauser, Gianna Paulin
available SA Parallel EBPC A large part of the power consumption of neural network accelerators goes towards accessing feature maps stored in large central memories. Extended Bit-Plane Compression (EBPC) is a novel, hardware-friendly compression algorithm for DNN feature maps which makes it possible to reduce the transferred data volume and with it, power consumption. A baseline hardware implementation of EBPC which processes a single 8-bit stream of data has already been developed. The next step, and the goal of this project, is to transform it into a parallel architecture which can process multiple 8-bit words at a time while keeping the original architecture's energy efficiency intact (or improving it!). ASIC/FPGA HW Georg Rutishauser
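As a concrete illustration of the ternary weight representation that several of the projects above target, here is a minimal NumPy sketch. The simple magnitude-thresholding rule and the threshold value are our simplifying assumptions for illustration, not the actual INQ training procedure.

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Map full-precision weights to {-1, 0, +1}.

    Illustrative sketch only: a real INQ-style scheme quantizes
    incrementally during training; here we just threshold once.
    """
    q = np.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q

w = np.array([0.8, -0.02, -0.6, 0.01])
q = ternarize(w)  # quantized weights: [1, 0, -1, 0]
```

With weights restricted to {-1, 0, 1}, each "multiplication" in a convolution reduces to an add, a subtract, or a skip, and each weight fits in two bits, which is exactly why this representation is attractive for on-chip storage and HW acceleration.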


Workload types: SW (GPU), SW (microcontr.), SW (algorithm evals), HW (FPGA), HW (ASIC), HW (PCB)
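The causal convolutions at the core of the TCN projects listed above can be sketched in a few lines. This is an illustrative NumPy version written for clarity, not project code; the function name and the naive loop structure are our own.

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal 1D convolution: the output at time t depends
    only on x[t], x[t - d], x[t - 2d], ... (never on future samples).

    The input is left-padded with zeros so the output has the same
    length as the input, as in a TCN residual block.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            y[t] += kernel[i] * xp[pad + t - i * dilation]
    return y

# kernel [1, -1] computes a causal first difference: y[t] = x[t] - x[t-1]
y = causal_conv1d([1.0, 2.0, 3.0], [1, -1])  # -> [1.0, 1.0, 1.0]
```

Stacking such layers with exponentially growing dilation (1, 2, 4, ...) is what gives a TCN its large receptive field over the time series with only a logarithmic number of layers.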

On-Going Projects

Status Type Project Name Description Platform Workload Type Supervisors

Completed Projects

Status Type Project Name Description Platform Workload Type First Contact(s)
completed FS20 MA TNN HW Accel. Deep neural networks are notoriously hard to compute, need a lot of memory, and require a tremendous amount of energy, even just for inference. Many efforts try to reduce the precision of arithmetic operations to 16, 12, or 8 bit. However, with appropriate training methods and at the cost of some accuracy, networks can be trained to work with binary or ternary intermediate results and filters. We have sketched a possible architecture fully targeted at minimizing energy cost. This way, a TNN could be used for always-on sensing of e.g. audio data and then, upon detecting an interesting signal, trigger more energy-costly high-precision DNN inference with more classes on another device. ASIC HW (ASIC) & SW Georg Rutishauser, Lukas Cavigelli
completed HS20 MA/SA Quantized Training of Recurrent Neural Networks Recurrent neural networks (RNNs), especially Long Short-Term Memory (LSTM) RNNs, achieve state-of-the-art performance in time-series analysis tasks such as speech recognition. RNNs come with additional challenges: an internal state that needs to be stored and regularly updated, a very large memory footprint, and high bandwidth requirements. Research in the last few years has shown that most neural networks can be quantized at a small accuracy cost. The goal of the project is to train a quantized LSTM RNN. GPU SW (algorithm evals) Gianna Paulin, Lukas Cavigelli, Francesco Conti
completed FS19 2x SA TWN HW Accel. INQ (incremental network quantization) is a quantization technique which has been proven to work very well for neural networks. The weights are quantized to levels of {+-2^n, 0}, and we are particularly interested in the case of {-1,0,1}. This way we get rid of all the multiplications and can store the weights much more compactly on-chip, which is great for HW acceleration. To keep flexibility and ease of use in an actual system, we would like to integrate this accelerator into a PULP(issimo) processor system. In this thesis, you will develop the accelerator and/or integrate it into the PULPissimo system. ASIC HW (ASIC) & SW Lukas Cavigelli, Renzo Andri, Georg Rutishauser
completed FS19 1x SA Stand-Alone Edge Computing with GAP8 Detailed description: Stand-Alone_Edge_Computing_with_GAP8 Embedded SW/HW (PCB-level) Renzo Andri, Lukas Cavigelli, Andres Gomez, Naomi Stricker (TIK)
completed FS19 1x SA Data Bottlenecks in DNNs In many systems, we have a combination of remote sensing nodes and centralized analysis. The operating cost and energy consumption of such systems is often dominated by communication, so data compression becomes crucial. The strongest compression is usually achieved by performing the whole analysis on the sensor node and transmitting only the result (e.g. a label instead of an image), but the sensor node might not have enough processing power, or the data of multiple sensor nodes might have to be combined for a good classification/estimate/result. In this project, you will explore how to train DNNs for such problems with a data bottleneck within the DNN, using a not-yet-published quantization method. If taken as a MA, the result of the algorithmic exploration can be implemented on an embedded platform. Workstation SW (algorithm evals) Lukas Cavigelli, Matteo Spallanzani
completed HS18 1x MA DNN Training Accelerator The compute effort to train state-of-the-art CNNs is tremendous and is largely spent on GPUs, or less frequently on specialized HW (e.g. Google's TPUs). Their energy efficiency, and often their performance, is limited by DRAM accesses. When storing all the data required for the gradient descent step of typical DNNs, there is no way to fit it in on-chip SRAM, even across multiple very large chips. Recently, Invertible ResNets have been presented (cf. paper), allowing these storage requirements to be traded for some additional compute effort, which is a huge opportunity. In this project, you will perform an architecture exploration to analyze how this could best be exploited. ASIC HW (ASIC) Lukas Cavigelli
completed HS18 1x MA One-shot/Few-shot Learning One-shot learning comes in handy whenever it is not possible to collect a large dataset. Consider for example face identification to unlock your apartment's door, where the user provides a single picture (not hundreds) and is recognized reliably from then on. In this project you would apply a method called Prototypical Networks (cf. [paper, code]) to learn to identify faces. Once you have trained such a DNN, you will optimize it for an embedded system to run in real time. For a master thesis, an interesting additional step could be to expand this further to share information between multiple nodes/cameras and learn to re-identify faces as they evolve over time. Embedded GPU or Microcontroller SW (algo, uC) Lukas Cavigelli, Renzo Andri
completed HS18 1x SA SAR Data Analysis We would like to explore the automated analysis of aerial synthetic aperture radar (SAR) images. Essentially, we have one very high-resolution image of a Swiss city and no labels. This project is not about labeling a lot of data, but about exploring various options for supervised (cf. paper) or semi-/unsupervised learning to segment these images using very little labeled data. Workstation SW (algo evals) Xiaying Wang, Lukas Cavigelli, Michele Magno
completed HS18 1x SA Ternary-Weight FPGA System Together with an external partner, we are evaluating how binary or ternary-weight CNNs can be deployed on FPGAs to push the throughput/cost ratio beyond that of embedded GPUs. In this project, you will implement a hardware accelerator for ternary-weight networks and integrate it into a fairly complete FPGA/Zynq-based system with a camera etc. for real-time pose detection. FPGA/Zynq HW & SW (FPGA) Lukas Cavigelli
completed FS18 1x SA CBinfer for Speech Recognition We have recently published an approach to dramatically reduce the computation effort of object detection on video streams with limited frame-to-frame changes (cf. paper). We think this approach could also be applied to audio signals for continuous listening for voice commands: when looking at MFCCs or the short-time Fourier transform, changes in the spectrum between neighboring time windows are also limited. Embedded GPU (Tegra X2) SW (GPU, algo evals) Lukas Cavigelli
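The change-based idea behind the CBinfer project in the last row, recomputing outputs only where the input changed significantly between consecutive frames (or time windows), can be sketched as follows. The function name and the threshold value are illustrative assumptions, not the published implementation.

```python
import numpy as np

def changed_mask(prev, cur, tau=0.1):
    """Mark positions whose value changed by more than tau since the
    previous frame; only these positions need recomputation in a
    change-based inference pipeline."""
    return np.abs(cur - prev) > tau

prev = np.array([0.0, 0.5, 1.0])
cur  = np.array([0.0, 0.9, 1.0])
mask = changed_mask(prev, cur)  # only index 1 changed
# a change-based layer would recompute outputs affected by index 1
# and reuse cached results everywhere else
```

The savings come from the fact that, for slowly varying inputs such as static camera video or neighboring spectrogram windows, the mask is sparse, so most of the convolution work from the previous step can be reused.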

Where to find us

Gianna Paulin, ETZ J 76.2, pauling@iis.ee.ethz.ch
Georg Rutishauser, ETZ J 68.2, georgr@iis.ee.ethz.ch