Deep Learning Projects
What is Deep Learning?
Nowadays, machine learning systems are the go-to choice when the cost of analytically deriving closed-form expressions to solve a given problem is prohibitive (e.g., it is very time-consuming, or the knowledge about the problem is insufficient). Machine learning systems can be particularly effective when the amount of data is large, since the statistics are expected to become more stable as the amount of data increases. Amongst machine learning systems, deep neural networks (DNNs) have established a reputation for their effectiveness and simplicity. To understand this success compared to that of other machine learning systems, it is important to consider not only the accuracy of DNNs, but also their computational properties. The cost of one pass of the training algorithm (an iterative application of backpropagation and stochastic gradient descent) is linear in the data set size, making it more appealing in big data contexts than, for instance, support vector machines (SVMs). DNNs do not use branching instructions, which makes them predictable programs and allows the design of efficient access patterns for the memory hierarchies of the computing devices (exploiting spatial and temporal locality). DNNs are parallelizable, both at the neuron level and at the layer level. These predictability and parallelizability properties make DNNs an ideal fit for modern SIMD architectures and distributed computing systems.
The main drawback of these systems is their size: millions or even billions of parameters are a common feature of many top-performing DNNs, and a proportional number of arithmetic operations must be performed to process each data sample. Hence, to reduce the pressure of DNNs on the underlying computing infrastructure, research in computational deep learning has focussed on two families of optimizations: topological and hardware-oriented.
Topological optimizations are concerned with network topologies (AKA network architectures) which are more efficient in terms of accuracy-per-parameter or accuracy-per-MAC (multiply-accumulate operation). As a specific form of topological optimization, pruning strategies aim at maximizing the number of zero-valued operands (parameters and/or activations) in order to 1) take advantage of sparsity (for storing the model) and 2) minimize the number of effective arithmetic operations (i.e., the operations not involving zero-valued operands, which are the only ones that must actually be executed). Hardware-oriented optimizations are instead concerned with replacing time-consuming and energy-hungry operations, such as evaluations of transcendental functions or floating-point MAC operations, with more efficient counterparts, such as piecewise linear activation functions (e.g., the ReLU) and integer MAC operations (as in quantized neural networks, QNNs).
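To make the notion of effective operations concrete, here is a minimal sketch of magnitude-based pruning, one common pruning heuristic chosen here only for illustration. The function names and the toy values are assumptions, not part of any of our frameworks.

```python
def relu(x):
    """Piecewise linear activation: negative inputs become exact zeros."""
    return max(0.0, x)


def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def effective_macs(weights, activations):
    """Count the MACs whose two operands are both non-zero."""
    return sum(1 for w, a in zip(weights, activations) if w != 0 and a != 0)


# Toy neuron with six inputs (illustrative values only).
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
activations = [relu(v) for v in [1.0, 2.0, -3.0, 0.5, 1.5, -0.2]]
pruned = prune_by_magnitude(weights, sparsity=0.5)

print(effective_macs(weights, activations))  # dense weights, sparse activations
print(effective_macs(pruned, activations))   # sparse weights, sparse activations
```

In this toy case the ReLU already removes two of the six MACs, and pruning half of the weights leaves only two effective MACs. Whether such savings turn into real speed-ups depends on how well the target hardware can skip zero operands.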
Hardware-oriented neural architecture search (NAS)
The problems of topology selection and pruning can be considered instances of the classical statistics problems of model selection and feature selection, respectively. In the scope of deep learning, model selection is also called neural architecture search (NAS). When designing a DNN topology, you have a large number of degrees of freedom at your disposal: number of layers, number of neurons for each layer, connectivity of each neuron, and so on; moreover, the number of choices for each degree of freedom is huge. These properties imply that the design space for a DNN grows exponentially with the number of degrees of freedom, making exhaustive searches prohibitive. Therefore, to increase the efficiency of the exploration, stochastic optimization tools are the preferred choice: evolutionary algorithms, reinforcement learning, gradient-based techniques or even random graph generation. An interesting feature of model selection is that specific constraints can be enforced on the search space so that desired properties are always respected. For instance, given a storage budget describing a hard limitation of the chosen computing platform, the network generation algorithm can be restricted to propose only topologies that do not exceed a given number of parameters. This capability of incorporating HW features as constraints on the search space makes NAS algorithms very interesting in the context of generating HW-friendly DNNs.
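As an illustration of a hardware constraint on the search space, the following sketch samples random fully connected topologies and rejects every candidate that exceeds a parameter budget. It is a deliberately simple random search under assumed names and values (`sample_within_budget`, the width choices, the budget), not one of the NAS algorithms used in our projects.

```python
import random


def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def sample_topology(rng, n_inputs, n_outputs, max_hidden_layers, width_choices):
    """Draw a random depth and a random width for each hidden layer."""
    depth = rng.randint(1, max_hidden_layers)
    hidden = [rng.choice(width_choices) for _ in range(depth)]
    return [n_inputs] + hidden + [n_outputs]


def sample_within_budget(rng, budget, max_trials, n_inputs, n_outputs,
                         max_hidden_layers, width_choices):
    """Rejection sampling: keep only the topologies that fit the parameter budget."""
    accepted = []
    for _ in range(max_trials):
        topology = sample_topology(rng, n_inputs, n_outputs,
                                   max_hidden_layers, width_choices)
        if count_parameters(topology) <= budget:
            accepted.append(topology)
    return accepted


rng = random.Random(0)  # fixed seed so that the run is repeatable
candidates = sample_within_budget(rng, budget=2000, max_trials=1000,
                                  n_inputs=16, n_outputs=4,
                                  max_hidden_layers=3,
                                  width_choices=(8, 16, 32, 64))
print(len(candidates), "of 1000 sampled topologies fit the budget")
```

Every accepted candidate would then be trained and evaluated; the constraint guarantees that none of them can violate the storage limit of the target platform.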
Thorir Mar Ingolfsson

Matteo Spallanzani

Training algorithms for quantized neural networks (QNNs)
The typical training algorithm for DNNs is an iterative application of the backpropagation algorithm (BP) and stochastic gradient descent (SGD). When the quantization is not “aggressive” (i.e., when the parameters and feature maps can be represented as integers with a precision of 8 bits or more), many solutions are available either in specialized literature or in commercial software that can convert models pre-trained with gradient descent to quantized counterparts (post-training quantization). But when the precision is extremely reduced (i.e., 1-bit or 2-bit operands), these solutions can no longer be applied, and quantization-aware training algorithms are needed. The naive application of gradient descent to train these QNNs (which is not even theoretically justified, since quantized functions are piecewise constant and their gradient is zero almost everywhere) yields major accuracy drops. Hence, it is likely that suitable training algorithms for QNNs need to replace the standard BP+SGD scheme, which is suited to differentiable optimization, with search strategies that are more apt for discrete optimization.
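The gap between mild and aggressive quantization can be seen on a handful of numbers. The sketch below applies a symmetric uniform quantizer with a single scale, one simple post-training scheme assumed here for illustration, and compares the rounding error at 8 bits and at 2 bits. The weights are made-up values, and rounding error is only a rough proxy for the accuracy drop of a full network.

```python
def quantize_symmetric(values, n_bits):
    """Map floats to signed integer codes using one shared scale."""
    if n_bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign plus magnitude)")
    q_max = 2 ** (n_bits - 1) - 1  # 127 for 8 bits, 1 for 2 bits
    scale = max(abs(v) for v in values) / q_max
    if scale == 0:
        scale = 1.0  # all-zero input: any scale works
    codes = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover the floating-point values represented by the integer codes."""
    return [c * scale for c in codes]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


weights = [0.82, -0.41, 0.05, -0.97, 0.33, 0.12]  # illustrative values

codes8, scale8 = quantize_symmetric(weights, n_bits=8)
codes2, scale2 = quantize_symmetric(weights, n_bits=2)

err8 = mean_abs_error(weights, dequantize(codes8, scale8))
err2 = mean_abs_error(weights, dequantize(codes2, scale2))
print(codes2)      # only -1, 0 and 1 remain at 2 bits
print(err8, err2)  # the 2-bit error is far larger
```

At 8 bits the rounding error stays below half a quantization step of a fine grid, while at 2 bits most of the small weights collapse to zero. This is the regime in which converting a pre-trained model is no longer sufficient.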
Matteo Spallanzani

Hardware Acceleration of DNNs and QNNs
Deep Learning (DL) and Artificial Intelligence (AI) are quickly becoming dominant paradigms for all kinds of analytics, complementing or replacing traditional data science methods. Successful at-scale deployment of these algorithms requires deploying them directly at the data source, i.e., in the IoT end-nodes collecting data. However, due to the extreme constraints of these devices (in terms of power, memory footprint, area cost), performing full DL inference in-situ in low-power end-nodes requires a breakthrough in computational performance and efficiency. It is widely known that the numerical representation typically used when developing DL algorithms (single-precision floating-point) encodes a higher precision than what is actually required to achieve high quality-of-results in inference (Courbariaux et al. 2016); this fact can be exploited in the design of energy-efficient hardware for DL. For example, by using ternary weights, which means all network weights are quantized to {-1, 0, 1}, we can design the fundamental compute units in hardware without using an HW-expensive multiplication unit. Additionally, ternary weights can be stored much more compactly on-chip.
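The following sketch shows in software why ternary weights remove the need for a multiplier: each weight either adds the activation, subtracts it, or skips it. The threshold-based `ternarize` function and its threshold value are assumptions made for illustration, not the quantization method used in our accelerators.

```python
def ternarize(weights, threshold):
    """Quantize each weight to -1, 0 or +1 using a fixed magnitude threshold."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(ternary_weights, activations):
    """Dot product using only additions and subtractions."""
    acc = 0
    for w, a in zip(ternary_weights, activations):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        # w == 0: the operand is skipped, no work is done
    return acc


weights = [0.7, -0.1, -0.6, 0.05]  # illustrative floating-point weights
activations = [3, 5, 2, 7]         # illustrative integer activations

t = ternarize(weights, threshold=0.3)
print(t)                            # [1, 0, -1, 0]
print(ternary_dot(t, activations))  # 3 - 2 = 1
```

A ternary weight carries fewer than two bits of information, compared with 32 bits for a single-precision float, which is where the on-chip storage savings come from.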
Gianna Paulin

Georg Rutishauser

Moritz Scherer

Tim Fischer

Arpan Suravi Prasad

Event-Driven Computing
With the increasing demand for "smart" algorithms on mobile and wearable devices, the energy cost of computing is becoming the bottleneck for battery lifetime. One approach to defuse this bottleneck is to reduce the compute activity on such devices. One of the most popular ways to do so uses sensor information to determine whether it is worth running expensive computations or whether there is not enough activity in the environment. This approach is called event-driven computing. Event-driven architectures can be implemented for many applications, from pure sensing platforms to multi-core systems for machine learning on the edge. At IIS, we cover most of these applications. Besides working with novel, state-of-the-art sensors and sensing platforms to push the limits of lifetime of wearables and mobile devices, we also work with cutting-edge computing systems like Intel Loihi for Spiking Neural Networks to minimize the energy cost of machine intelligence.
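A minimal software sketch of this idea is shown below: an expensive function runs only when a sensor reading has changed by at least a threshold since the last processed reading. The function name, the threshold, and the choice to always process the first reading are illustrative assumptions; real event-driven systems typically implement this gating in the sensor or in dedicated hardware.

```python
def run_event_driven(readings, threshold, expensive_fn):
    """Call `expensive_fn` only on readings that differ enough from the last processed one."""
    outputs = []
    last_processed = None
    for t, value in enumerate(readings):
        # Cheap check that decides whether the costly computation is worth running.
        is_event = last_processed is None or abs(value - last_processed) >= threshold
        if is_event:
            outputs.append((t, expensive_fn(value)))
            last_processed = value
    return outputs


readings = [10, 10, 11, 10, 25, 26, 25, 10]  # illustrative sensor trace
outputs = run_event_driven(readings, threshold=5, expensive_fn=lambda v: v * 2)

print(outputs)                                           # processed time steps and results
print(len(outputs), "of", len(readings), "readings processed")
```

In this trace only three of the eight readings trigger the expensive function, so the device could stay in a low-power state for the remaining five.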
Alfio Di Mauro

Moritz Scherer

Prerequisites
We have no strict, general requirements, as they are highly dependent on the exact project steps. The projects will be adapted to the skills and interests of the student(s), so just come talk to us! If you don't know about GPU programming or CNNs or ... just let us know and we can determine together what is a useful way to go. After all, you are here not only to learn about project work but also to develop your technical skills.
Only hard requirements:
 Excitement for deep learning
 For HW Design projects: VLSI 1, VLSI 2 or equivalent
Tags
All our projects are sorted into three categories. Look out for the following tags:
 Algorithmic: you will mainly make algorithmic evaluations using languages and frameworks such as Python, PyTorch, TensorFlow and our in-house frameworks like QuantLab, DORY, NEMO
 Embedded Coding: you will implement, e.g., C code for one of our microcontrollers
 HW Design: you will be designing HW, including writing RTL, simulating, synthesizing, and laying out (backend) some HW
Available Projects
New projects are constantly being added, so check back often! If you have any questions or would like to propose your own ideas, do not hesitate to contact us!
 Bridging QuantLab with LPDNN
 Efficient TNN Inference on PULP Systems
 Evaluating SoA Post-Training Quantization Algorithms
 Exploration and Hardware Acceleration of Intra-Layer Mixed-Precision QNNs
 Exploring NAS spaces with CBRED
 Feature Extraction for Speech Recognition (1S)
 Flexfloat DL Training Framework
 Knowledge Distillation for Embedded Machine Learning
 Mapping Networks on Reconfigurable Binary Engine Accelerator
 Mixed-Precision Neural Networks for Brain-Computer Interface Applications
 Neural Architecture Search using Reinforcement Learning and Search Space Reduction
 Probabilistic training algorithms for quantized neural networks
 Probing the limits of fake-quantised neural networks
 Resource-Constrained Few-Shot Learning for Keyword Spotting (1S)
 Ternary Neural Networks for Face Recognition
 Training and Deploying Next-Generation Quantized Neural Networks on Microcontrollers
 Visualization of Neural Architecture Search Spaces
 Evaluating An Ultra low Power Vision Node
 Spiking Neural Network for Autonomous Navigation
 Event-Driven Convolutional Neural Network Modular Accelerator
 Level Crossing ADC For a Many Channels Neural Recording Interface
Projects in Progress