Near-Memory Training of Neural Networks

AMD Fiji GPU package with GPU, HBM memory and interposer

Introduction

Machine Learning is the current hot topic of the Big Data hype. Large companies such as Google and Microsoft pour substantial resources into research on its various aspects. On the algorithmic front, significant advances have been made in recent years, with new network structures and techniques and new variants of Stochastic Gradient Descent making an appearance. The regular structure of the calculations involved has favoured research into and development of hardware accelerators. However, the focus has mainly been on inference, i.e. computing the output of a network for a given input. Training of such networks has received little attention with respect to hardware acceleration. Based on the PULP [1] platform, we have developed the streaming co-processor "NeuroStream", which helps to fill this gap. Together with a RISC-V processor core for control, multiple NeuroStreams form a computation cluster ("NeuroCluster"). Since training of neural networks is very data-intensive, we envision integrating several of these clusters inside modern DRAM Hybrid Memory Cubes [2], or placing them in close proximity to High Bandwidth Memory [3].
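To make the difference between inference and training concrete, here is a minimal sketch in plain Python. It is not NeuroStream code: the single fully connected layer, the squared-error loss, and the function names `forward` and `sgd_step` are assumptions chosen purely for illustration. Inference reads each weight once, whereas one Stochastic Gradient Descent step additionally reads and writes back every weight, which hints at why training is the more data-intensive task.

```python
def forward(weights, bias, x):
    """Inference: one multiply-accumulate pass over the weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sgd_step(weights, bias, x, target, lr=0.1):
    """Training: forward pass, gradient of the loss, in-place update.

    Assumes the loss L = 0.5 * sum((y - target)^2), so dL/dy = y - target.
    Returns the loss measured before the update.
    """
    y = forward(weights, bias, x)
    err = [yi - ti for yi, ti in zip(y, target)]
    for i, e in enumerate(err):
        for j, xj in enumerate(x):
            # Every weight is read and written back once per step.
            weights[i][j] -= lr * e * xj
        bias[i] -= lr * e
    return 0.5 * sum(e * e for e in err)


if __name__ == "__main__":
    w, b = [[0.0, 0.0]], [0.0]
    for step in range(3):
        loss = sgd_step(w, b, x=[1.0, 2.0], target=[1.0])
        print(f"step {step}: loss = {loss:.4f}")
```

Real networks stack many such layers and process mini-batches, which multiplies the memory traffic per update accordingly.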

NeuroStream Architecture
NeuroCluster and NeuroSoC Architecture

Short Description

Your mission, should you choose to accept it, is to join our research into such near-memory computing, either from the hardware or the software side. The NeuroStream co-processor has been verified to perform the correct operations on a very small scale. However, to be a convincing solution to the problems it tries to tackle, and to further verify that it actually works and is up to the task, a larger-scale implementation is needed. The IIS has access to a small number of interconnected FPGA computing nodes [4] through a joint research project with Microsoft. This thesis offers two possible directions. On the hardware side, you can adapt the existing NeuroStream/NeuroCluster HDL code to the FPGAs, develop a communication scheme among them, and then squeeze as many NeuroClusters onto one FPGA as possible. On the software side, you can leverage the compute capabilities of NeuroStream to implement the layers of common Deep Neural Networks (e.g. GoogLeNet [5]) or Recurrent Neural Networks [6,7]. The project is very flexible and we can tailor it to your personal preferences and skills!

Status: Available

Looking for interested master's students (Semester or Master Project)
Supervision: Fabian Schuiki, Florian Zaruba

Character

20% Theory and Algorithms
50% Implementation (HDL or C/C++ coding)
30% Verification and Testing

Prerequisites

If the focus is on hardware:

VLSI I
VLSI II (recommended)
VHDL/SystemVerilog

If the focus is on software:

Knowledge of Machine Learning (DNN, LSTM/GRU), or willingness to acquire it
C/C++

Professor

Luca Benini

References

[1] http://iis-projects.ee.ethz.ch/index.php/PULP
[2] http://www.hybridmemorycube.org/
[3] https://www.amd.com/Documents/High-Bandwidth-Memory-HBM.pdf
[4] https://www.microsoft.com/en-us/research/project/project-catapult/
[5] http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
[6] http://colah.github.io/posts/2015-08-Understanding-LSTMs/
[7] http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/