Difference between revisions of "High-speed Scene Labeling on FPGA"

Latest revision as of 18:18, 29 August 2016

Short Description

Imaging sensor networks, UAVs, smartphones, driver assistance appliances, and other embedded computer vision systems require power-efficient, low-cost and high-speed implementations of synthetic vision systems capable of recognizing and classifying objects in a scene. Many popular algorithms in this area require the evaluation of multiple layers of filter banks. Almost all state-of-the-art synthetic vision systems are based on features extracted using multi-layer convolutional networks (ConvNets).
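To make "multiple layers of filter banks" concrete, the following is a minimal sketch of one ConvNet layer in plain Python. It is not taken from the project: the function name `conv_layer`, the nested-list data layout, the "valid" border handling and the ReLU non-linearity are all assumptions made for illustration. As is common in ConvNet implementations, the kernel is applied without flipping (cross-correlation).

```python
def conv_layer(inputs, kernels, biases):
    """Evaluate one filter-bank layer (illustrative sketch, not the project's code).

    inputs  : list of C_in feature maps, each a list of rows of floats
    kernels : kernels[o][i] is the KxK filter from input map i to output map o
    biases  : one bias value per output map
    Returns a list of C_out feature maps of size (H-K+1) x (W-K+1).
    """
    c_in = len(inputs)
    h, w = len(inputs[0]), len(inputs[0][0])
    k = len(kernels[0][0])
    out_h, out_w = h - k + 1, w - k + 1

    outputs = []
    for o, bank in enumerate(kernels):
        # Start every output pixel at the bias, then accumulate over all input maps.
        fmap = [[biases[o]] * out_w for _ in range(out_h)]
        for i in range(c_in):
            for y in range(out_h):
                for x in range(out_w):
                    acc = 0.0
                    for dy in range(k):
                        for dx in range(k):
                            acc += inputs[i][y + dy][x + dx] * bank[i][dy][dx]
                    fmap[y][x] += acc
        # Assumed non-linearity: ReLU applied element-wise.
        outputs.append([[max(0.0, v) for v in row] for row in fmap])
    return outputs
```

The nested multiply-accumulate loops are the regular, data-parallel part of the computation, which is what makes such layers natural candidates for programmable logic.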

To be power-efficient and achieve high throughput at the same time, we would like to create an FPGA implementation of an entire scene labeling network. To keep the system flexible with respect to both the convolutional neural network that is applied and the types of layers in the ConvNet, we foresee interaction between a flow-controlling processor (e.g. an ARM core on a Xilinx Zynq) and the programmable logic. If time permits, or if the student prefers, some focus can also be given to interfacing directly to a camera and a display or an Ethernet adapter. Compared to an ASIC project, such FPGA and hardware-software codesign work is much more applicable in industry and less constrained in terms of memory and interfaces. If the student wishes, the use of high-level synthesis tools can also be considered.

Status: Completed

Kevin Luchsinger

Supervision: Lukas Cavigelli, Francesco Conti

Date: FS 2016

Prerequisites

Interest in VLSI and system design, and computer vision
VLSI 1

Character

20% Theory / Literature Research

80% VLSI Architecture, Implementation & Verification

Professor

Luca Benini

Detailed Task Description

Goals

The goals of this project are

for the student(s) to get to know the FPGA design flow from specification through architecture exploration to implementation, including the use of memory interfaces and other off-chip communication
to learn how to gradually port software blocks to programmable logic and design an entire heterogeneous system using software, FPGA fabric and hardwired interfaces.
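The idea of porting software blocks to programmable logic one at a time can be sketched as a small dispatcher that the flow-controlling processor might run. This is only an illustration under stated assumptions: `LayerDispatcher`, `relu_sw`, `scale_sw` and the `offload` check against a software reference are hypothetical names and design choices, and the "hardware" function here is an ordinary Python callable standing in for a driver call into the FPGA fabric.

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def relu_sw(x: Vector) -> Vector:
    """Software reference for a ReLU block (hypothetical example layer)."""
    return [max(0.0, v) for v in x]


def scale_sw(x: Vector) -> Vector:
    """Software reference for a scaling block (hypothetical example layer)."""
    return [2.0 * v for v in x]


class LayerDispatcher:
    """Runs each layer in software unless an offloaded version is registered."""

    def __init__(self) -> None:
        self.software: Dict[str, Layer] = {}
        self.offloaded: Dict[str, Layer] = {}

    def register_software(self, name: str, fn: Layer) -> None:
        self.software[name] = fn

    def offload(self, name: str, fn: Layer, test_vector: Vector,
                tol: float = 1e-6) -> None:
        # Accept the offloaded block only if it matches the software reference.
        expected = self.software[name](test_vector)
        actual = fn(test_vector)
        if len(actual) != len(expected) or any(
                abs(a - e) > tol for a, e in zip(actual, expected)):
            raise ValueError(f"offloaded '{name}' deviates from software reference")
        self.offloaded[name] = fn

    def run(self, network: List[str], data: Vector) -> Vector:
        for name in network:
            # Prefer the offloaded implementation when one has been accepted.
            layer = self.offloaded.get(name, self.software[name])
            data = layer(data)
        return data
```

Keeping the software version of every block as a reference means the system stays functional after each porting step, and a faulty block is rejected before it replaces the working one.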

Meetings & Presentations

The student(s) and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Additional meetings can be organized to address urgent issues. At the end of the project, you have to present and defend your work in a 15 min. presentation followed by 5 min. of discussion as part of the IIS colloquium.

Timeline

To give some idea of how the time can be split up, we suggest one possible partitioning:

Literature survey, building a basic understanding of the problem at hand, catching up on related work
Development of a working software-based implementation running on the Zynq's ARM core
Piece-by-piece off-loading of relevant tasks to the programmable logic
Implementation of data interfaces (software or hardware)
Report and presentation

Literature

Hardware Acceleration of Convolutional Networks:
- C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello and Y. LeCun, "NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision", Proc. IEEE ECV'11@CVPR'11 [1]
- V. Gokhale, J. Jin, A. Dundar, B. Martini and E. Culurciello, "A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks", Proc. IEEE CVPRW'14 [2]
- [3]
two not-yet-published papers by our group on acceleration of ConvNets

Practical Details

Links

The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [4]
The IIS/DZ coding guidelines [5]

@@ Line 1: / Line 1: @@
-[[File:Labeled-scene.png|450px|thumb]]
+[[File:x1-adas.jpg|500px|thumb]]
+[[File:Labeled-scene.png|400px|thumb]]
 ==Short Description==
-Imaging sensor networks, UAVs, smartphones, and other embedded computer vision systems require power-efficient, low-cost and high-speed implementations of synthetic vision systems capable of recognizing and classifying objects in a scene. Many popular algorithms in this area require the evaluations of multiple layers of filter banks. Almost all state-of-the-art synthetic vision systems are based on features extracted using multi-layer convolutional networks (ConvNets).
+Imaging sensor networks, UAVs, smartphones, driver assistance appliances, and other embedded computer vision systems require power-efficient, low-cost and high-speed implementations of synthetic vision systems capable of recognizing and classifying objects in a scene. Many popular algorithms in this area require the evaluations of multiple layers of filter banks. Almost all state-of-the-art synthetic vision systems are based on features extracted using multi-layer convolutional networks (ConvNets).
-To be power efficient and achieve a high throughput at the same time, we would like to create a FPGA implementation of an entire scene labeling network. In order to keep the developed system flexible in terms of the convolutional neural network that is applied as well as the types of layer in the ConvNet, interaction between a flow controlling processor (e.g. an ARM core on a Xilinx Zynq) and the programmable logic is foreseen. If time permits or based on the preference of the student, some focus can also be given towards interfacing directly to a camera and a display or an ethernet adapter.
+To be power efficient and achieve a high throughput at the same time, we would like to create a FPGA implementation of an entire scene labeling network. In order to keep the developed system flexible in terms of the convolutional neural network that is applied as well as the types of layer in the ConvNet, interaction between a flow controlling processor (e.g. an ARM core on a Xilinx Zynq) and the programmable logic is foreseen. If time permits or based on the preference of the student, some focus can also be given towards interfacing directly to a camera and a display or an ethernet adapter. As opposed to an ASIC project, such FPGA and hardware-software codesign work is much more applicable in industry and less constrained in terms of memory and interfaces. If desired by the student, also the use of high-level synthesis tools can be considered.
-When evaluating ConvNets, most of the time is spent performing the convolutions (80% to 90%).
+===Status: Completed===
-Existing accelerators do most of the work in spatial domain. The focus of this work is on speeding up this step by creating an ASIC or FPGA accelerator to perform this step faster and more power-efficiently in the frequency domain.
+: Kevin Luchsinger
+: Supervision: [[:User:Lukasc | Lukas Cavigelli]], [[:User:Fconti | Francesco Conti]]
-===Status: Available ===
+: Date: FS 2016
-<!-- : David Gschwend, Christoph Mayer, Samuel Willi --->
+[[Category:Digital]] [[Category:FPGA]] [[Category:Completed]] [[Category:2016]] [[Category:Semester Thesis]]
-: Supervision: [[:User:Lukasc | Lukas Cavigelli]]
-: Date: tbd
-[[Category:Digital]] [[Category:Available]] [[Category:Semester Thesis]]
 ===Prerequisites===
-* Knowledge of Matlab
+* Interest in VLSI and system design, and computer vision
-* Interest in computer vision, signal processing and VLSI design
 * VLSI 1
-* If you want the ASIC to be manufactured, enrolment in VLSI 2 is required and at least one student has to test the chip as part of the VLSI 3 lecture.
 ===Character===
 : 20% Theory / Literature Research
-: 60% VLSI Architecture, Implementation & Verification
+: 80% VLSI Architecture, Implementation & Verification
-: 20% VLSI back-end Design (if you want to do an ASIC)
 ===Professor===
@@ Line 34: / Line 29: @@
 ===Goals===
 The goals of this project are
-* for the students to get to know the ASIC design flow from specification through architecture exploration to implementation, functional verification, back-end design and silicon testing.
+* for the student(s) to get to know the FPGA design flow from specification through architecture exploration to implementation, including the use of memory interfaces and other off-chip communication
-* to explore various architectures to perform the 2D convolutions used in convolutional networks energy-efficienctly in frequency domain, considering the constraints of an ASIC design, and performing fixed-point analyses for the most viable architecture(s)
+* to learn how to gradually port software blocks to programmable logic and design an entire heterogeneous system using software, FPGA fabric and hardwired interfaces.
+<!--
 ===Important Steps===
 # Do some first project planning. Create a time schedule and set some milestones based on what you have learned as part of the VLSI lectures.
@@ Line 51: / Line 46: @@
 Be aware, that these steps cannot always be performed one after the other and often need some initial guesses followed by several iterations. Please use the supplied svn repository for your VHDL files and maybe even your notes, presentation, and the final report (you can check out the code on any device, collaborate more easily and intensively, keep track of changes to the code, have a backup of every version, ...).
+--->
 ===Meetings & Presentations===
 The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.
+<!--
 Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as prelim. specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry, if the project is on track) and whether more chip area, a different package, ... is provided. For more details confer to [http://eda.ee.ethz.ch/index.php/Design_review].
+--->
-At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.
+At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.\
+<!--
 ===Deliverables===
 * description of the most promising architectures, and argumentation on the decision taken (as part of the report)
@@ Line 68: / Line 64: @@
 * presentation slides
 * project report (in digital form; a hard copy also welcome, but not necessary)
+--->
 ===Timeline===
 To give some idea on how the time can be split up, we provide some possible partitioning:
-* Literature survey, building a basic understanding of the problem at hand, catch up on related work (2 week)
+* Literature survey, building a basic understanding of the problem at hand, catch up on related work
-* Architecture design & evaluation (2-3 weeks)
+* Development of a working software-based implementation running on the Zynq's ARM core
-* Fixed-point model, implementation loss, test environment (1-2 weeks)
+* Piece-by-piece off-loading of relevant tasks to the programmable logic
-* HDL implementation, simulation, debugging (3 weeks)
+* Implementation of data interfaces (software or hardware)
-* Synthesis/Backend (2 weeks)
+* Report and presentation
-* Report and presentation (2-3 weeks)
 <!-- 13.5 weeks total here -->
 ===Literature===
-* FFT-related: [http://ieee-hpec.org/2013/index_htm_files/45-PID2888029.pdf], [http://spiral.ece.cmu.edu:8080/pub-spiral/pubfile/12icassp_163.pdf]
 * Hardware Acceleration of Convolutional Networks:
 ** C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello and Y. LeCun, "NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision", Proc. IEEE ECV'11@CVPR'11 [http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=5981829]
