FFT-based Convolutional Network Accelerator

From iis-projects

Revision as of 14:47, 25 March 2015


Short Description

Imaging sensor networks, UAVs, smartphones, and other embedded computer vision systems require power-efficient, low-cost and high-speed implementations of synthetic vision systems capable of recognizing and classifying objects in a scene. Many popular algorithms in this area require the evaluation of multiple layers of filter banks. Almost all state-of-the-art synthetic vision systems are based on features extracted using multi-layer convolutional networks (ConvNets). When evaluating ConvNets, most of the time (80% to 90%) is spent performing the convolutions. Existing accelerators do most of this work in the spatial domain. This project aims to speed up that step by building an accelerator that performs the convolutions faster and more power-efficiently in the frequency domain.
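The frequency-domain approach rests on the convolution theorem: convolution in the spatial domain becomes an element-wise product in the frequency domain. A minimal numpy sketch of this equivalence (illustrative only; all sizes are arbitrary examples, and the actual accelerator would operate on fixed-point hardware, not floating-point numpy):

```python
import numpy as np

def conv2d_direct(img, ker):
    """Full 2D convolution by definition: O(N^2 * k^2) multiplies."""
    H, W = img.shape
    kh, kw = ker.shape
    out = np.zeros((H + kh - 1, W + kw - 1))
    for i in range(kh):
        for j in range(kw):
            out[i:i + H, j:j + W] += ker[i, j] * img
    return out

def conv2d_fft(img, ker):
    """Same convolution via the convolution theorem: zero-pad both
    operands to the full output size, multiply element-wise in the
    frequency domain, then transform back."""
    sh = img.shape[0] + ker.shape[0] - 1
    sw = img.shape[1] + ker.shape[1] - 1
    F = np.fft.rfft2(img, s=(sh, sw)) * np.fft.rfft2(ker, s=(sh, sw))
    return np.fft.irfft2(F, s=(sh, sw))

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))   # example feature map
ker = rng.standard_normal((5, 5))     # example filter
assert np.allclose(conv2d_direct(img, ker), conv2d_fft(img, ker))
```

The zero-padding to the full output size (`N + k - 1` per dimension) is what turns the FFT's inherently circular convolution into the linear convolution a ConvNet layer needs.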

Status: Available

Supervision: Lukas Cavigelli
Date: tbd

Prerequisites

Knowledge of Matlab
Interest in computer vision, signal processing and VLSI design
VLSI 1
If you want the ASIC to be manufactured, enrolment in VLSI 2 is required and at least one student has to test the chip as part of the VLSI 3 lecture.

Character

20% Theory / Literature Research
60% VLSI Architecture, Implementation & Verification
20% VLSI back-end Design

Professor

Luca Benini


Detailed Task Description

Goals

The goals of this project are to

  • Explore various architectures for the 2D convolutions used in convolutional networks, considering the constraints of an ASIC design, and perform fixed-point analyses for the most viable architecture(s)
  • Get to know the ASIC design flow from specification through architecture exploration to implementation, functional verification, back-end design and silicon testing.
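A back-of-the-envelope multiply count hints at why the frequency domain pays off for ConvNet layers, where per-image FFTs are amortized over many filters. The sketch below uses hypothetical layer sizes and rough counting conventions (a 2D FFT of size m×m costed as m²·log₂(m) complex multiplies, a complex multiply as 4 real multiplies); the real numbers depend on the chosen layer dimensions and FFT implementation:

```python
import math

def direct_layer_mults(n, k, cin, cout):
    """Direct convolution of cin input maps (n x n) with cout filters
    (k x k) each: one k*k dot product per valid output pixel and per
    (input, output) map pair."""
    return cin * cout * (n - k + 1) ** 2 * k * k

def fft_layer_mults(n, k, cin, cout):
    """FFT-based layer: one forward FFT per input map, one inverse FFT
    per output map (kernel FFTs assumed precomputed offline), and one
    point-wise complex product per (input, output) pair; accumulation
    across input maps happens in the frequency domain and costs only
    additions."""
    m = n + k - 1                       # zero-padded transform size
    fft2 = m * m * math.log2(m) * 4     # one 2D FFT, in real multiplies
    pointwise = m * m * 4               # one element-wise complex product
    return (cin + cout) * fft2 + cin * cout * pointwise

n, k, cin, cout = 64, 9, 64, 64         # hypothetical example layer
print(f"direct: {direct_layer_mults(n, k, cin, cout):,.0f} multiplies")
print(f"fft:    {fft_layer_mults(n, k, cin, cout):,.0f} multiplies")
```

For a single image/kernel pair the two approaches are close to break-even; the amortization of the transforms across many maps is what opens the gap.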

Important Steps

  1. Do some first project planning. Create a time schedule and set some milestones based on what you have learned as part of the VLSI lectures.
  2. Get to understand the basic concepts of convolutional networks.
  3. Catch up on relevant previous work, in particular the papers we give to you.
  4. Become aware of the possibilities and limitations of the used technology; make some very rough estimates of area and timing. Also consider setting some target specifications for your chip.
  5. Come up with several possible architectures and evaluate/discuss them (architecture exploration); implement the datapath or other resource-critical parts to get a first impression of the most promising architecture(s). Also give some first thought to testability.
  6. Run detailed fixed-point analyses to determine the signal width in all parts of the data path.
  7. Create high quality, synthesizable VHDL code for your circuit. Please respect the lab's coding guidelines and continuously verify proper functionality of the individual parts of your design.
  8. Implement the necessary configuration interface, ...
  9. Perform thorough functional verification. This is very important.
  10. Take your final implementation through the backend design process.
  11. Write a project report. Include all major decisions taken during the design process and justify your choices. Include everything that deviates from the standard flow -- show off everything that took time to figure out and all the ideas that have influenced the project.
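The fixed-point analysis in step 6 boils down to quantizing each datapath signal at candidate bit widths and measuring the implementation loss against a floating-point reference. A minimal sketch of that experiment, in Python/numpy rather than the project's Matlab for illustration (signal statistics and bit widths here are arbitrary placeholders, not recommended datapath widths):

```python
import numpy as np

def quantize(x, int_bits, frac_bits):
    """Round to a signed fixed-point grid with the given integer and
    fractional bit counts, saturating at the representable range."""
    scale = 2.0 ** frac_bits
    lo = -2.0 ** int_bits
    hi = 2.0 ** int_bits - 1.0 / scale
    return np.clip(np.round(x * scale) / scale, lo, hi)

rng = np.random.default_rng(0)
ref = rng.standard_normal(10000)   # stand-in for a datapath signal

for frac_bits in (4, 8, 12):
    q = quantize(ref, int_bits=3, frac_bits=frac_bits)
    err = ref - q
    snr_db = 10 * np.log10(np.mean(ref ** 2) / np.mean(err ** 2))
    print(f"{frac_bits:2d} fractional bits: {snr_db:5.1f} dB SNR")
```

In the real analysis the quantization would be applied at every stage of the FFT/multiply/accumulate pipeline and the loss measured on the final classification accuracy, not just per-signal SNR.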

Be aware that these steps cannot always be performed one after the other; they often need some initial guesses followed by several iterations. Please use the supplied SVN repository for your VHDL files and maybe even your notes, presentation, and the final report (you can check out the code on any device, collaborate more easily, keep track of changes to the code, and have a backup of every version).

Meetings & Presentations

The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.

Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as preliminary specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and to decide whether further support is necessary. They also make the definite decision on whether the chip is actually manufactured (no reason to worry if the project is on track) and whether more chip area, a different package, ... is provided. For more details, refer to [1].

At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.

Deliverables

  • description of the most promising architectures, and argumentation on the decision taken (as part of the report)
  • synthesizable, verified VHDL code
  • generated test vector files
  • synthesis scripts & relevant software models developed for verification
  • synthesis results and final chip layout (GDS II data), bonding diagram
  • datasheet (part of report)
  • presentation slides
  • project report (in digital form; a hard copy also welcome, but not necessary)

Timeline

To give an idea of how the time can be split up, here is one possible partitioning:

  • Literature survey, building a basic understanding of the problem at hand (1 week)
  • Architecture design & evaluation (2-3 weeks)
  • Fixed-point model, implementation loss, test environment (1-2 weeks)
  • HDL implementation, simulation, debugging (4 weeks)
  • Synthesis/Backend (2 weeks)
  • Report and presentation (2-3 weeks)

Literature

  • NeuFlow [2] in general and in particular
    • C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello and Y. LeCun, "NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision", Proc. IEEE ECV'11@CVPR'11 [3]
    • V. Gokhale, J. Jin, A. Dundar, B. Martini and E. Culurciello, "A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks", Proc. IEEE CVPRW'14 [4]
  • a not-yet-published paper by F. Conti and L. Benini of our group on a hardware accelerator for ConvNets

Practical Details

Links

  • The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [5]
  • The IIS/DZ coding guidelines [6]
