NVDLA meets PULP

Revision as of 16:06, 7 December 2018 by Fschuiki
Introduction

After many years of neglect, “classic” Cray-like vector processors have been proposed again [Asanovic2016][Lee2015] as a more general and elegant solution than packed single-instruction multiple-data (SIMD) extensions (e.g. Intel AVX) and GPU-like single-instruction multiple-thread (SIMT) approaches to parallelism. Many have been sceptical that such a scheme can be applied to accelerate applications in low-power devices such as the PULP chips we develop at IIS. However, a vector processor shares many similarities with the custom-designed HW accelerators that we have successfully coupled with our platform in the past, with the additional advantage of greater flexibility in the target application.

Project description

The primary purpose of this project is to design a cluster-coupled vector coprocessor to be deployed within a PULP cluster similar to Fulmine [Conti2017]. The main objectives of this project will be the following:

  • A simple base design. The design should be a base on which to build specialized accelerators, not a fully autonomous core. The base core should include mostly the instructions needed to fetch data, move it between registers, and perform basic arithmetic operations. A possible approach is to take Zero-Riscy, one of the smallest RISC-V cores introduced at IIS [Schiavone2017], and use it as a baseline.
  • Designed for shared-memory interaction. The vector coprocessor must be designed from the ground up to communicate with the shared memory of the PULP clusters. Techniques and ideas developed in the context of PULP HW accelerators [Conti2017][Azarkhish2017] can be used to make this more efficient.
  • Design for minimum energy. This particular coprocessor will be designed not to maximize performance, but to minimize the energy spent on highly data-parallel and vectorizable workloads, such as machine learning applications (support vector machines, neural networks). To this end, one possibility is to explore frequency scaling within the coprocessor.
  • Compliance with RISC-V guidelines. While the RISC-V Foundation has not yet ratified official specifications for its vector extensions, the design should follow, at least coarsely, the guidelines laid out in [Asanovic2016].

The design will be implemented in SystemVerilog, for full compatibility with the rest of the PULP design flow.
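
To make the notion of a "highly data-parallel and vectorizable" workload more concrete, the following is a minimal behavioural sketch (in Python, not RTL) of how a Cray-like vector machine might process a dot product by strip-mining: the loop requests a vector length, the hardware grants at most its maximum, and each strip is handled by vector-wide load, multiply and reduce steps. The names `MAX_VL`, `set_vl` and `vector_dot`, as well as the maximum vector length of 8, are assumptions made purely for illustration; they are not taken from [Asanovic2016] or from any PULP component.

```python
# Behavioural sketch of strip-mined vector execution (illustrative only).
# MAX_VL is a hypothetical hardware limit on elements per vector register.
MAX_VL = 8


def set_vl(remaining, max_vl=MAX_VL):
    """Model a 'set vector length' step: grant min(remaining, max_vl)."""
    return min(remaining, max_vl)


def vector_dot(a, b, max_vl=MAX_VL):
    """Dot product processed strip by strip.

    Returns (result, number_of_strips) so the strip-mining is visible.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")

    n = len(a)
    acc = 0
    strips = 0
    i = 0
    while i < n:
        vl = set_vl(n - i, max_vl)       # vector length for this strip
        va = a[i:i + vl]                 # models a vector load from shared memory
        vb = b[i:i + vl]                 # models a second vector load
        prod = [x * y for x, y in zip(va, vb)]  # element-wise vector multiply
        acc += sum(prod)                 # reduction into a scalar accumulator
        i += vl
        strips += 1
    return acc, strips


if __name__ == "__main__":
    result, strips = vector_dot(list(range(10)), [1] * 10)
    print(result, strips)
```

Under these assumptions, a 10-element input with a maximum vector length of 8 is covered in two strips (8 elements, then 2), with no scalar clean-up loop for the tail. This is one of the properties that distinguishes a vector-length-based scheme from fixed-width packed SIMD.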

Required Skills

To work on this project, you will need:

  • to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) -- having followed the VLSI1 / VLSI2 courses is recommended
  • to have prior knowledge of hardware design and computer architecture -- having followed the "Advanced System-on-Chip Design" or "Energy-Efficient Parallel Computing Systems for Data Analytics" course is recommended
  • to have prior knowledge of basic machine learning, mainly DNNs/CNNs which will be used as sample workloads

Other skills that you might find useful include:

  • familiarity with git, the UNIX shell, C programming
  • to be strongly motivated for a difficult but super-cool project

Status: Available

Supervision: Fabian Schuiki

Professor

Luca Benini

Practical Details

Meetings & Presentations

The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.

At the end of the project, you have to present and defend your work in a 15-minute presentation followed by 5 minutes of discussion as part of the IIS colloquium.

Literature

  • [Asanovic2016] K. Asanovic, RISC-V Vector Extension proposal [1]
  • [Conti2017] F. Conti et al., An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics [2]
  • [Lee2015] Y. Lee et al., Hwacha Instruction Set Architecture [3] and Microarchitecture [4]
  • [Schiavone2017] P. D. Schiavone et al., Slow and Steady Wins the Race? A Comparison of Ultra-Low-Power RISC-V Cores for Internet-of-Things Applications

Links

  • The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) [5]
  • The IIS/DZ coding guidelines [6]