Low-Energy Cluster-Coupled Vector Coprocessor for Special-Purpose PULP Acceleration
After many years of neglect, “classic” Cray-like vector processors have been proposed again [Asanovic2016][Lee2015] as a more general and elegant solution than packed single-instruction multiple-data extensions (e.g. Intel AVX) and GPU-like single-instruction multiple-thread approaches to parallelism. Many have been sceptical that such a scheme can be applied to accelerate applications in low-power devices such as the PULP chips we develop at IIS. However, a vector processor shares many similarities with the custom-designed HW accelerators that we have successfully coupled with our platform in the past, with the additional advantage of greater flexibility in the target application.
The primary purpose of this project is to design a cluster-coupled vector coprocessor to be deployed within a PULP cluster similar to Fulmine [Conti2017]. The main objectives of this project will be the following:
- A simple base design. The design should be a base on which to build specialized accelerators, not a fully autonomous core. The base core should include mostly the instructions needed to fetch data, move it between registers, and perform basic arithmetic operations. A possible approach is to take Zero-Riscy, one of the smallest RISC-V cores introduced at IIS [Schiavone2017], and use it as a baseline.
- Designed for shared-memory interaction. The vector coprocessor must be designed from the ground up to communicate with the shared memory of the PULP clusters. Techniques and ideas developed in the context of PULP HW accelerators [Conti2017][Azarkhish2017] can be used to make this more efficient.
- Design for minimum energy. This particular coprocessor will be designed not to maximize performance, but to minimize energy spent on highly data-parallel and vectorizable applications, such as machine learning applications (support vector machines, neural networks). To this end, one of the possibilities is to explore frequency scaling within the coprocessor.
- Compliance with RISC-V guidelines. While the RISC-V Foundation has not yet ratified official specifications for its vector extensions, the design should satisfy, at least coarsely, the guidelines set out in [Asanovic2016].
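To make the kind of "highly data-parallel and vectorizable" kernel targeted by the objectives above more concrete, the following is a minimal software sketch of a strip-mined dot product, the basic building block of support vector machines and neural network layers. It is only an illustrative model, not the coprocessor's actual programming interface: `MAX_VL`, `set_vl` and `dot_product` are hypothetical names, and the maximum vector length of 8 elements is an arbitrary assumption.

```python
MAX_VL = 8  # assumed (hypothetical) number of elements per vector register


def set_vl(remaining):
    """Model a 'set vector length' step: grant at most MAX_VL elements."""
    return min(remaining, MAX_VL)


def dot_product(a, b):
    """Strip-mined dot product over two equal-length integer sequences."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    acc = 0
    start = 0
    remaining = len(a)
    while remaining > 0:
        vl = set_vl(remaining)
        # One loop iteration stands in for two vector loads from shared
        # memory followed by a vector multiply-accumulate over vl elements.
        chunk_a = a[start:start + vl]
        chunk_b = b[start:start + vl]
        acc += sum(x * y for x, y in zip(chunk_a, chunk_b))
        start += vl
        remaining -= vl
    return acc
```

In this model the loop body does not depend on the exact value of `MAX_VL`, so the same kernel would run unchanged on implementations with shorter or longer vector registers; only the number of loop iterations changes.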
The design will be performed in SystemVerilog, for full compliance with the rest of the PULP design flow.
To work on this project, you will need:
- to have worked in the past with at least one RTL language (SystemVerilog or Verilog or VHDL) - having followed the VLSI1 / VLSI2 courses is recommended
- to have prior knowledge of hardware design and computer architecture - having followed the Advanced System-on-Chip Design course is recommended
Other skills that you might find useful include:
- familiarity with a scripting language for numerical simulation (Python or Matlab or Lua…)
- to be strongly motivated for a difficult but super-cool project
If you want to work on this project, but you think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill in the gap.
- Supervision: Davide Schiavone, Francesco Conti
Meetings & Presentations
The students and advisor(s) agree on weekly meetings to discuss all relevant decisions and decide on how to proceed. Of course, additional meetings can be organized to address urgent issues.
Around the middle of the project there is a design review, where senior members of the lab review your work (bring all the relevant information, such as preliminary specifications, block diagrams, synthesis reports, testing strategy, ...) to make sure everything is on track and decide whether further support is necessary. They also make the final decision on whether the chip is actually manufactured (no reason to worry if the project is on track) and whether more chip area, a different package, ... is provided. For more details confer to .
At the end of the project, you have to present/defend your work during a 15 min. presentation and 5 min. of discussion as part of the IIS colloquium.
- [Asanovic2016] K. Asanovic, RISC-V Vector Extension proposal 
- [Conti2017] F. Conti et al., An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics 
- [Lee2015] Y. Lee et al., Hwacha Instruction Set Architecture and Microarchitecture
- [Schiavone2017] P. D. Schiavone et al., Slow and Steady Wins the Race? A Comparison of Ultra-Low-Power RISC-V Cores for Internet-of-Things Applications
- The EDA wiki with lots of information on the ETHZ ASIC design flow (internal only) 
- The IIS/DZ coding guidelines