Arnold: Developing efficient IoT data processing applications for a versatile PULPissimo-based SoC in 22nm FDSOI
With smart sensors becoming part of everyday life, data-driven applications are gaining more and more relevance in the consumer electronics market. Smartwatches for fitness tracking, cameras for security and multimedia entertainment, and biomedical wearables such as ECG and EEG devices for health care are just a few examples. Typically, the data streams coming from the sensors are processed on servers in the cloud. The data is sensed by a physical device driven by a microcontroller, possibly pre-processed, and eventually sent wirelessly to the network (for example over Bluetooth Low Energy or WiFi radios), where the packets travel through routers and switches until they arrive at the server in the cloud. The server processes them and possibly gives feedback to the users, or to the microcontroller in closed-loop applications. As these smart sensors are usually battery-powered, they are designed to be energy efficient. Most of the power is spent transmitting the data from the radio to the server; minimizing the bandwidth transmitted towards the servers therefore not only reduces traffic and congestion, but also helps the smart sensors run longer on a battery charge.
Classification and data compression are data processing algorithms that can be used to cope with this challenge. For example, one can imagine a face recognition application built as follows: an ultra-low-power camera continuously acquires images, and the microcontroller compresses each image and sends fewer bytes to the server, which simply decompresses the data and runs a convolutional neural network to classify the acquired face. A smarter variant of the same application is the following: the microcontroller performs a pre-classification on the image to recognize whether the picture contains a face or not. This needs only a small part of the whole face recognition algorithm. The image is sent to the cloud only if it contains a face, which saves both on-node power, thanks to the reduced use of the radio, and server resources, as the servers now execute face recognition algorithms only on certain events.
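The pre-classification idea can be sketched in a few lines. This is only an illustration, not code for Arnold: `filter_and_send`, `looks_like_face` and `send` are hypothetical names, the pre-classifier and the radio driver are passed in as plain callables, and frames are modelled as `bytes` objects.

```python
from typing import Callable, Dict, Iterable


def filter_and_send(
    frames: Iterable[bytes],
    looks_like_face: Callable[[bytes], bool],  # hypothetical cheap on-node pre-classifier
    send: Callable[[bytes], None],             # hypothetical radio transmit function
) -> Dict[str, int]:
    """Forward only the frames accepted by the pre-classifier.

    Returns counters that show how much radio traffic was avoided.
    """
    stats = {"frames_seen": 0, "frames_sent": 0, "bytes_sent": 0}
    for frame in frames:
        stats["frames_seen"] += 1
        if looks_like_face(frame):
            # The radio is used only when an event is detected.
            send(frame)
            stats["frames_sent"] += 1
            stats["bytes_sent"] += len(frame)
    return stats
```

Comparing `bytes_sent` with the total size of all acquired frames gives a first estimate of how much radio traffic the pre-classification avoids.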
The event-driven execution paradigm can also be applied at a microscopic level, by shutting down the parts of the microcontroller that are not used during pre-processing and turning them on only for detected events.
Hardware accelerators in microcontrollers have been shown to boost performance and energy efficiency. Typical accelerators in IoT devices speed up kernels like convolutions, or simply the multiply-and-accumulate operation. However, as programmable microcontrollers are general-purpose, a versatile accelerator would benefit more kernels in the signal processing domain.
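For readers less familiar with these kernels, the following plain software reference shows how a 1D convolution reduces to repeated multiply-and-accumulate steps. It is a minimal sketch for illustration only; the function names `mac` and `conv1d_valid` are made up here and do not refer to any Arnold or PULPissimo API.

```python
from typing import List, Sequence


def mac(acc: int, a: int, b: int) -> int:
    """One multiply-and-accumulate step: acc + a * b."""
    return acc + a * b


def conv1d_valid(signal: Sequence[int], kernel: Sequence[int]) -> List[int]:
    """1D convolution over the positions where the kernel fully overlaps the signal."""
    n, k = len(signal), len(kernel)
    if k == 0 or k > n:
        raise ValueError("kernel must be non-empty and no longer than the signal")
    out = []
    for i in range(n - k + 1):
        acc = 0
        for j in range(k):
            # Convolution flips the kernel relative to the signal window.
            acc = mac(acc, signal[i + j], kernel[k - 1 - j])
        out.append(acc)
    return out
```

The inner loop is the part a hardware accelerator would typically take over, since it performs the same operation on every sample.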
For this reason, we implemented a PULPissimo-based SoC in 22nm FDSOI technology called 'Arnold', which integrates an embedded FPGA (eFPGA) that can be programmed to support any kind of kernel. The eFPGA shares the main memory with the processor, can generate interrupts and can drive GPIOs, which is required to process data on its way from the peripherals to the main memory.
The name Arnold actually comes from Arnold Schwarzenegger. The lead designer Davide says that just as Arnold Schwarzenegger was successful in multiple unrelated roles as an athlete, a movie star, an entrepreneur and a politician, the Arnold chip, thanks to the eFPGA, will be able to fulfil multiple roles as well.
Do you want to challenge yourself by developing applications for the latest implementation of PULPissimo with an eFPGA in 22nm FDSOI?
The student's task is to develop selected IoT applications (with cameras or ADCs) for the Arnold chip, evaluate their performance, power and energy consumption, and compare the results with the pure software-based approach, i.e. when the core executes the whole application with no assistance from the eFPGA.
The student's tasks can be summarized as follows:
- Identifying kernels in applications involving image and signal processing that can be efficiently implemented in the eFPGA. This task also requires defining the programming model that lets the core and the accelerator built in the eFPGA work together.
- Implementing those kernels in the eFPGA and developing the application for Arnold. Note that the application is an end-to-end implementation, meaning that not only the accelerator part is implemented but also the acquisition of the signals/images from the sensors. The acquisition could also be implemented in hardware in the eFPGA, in which case it becomes part of the evaluation of the standard approach against the eFPGA approach.
- Collecting the results from the aforementioned points and preparing the presentation and report.
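The comparison between the software-only and the eFPGA-assisted implementation comes down to runtime, average power and energy, with energy being the product of the two. The helper below is a minimal sketch of that bookkeeping; `Measurement` and `compare` are hypothetical names and the code contains no measured values.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Measurement:
    """One measured run of an application (hypothetical container)."""
    runtime_s: float    # execution time in seconds
    avg_power_w: float  # average power in watts

    @property
    def energy_j(self) -> float:
        # Energy is average power multiplied by runtime.
        return self.runtime_s * self.avg_power_w


def compare(software: Measurement, efpga: Measurement) -> Dict[str, float]:
    """Ratios above 1 mean the eFPGA-assisted run is faster or uses less energy."""
    return {
        "speedup": software.runtime_s / efpga.runtime_s,
        "energy_ratio": software.energy_j / efpga.energy_j,
    }
```

Reporting both ratios matters because an accelerated run may draw more power while it is active and still consume less energy overall if the runtime shrinks enough.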
To work on this project, you will need:
- to have worked in the past with at least one RTL language (SystemVerilog, Verilog or VHDL); having followed the VLSI1 / VLSI2 courses is recommended
- to have prior knowledge of hardware design, computer architecture and FPGA physical design
Other skills that you might find useful include:
- to be strongly motivated for a difficult but super-cool project
If you want to work on this project but think that you do not match some of the required skills, we can give you some preliminary exercises to help you fill the gap.