Design of an Area-Optimized Soft-Error Resilient Processing Core for Safety-Critical Systems (1M)
The aim of this project is to develop the cheapest soft-error-resilient processing core, leveraging the concept of repetition-in-time.
- Type: Master Thesis
- Professor: Prof. Dr. L. Benini
Project Description and Objectives
Safety-critical applications, such as autonomous driving, space applications, medical instrumentation and many others, require sophisticated mechanisms to ensure the correct functionality of the system. From a hardware perspective, spatial redundancy of the computing resources is an established technique to detect, and possibly correct, soft errors. Processing cores are duplicated (dual-core lockstep) or even triplicated (triple-core lockstep, mainly targeting highly critical domains such as aerospace), and all copies execute the same instruction flow at the same time. The outputs of the cores are checked cycle-by-cycle: in dual-core lockstep a comparator can only detect a mismatch, while in triple-core lockstep a majority voter can also identify and mask the faulty result. When a discrepancy is detected, the system is alerted that a failure has occurred so that recovery or reset procedures can be initiated.
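To make the difference between detection (two copies) and correction (three copies) concrete, here is a minimal behavioral sketch in Python, not RTL. The function names and the 32-bit result width are illustrative assumptions; real lockstep checkers compare many more signals (register-file writes, memory requests) every cycle.

```python
MASK32 = 0xFFFF_FFFF


def dcls_mismatch(a: int, b: int) -> bool:
    """Dual-core lockstep: the checker can only flag that the two cores disagree."""
    return a != b


def tmr_vote(a: int, b: int, c: int) -> int:
    """Triple-core lockstep: bitwise 2-out-of-3 majority vote on 32-bit values."""
    return ((a & b) | (a & c) | (b & c)) & MASK32


def tmr_error(a: int, b: int, c: int) -> bool:
    """True if any replica disagrees, i.e. a fault was observed (and masked)."""
    return not (a == b == c)


golden = 0x1234_5678
faulty = golden ^ (1 << 7)  # single-bit upset in one core's result

print(dcls_mismatch(golden, faulty))          # True: detected, but which core is right?
print(hex(tmr_vote(golden, faulty, golden)))  # 0x12345678: the upset is masked
print(tmr_error(golden, faulty, golden))      # True: the system is still alerted
```

A bitwise vote masks any fault confined to a single replica; it cannot recover the correct value if two replicas are corrupted in the same bit position.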
However, spatial redundancy is very costly to implement, due to the replication of hardware resources and the additional logic required for comparison and majority voting. This has a large impact not only on the area footprint of the SoC but also on its power consumption, which is particularly undesirable for classes of devices that operate under very tight area and power budgets (e.g., safety-critical IoT applications).
At IIS we want to explore innovative solutions that leverage the concept of repetition-in-time [1,2] (rather than spatial replication) to implement fault-detection (and correction) mechanisms. Instead of replicating hardware resources, we replicate the instruction flow over time: each instruction (or set of instructions) is executed N times, and the results/states produced by the repetitions are compared or voted to detect possible failures (N = 2 suffices for detection, while majority-based correction requires N ≥ 3). This approach aims to provide the same soft-error protection as spatial redundancy with a smaller area footprint and lower power consumption, at the cost of reduced execution performance, since the protected code takes roughly N times as many cycles.
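The repetition-in-time idea can be sketched at the behavioral level as follows. This is a software model under stated assumptions, not the hardware mechanism to be designed: `run_repeated` and its `upset_at`/`upset_mask` fault-injection hook are hypothetical names, and the per-operation granularity is only one of the options (per instruction vs. per block of instructions) the project would explore.

```python
from collections import Counter
from typing import Callable, Optional, Tuple

MASK32 = 0xFFFF_FFFF


def run_repeated(op: Callable[[], int], n: int,
                 upset_at: Optional[int] = None,
                 upset_mask: int = 0) -> Tuple[int, bool, bool]:
    """Execute `op` n times on the same (single) datapath and vote on the results.

    Hypothetical fault-injection hook: `upset_mask` is XORed into the result of
    repetition number `upset_at` only, modelling a transient soft error.
    Returns (voted_value, error_detected, error_corrected). The voted value is
    trustworthy only if no error was detected or the error was corrected.
    """
    results = []
    for i in range(n):
        r = op() & MASK32
        if i == upset_at:
            r ^= upset_mask  # transient upset affects this repetition only
        results.append(r)

    value, votes = Counter(results).most_common(1)[0]
    error_detected = votes != n  # any disagreement reveals a fault
    # Correction needs a strict majority of agreeing repetitions (N >= 3).
    error_corrected = error_detected and votes > n // 2
    return value, error_detected, error_corrected


def add() -> int:
    """A 32-bit ADD, standing in for one protected instruction."""
    return (0x1234_5678 + 0x42) & MASK32


print(run_repeated(add, 2, upset_at=1, upset_mask=1 << 3))  # detected, not corrected
print(run_repeated(add, 3, upset_at=1, upset_mask=1 << 3))  # detected and corrected
```

Because every repetition reuses the same hardware, this only protects against faults whose effect does not persist across repetitions: a permanent (hard) fault such as a stuck-at bit would corrupt every copy identically and go unnoticed.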
The aim of this project is to develop the cheapest soft-error-resilient processing core, leveraging the concept of repetition-in-time. The baseline for the design is a tiny IoT processor, the Ibex core, which implements the 32-bit RISC-V ISA. The resulting resilient processor will be compared against the state of the art to assess its benefits, improvements and drawbacks. Interested students can also investigate the applicability of this approach to security-critical systems, or explore more complex architectures that also cover hard-error detection.
To achieve the project's goals, the student is required to complete the following activities:
- Study and classification of soft-error types;
- Study of the repetition-in-time technique;
- Design of repetition-in-time error detection and/or correction mechanisms within the Ibex core;
- Functional testing and validation;
- Safety and security assessment of the design;
- Area evaluation through the physical implementation of the core (under close guidance).
The student is required to write a weekly report at the end of each week and send it to their advisors by email. The weekly report should briefly summarize the work, progress and findings of the week, plan the actions for the following week, and list open questions and points for discussion. For software benchmarks, we strongly recommend creating a Google Sheet and plotting the results to track them over time.
The student will gain advanced knowledge of safety-critical and security-critical systems, with a special focus on soft-error resiliency. The student will work on a cutting-edge research topic within a team of PhD and postdoctoral researchers, who will support the student in all project phases. Moreover, the student will gain practical experience with the most common commercial tools used for hardware design.
Workload Distribution
- 20% state-of-the-art review
- 50% hardware design
- 20% functional testing, evaluation and assessment of the design
- 5% physical implementation (under close guidance)
- 5% final report
Prerequisites
- Strong interest in safety-critical and security-critical systems
- Experience with digital design in SystemVerilog as taught in VLSI I
- Good knowledge of C programming for embedded systems
- Experience with Unix commands
- Preferred: experience with Bash and Tcl scripting
- Preferred (not strictly required): experience with the ASIC implementation flow (synthesis) as taught in VLSI II