Enhancing our DMA Engine with Fault Tolerance

From iis-projects

Revision as of 16:02, 17 January 2022 by Michaero (talk | contribs)


Overview

Status: Available

Introduction

At IIS, we are developing a modular and extensible high-performance direct memory access (DMA) engine. This engine is integrated into a variety of systems, including PULP cluster systems.

All computing systems, however, are vulnerable to runtime faults such as single-event upsets (SEUs), especially when deployed in high-radiation environments such as space. Several redundancy mechanisms can be employed to ensure correct operation: error detection and correction (EDC) codes, combined with re-transmission when a fault is detected, and modular redundancy, which executes the same operation multiple times and compares the outcomes.
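The two mechanisms named above can be illustrated with a small software model. The following is a minimal sketch only, not the code used in the DMA engine: it assumes a textbook Hamming(7,4) code as the EDC and a bitwise two-out-of-three voter as the modular redundancy, and the function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (list index 0 = position 1)."""
    bits = [0] * 8                      # positions 1..7, index 0 unused
    for i, pos in enumerate((3, 5, 6, 7)):
        bits[pos] = (nibble >> i) & 1   # data bits sit at non-power-of-two positions
    for p in (1, 2, 4):                 # parity bit p covers every position with bit p set
        for pos in range(1, 8):
            if pos != p and pos & p:
                bits[p] ^= bits[pos]
    return bits[1:]


def hamming74_decode(codeword):
    """Return (nibble, syndrome); a non-zero syndrome is the position that was corrected."""
    bits = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid codeword
    if syndrome:
        bits[syndrome] ^= 1             # flip the single faulty bit back
    nibble = sum(bits[pos] << i for i, pos in enumerate((3, 5, 6, 7)))
    return nibble, syndrome


def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 vote over three redundant copies of a value."""
    return (a & b) | (a & c) | (b & c)
```

Hamming(7,4) corrects any single bit flip per codeword but miscorrects double flips; a practical design would typically add an overall parity bit (SECDED) so that double errors are at least detected and can trigger a re-transmission.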

Project

The goal of this project is to analyze the DMA engine with regard to fault tolerance and to implement mechanisms that protect it against SEUs and against latent errors in the transferred data. One important element is EDC on the transferred data, which ensures that the data remains correct. A second part focuses on hardening the DMA engine itself, so that SEUs do not affect the correctness of a transfer.

  • Protect data (EDC)
    • Detect and correct errors in the data being transferred
    • Ensure correct alignment of EDC bits when bytes are shifted
    • Properly handle uncorrectable errors in data (e.g. interrupt, refetch)
  • Ensure correct operation of the DMA in the presence of faults
    • Protect internal state (configuration registers, counters) and processing
  • Stretch goal: add AXI5 compliance (parity check) [1]
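The alignment item in the list above can be made concrete with a toy model. This sketch assumes one odd-parity check bit per data byte, chosen purely for illustration; the actual EDC scheme and granularity are open design decisions of the project, and all names below are hypothetical. When the check bit travels with its byte, a realignment that shifts bytes between lanes keeps every check valid, whereas a code computed over a whole word would have to be recomputed after the shift.

```python
def odd_parity(byte):
    """Check bit that makes the 9-bit group (byte + check bit) hold an odd number of ones."""
    return 1 ^ (bin(byte & 0xFF).count("1") & 1)


def protect(data):
    """Attach a parity bit to every byte: returns a list of (byte, check_bit) lanes."""
    return [(b, odd_parity(b)) for b in data]


def shift_lanes(lanes, shift, fill=(0x00, 1)):
    """Model realignment: move each protected byte `shift` lanes up.

    The check bit moves together with its byte; vacated lanes receive a
    validly protected filler byte (0x00 has odd-parity check bit 1).
    """
    return [fill] * shift + list(lanes)


def check(lanes):
    """Return the indices of all lanes whose check bit does not match the byte."""
    return [i for i, (b, p) in enumerate(lanes) if odd_parity(b) != p]
```

For example, `check(shift_lanes(protect(b"\xde\xad\xbe\xef"), 3))` reports no failing lanes, while flipping a single data bit in any lane afterwards makes `check` return that lane's index. An uncorrectable error detected this way would then be handled as listed above, e.g. by raising an interrupt or refetching the data.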

Character

  • 20% Literature Review
  • 50% Hardware Design
  • 30% Evaluation & Documentation

Prerequisites

  • Strong interest in computer architecture and memory systems
  • Experience with digital design in SystemVerilog as taught in VLSI I
  • Preferred: Knowledge or experience with AXI and RISC-V

References