Efficient Banded Matrix Multiplication for Quantum Transport Simulations

Short Description

In Quantum Transport (QT) simulations, especially those using the Non-Equilibrium Green’s Function (NEGF) formalism, efficient matrix multiplication is crucial for performance. The matrices involved are frequently sparse and banded. To achieve high performance, one usually employs vendor-optimized libraries implementing the Basic Linear Algebra Subprograms (BLAS) or their sparse counterparts, for example, MKL or OpenBLAS on CPUs and cuBLAS or cuSPARSE on NVIDIA GPUs. Unfortunately, these libraries are optimized for common use cases, e.g., dense-times-dense, sparse-times-dense, and sparse-times-sparse, where the sparse matrices are unstructured. Therefore, using them results in sub-optimal performance and/or much higher memory usage than an implementation that exploits the structure of banded matrices.
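To illustrate what exploiting the band structure can mean, here is a minimal NumPy sketch, not the project's algorithm. It assumes square n x n matrices stored diagonal by diagonal, so memory scales with n times the number of diagonals rather than with n squared. The function names (`dense_to_diagonals`, `diagonals_to_dense`, `banded_matmul`) are hypothetical and chosen for illustration only. The sketch relies on the fact that the product of diagonal i of A and diagonal j of B contributes only to diagonal i + j of C.

```python
import numpy as np


def dense_to_diagonals(M, lower, upper):
    """Keep only diagonals -lower..upper of a square matrix M (offset -> 1D array)."""
    return {k: np.diagonal(M, offset=k).copy() for k in range(-lower, upper + 1)}


def diagonals_to_dense(diags, n):
    """Rebuild a dense n x n matrix from its diagonals (used here for checking only)."""
    M = np.zeros((n, n), dtype=np.result_type(*diags.values()))
    for k, d in diags.items():
        M += np.diag(d, k=k)
    return M


def banded_matmul(a_diags, b_diags, n):
    """Multiply two banded n x n matrices given in diagonal storage."""
    c_diags = {}
    for i, a in a_diags.items():
        for j, b in b_diags.items():
            d = i + j  # diagonal of C that this pair contributes to
            # Rows r for which A[r, r+i], B[r+i, r+d] and C[r, r+d] all exist.
            lo = max(0, -i, -d)
            hi = min(n, n - i, n - d)
            if lo >= hi:
                continue
            if d not in c_diags:
                c_diags[d] = np.zeros(n - abs(d), dtype=np.result_type(a, b))
            m = hi - lo
            ta = lo - max(0, -i)      # position of row lo on diagonal i of A
            tb = lo + i - max(0, -j)  # position of row lo+i on diagonal j of B
            tc = lo - max(0, -d)      # position of row lo on diagonal d of C
            c_diags[d][tc:tc + m] += a[ta:ta + m] * b[tb:tb + m]
    return c_diags


# Small check against the dense product (sizes and bandwidths are arbitrary).
rng = np.random.default_rng(0)
n = 8
A = np.triu(np.tril(rng.standard_normal((n, n)), k=1), k=-1)  # 1 lower, 1 upper
B = np.triu(np.tril(rng.standard_normal((n, n)), k=1), k=-2)  # 2 lower, 1 upper
C_diags = banded_matmul(dense_to_diagonals(A, 1, 1), dense_to_diagonals(B, 2, 1), n)
print(sorted(C_diags))
print(np.allclose(diagonals_to_dense(C_diags, n), A @ B))
```

The result has at most 3 lower and 2 upper diagonals, i.e., the bandwidths of the factors add up. A tuned implementation would likely use blocked or batched kernels instead of Python loops over diagonals.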

Project Scope

The main goal of this (semester) project is to study current approaches to banded matrix multiplication, develop more efficient algorithms, implement them, and integrate them into a quantum transport package. The project can be extended to a full bachelor’s or master’s thesis and is particularly relevant for CSE/RW students. Examples of possible extensions are:

  • Tighter integration with the QT solvers using those operations: is it possible to consider the underlying scientific context to further optimize these multiplications?
  • (More) formal performance study of the implemented operations, e.g., using lower/upper-bound theory: how close to optimal are the developed algorithms?
  • Development of optimized distributed algorithms and performance analysis, e.g., using the alpha-beta model.
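For reference, the alpha-beta model mentioned in the last item estimates the time to send a single message of n words in its usual textbook form; the symbols below are the conventional ones and are not taken from this project description:

```latex
T(n) = \alpha + \beta \, n
```

Here \alpha is the per-message latency and \beta is the transfer time per word (the inverse bandwidth), so a distributed algorithm can be compared by the number of messages it sends and the total volume it communicates.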

Status: Available

Looking for 1 semester/bachelor’s/master’s student.
Interested candidates please contact: Dr. Alexandros Nikolaos Ziogas


Prerequisites

  • Sufficient knowledge of at least one of C/C++ and Python, preferably both.
  • (Optional) knowledge of BLAS/LAPACK and their optimized implementations (MKL, OpenBLAS, cuBLAS, cuSPARSE).
  • (Optional) knowledge of CUDA.
  • (Optional) knowledge of MPI.


Algorithm development (70%), simulation & analysis (30%)


Mathieu Luisier