XNORLAX: Fused XNOR-LATCH Custom-Standard-Cell-Based Processing-in-Memory

From iis-projects

Revision as of 19:29, 21 November 2021 by Caoscar
Micrograph of a chip containing standard-cell-based PPAC.

Short Description

Traditional hardware architectures separate memory from processing (logic) elements. Unfortunately, the ever-growing gap between computing performance and memory access times has led such architectures to hit a “memory wall,” where most of a computation’s time, energy, and bandwidth are consumed by memory operations. This problem is further aggravated by the rise of applications such as machine learning, data mining, and 5G wireless communication, in which massive amounts of data need to be processed at high rates.

Processing-in-memory (PIM) is an emerging hardware paradigm that proposes to move the memory elements closer to the computation elements in order to break through the memory wall. The concept of PIM has been explored in many recent papers, but most of these works have focused on analog computation and/or emerging semiconductor devices. As such, these architectures are often (i) difficult to design, test, or migrate to other technology nodes, due to their analog components, and/or (ii) not practical today, due to their reliance on immature semiconductor technologies.

Recently, we proposed PPAC (Parallel Processor in Associative Content-Addressable Memory) [1], a PIM architecture that accelerates several operations whose structure is similar to that of matrix-vector products. Unlike other PIM architectures, PPAC is completely digital and implemented using standard cells only, which makes it very easy to design, implement, and test. We have shown that PPAC achieves an energy efficiency comparable to that of PIM designs that use analog computation and, furthermore, that it can achieve better area and energy efficiency than traditional digital architectures that perform the same operations. These results demonstrate that we can reap the benefits of PIM even with technology that is commercially available today.

However, our original implementation of PPAC used generic standard cells that were not optimized for the operations at hand. To improve area efficiency, we recently created and characterized a semi-custom standard cell that merges a latch with an XNOR gate; this latch-XNOR pair is the main structure composing PPAC’s memory. Specifically, our semi-custom standard cell implemented the XNOR using transmission gates and achieved a significant reduction in area and in the area-delay product. Further area- and energy-efficiency improvements can nevertheless be attained by using other logic styles.
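To make the role of the latch-XNOR structure concrete, the following is a minimal behavioral sketch in Python of how XNOR operations followed by a population count yield a matrix-vector-product-like result. It is an illustration only and not the PPAC implementation: the function names, the example memory contents, and the mapping of bit 1 to +1 and bit 0 to -1 are assumptions made here. Each stored bit plays the role of a latch and each comparison the role of the XNOR. For two N-bit words, the ±1 inner product equals 2·popcount(XNOR) − N.

```python
def xnor_popcount(stored_bits, input_bits):
    """Count the positions where the stored word and the input word agree.

    Each comparison models one latch (stored bit) feeding one XNOR gate.
    """
    if len(stored_bits) != len(input_bits):
        raise ValueError("stored word and input word must have the same length")
    return sum(1 for s, x in zip(stored_bits, input_bits) if s == x)


def signed_inner_product(stored_bits, input_bits):
    """Inner product when bit 1 stands for +1 and bit 0 for -1 (assumed mapping)."""
    n = len(stored_bits)
    return 2 * xnor_popcount(stored_bits, input_bits) - n


def matrix_vector_product(rows, input_bits):
    """Apply the same input word to every stored row, one result per row."""
    return [signed_inner_product(row, input_bits) for row in rows]


# Hypothetical memory contents: three rows of four stored bits each.
memory = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
x = [1, 0, 0, 1]

print(matrix_vector_product(memory, x))  # prints [2, -2, 0]
```

In hardware, all rows would be evaluated in parallel; the sequential loop above only models the arithmetic, not the timing, area, or energy of the cell.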

The goal of this project is to design a new custom standard cell that will further improve the area and energy efficiency of PPAC. To do so, the student will first learn how to design custom standard cells and how to characterize them so that they can be integrated into a standard design flow (synthesis and place-and-route). Then, the student will review different logic styles and choose the one most likely to outperform transmission gates. Finally, the student will implement the custom standard cells in a 65nm CMOS technology node, so that we can potentially send a redesigned PPAC ASIC to fabrication.

This project requires knowledge of digital logic and VLSI design. Knowledge of Cadence Virtuoso is preferred (although not absolutely required) and knowledge of Cadence Liberate is a plus.

[1] O. Castañeda, M. Bobbett, A. Gallyas-Sanhueza, and C. Studer, "PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations," IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), July 2019.

Status: Available

Looking for a 4-semester student (Master semester or bachelor’s thesis); a master’s thesis could also be considered.
Contact: Oscar Castañeda


Prerequisites

VLSI II (recommended)
VLSI III (highly recommended)


Character

10% Exploration
30% VLSI Design
30% Standard-Cell Layout
30% Scripting


Professor

Christoph Studer


Detailed Task Description


Practical Details


