Advanced School for Computing and Imaging (ASCI)

ASCI office
Delft University of Technology
Building 28, room 04.E120
Van Mourik Broekmanweg 6
2628 XE – DELFT, The Netherlands

P: +31 15 27 88032

Office visiting hours
Monday, Tuesday, Thursday: 10:00 – 15:00


The ASCI office is located on the Delft University of Technology campus. It is easily accessible by bicycle, public transport and car. The building numbers can help you find your way around the campus. Make sure you remember the name and building number of your destination.


Resource-aware scheduling for 2D/3D multi-/many-core processor-memory systems

Author : Yixian Shen
Promotor(s) : A.D. Pimentel / Dr. Anuj Pathania
University : University of Amsterdam
Year of publication : 2024
Link to repository : Link to thesis


Multi-core processors markedly enhance performance and facilitate increased computational capacity in modern systems. However, packing an ever-growing number of transistors into 2D/3D chips not only yields performance advantages but also introduces complex challenges, including shared-resource contention and heightened power density without a corresponding increase in thermal dissipation capability. Addressing these issues demands a sophisticated understanding of system architecture and resource-aware scheduling. This dissertation concentrates on two critical aspects of 2D/3D multi-/many-core processor-memory systems: enhancing timing predictability for hard real-time multi-core processors, and optimizing performance under thermal constraints for 2D/3D many-core processor-memory systems.

In our pursuit of improved timing predictability, we approach the challenge in two stages. First, we extend shared-cache interference analysis to set-associative instruction and data caches and bolster a state-of-the-art Worst-Case Execution Time (WCET) calculation tool. This enhancement allows us to precisely compute the cache interference between two programs and to validate the effectiveness of a partitioned scheduler with shared caches on practical, real-world applications.
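The flavor of this set-associative interference analysis can be illustrated with a small sketch. The function and the block traces below are hypothetical, not the dissertation's actual analysis: for each cache set, one task can suffer extra evictions when the combined working set of both tasks mapped to that set exceeds the associativity.

```python
# Hedged illustration (hypothetical, coarse): bound the shared-cache
# interference task A suffers from co-running task B under LRU by
# counting, per cache set, how far the combined distinct blocks of
# both tasks overflow the associativity.

def interference_bound(blocks_a, blocks_b, num_sets, assoc):
    """blocks_a/blocks_b: iterables of block addresses touched by each task."""
    extra_evictions = 0
    for s in range(num_sets):
        a = {blk for blk in blocks_a if blk % num_sets == s}
        b = {blk for blk in blocks_b if blk % num_sets == s}
        overflow = len(a | b) - assoc      # combined distinct blocks vs. ways
        if overflow > 0:
            # At most all of A's blocks in this set can be displaced.
            extra_evictions += min(len(a), overflow)
    return extra_evictions

# Example: 4 sets, 2-way associative, hypothetical block addresses.
print(interference_bound([0, 4, 8], [1, 2, 5], num_sets=4, assoc=2))
```

Real analyses in the WCET literature are far more precise (they track program paths and cache states), but the per-set overflow intuition is the same.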
Subsequently, we introduce TCPS, a novel task- and cache-aware partitioned scheduler. TCPS blends the benefits of cache partitioning with partitioned scheduling, accounting for each task's WCET sensitivity to its cache partition. This integration not only increases schedulability but also enhances timing predictability. Our exhaustive investigation unveils key design trade-offs in cache configurations (shared versus
partitioned) and scheduling methodologies (global versus partitioned) and
benchmarks them against our scheduling policy. Through extensive experimentation, we explore the multifaceted influences of various parameters
on the schedulability performance of different cache configurations under
real-time schedulers. Our results confirm that TCPS outperforms the alternatives in both schedulability and load balancing.
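TCPS itself is detailed in the thesis; the sketch below only conveys the core idea of coupling cache partitioning with partitioned scheduling. All WCET numbers, periods, and the utilization target are hypothetical: each task receives the smallest cache-way allocation that keeps its utilization acceptable, and tasks are then packed onto cores first-fit under the classic per-core utilization bound.

```python
# Hedged sketch of cache-aware partitioned scheduling (not TCPS itself).
# A task's WCET shrinks as it receives more private cache ways; we pick
# the smallest allocation meeting a per-task utilization target, then
# pack tasks onto cores first-fit (per-core total utilization <= 1).

def pick_ways(wcet_by_ways, period, target_util):
    """wcet_by_ways: {ways: WCET}; return (ways, util) for the smallest
    allocation whose utilization meets the target (else the largest)."""
    for ways in sorted(wcet_by_ways):
        if wcet_by_ways[ways] / period <= target_util:
            return ways, wcet_by_ways[ways] / period
    ways = max(wcet_by_ways)
    return ways, wcet_by_ways[ways] / period

def partition(tasks, num_cores, total_ways, target_util=0.5):
    cores = [0.0] * num_cores
    ways_left = total_ways
    placement = {}
    for name, (wcet_by_ways, period) in tasks.items():
        ways, util = pick_ways(wcet_by_ways, period, target_util)
        ways_left -= ways
        for c in range(num_cores):          # first-fit by utilization
            if cores[c] + util <= 1.0:
                cores[c] += util
                placement[name] = (c, ways)
                break
        else:
            return None                     # no core fits: unschedulable
    if ways_left < 0:
        return None                         # cache over-committed
    return placement

# Hypothetical task set: WCET (ms) per allocated ways, period (ms).
tasks = {
    "t1": ({1: 8, 2: 5, 4: 3}, 10),
    "t2": ({1: 12, 2: 9, 4: 6}, 20),
    "t3": ({1: 6, 2: 4, 4: 2}, 10),
}
print(partition(tasks, num_cores=2, total_ways=8))
```

The sketch uses a greedy allocation and a simple utilization test; the thesis's scheduler additionally exploits per-task WCET sensitivity when trading cache ways between tasks.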

Regarding performance optimization under thermal constraints, we embark on three distinct but interconnected initiatives. First, recognizing that the performance penalty of Dynamic Voltage and Frequency Scaling (DVFS) on S-NUCA many-cores is more significant than that of thread rotations, we propose thermal management for S-NUCA many-core processors through synchronous thread migrations. We derive an analytical method for computing the peak temperature of synchronously rotating threads and, building on it, present HotPotato, a runtime scheduler that enhances performance while ensuring thermal safety.
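The thesis derives peak temperature analytically; the toy simulation below, with entirely hypothetical RC parameters, only conveys the setting: under a lumped per-core RC thermal model, a hot thread rotated synchronously across cores peaks at a lower temperature than the same thread pinned to one core.

```python
# Toy lumped-RC thermal simulation (hypothetical parameters, not the
# thesis's analytical model): compare the peak temperature of one hot
# thread pinned to a core versus rotated synchronously across cores.

def simulate(num_cores, rotate_every, steps, dt=0.01,
             p_active=10.0, r_th=2.0, c_th=5.0, t_amb=45.0):
    temps = [t_amb] * num_cores
    peak = t_amb
    active = 0                              # rotate_every=0 means pinned
    for k in range(steps):
        if rotate_every and k % rotate_every == 0:
            active = (k // rotate_every) % num_cores
        for c in range(num_cores):
            power = p_active if c == active else 0.0
            # Forward-Euler step of dT/dt = (P - (T - T_amb)/R) / C
            temps[c] += dt * (power - (temps[c] - t_amb) / r_th) / c_th
        peak = max(peak, max(temps))
    return peak

pinned  = simulate(num_cores=4, rotate_every=0, steps=20000)
rotated = simulate(num_cores=4, rotate_every=500, steps=20000)
print(f"pinned peak = {pinned:.1f} C, rotated peak = {rotated:.1f} C")
```

With these parameters the pinned core settles near T_amb + P*R, while each rotated core heats for only a fraction of the period and cools the rest, so the system-wide peak stays visibly lower.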
Our second initiative unveils 3D-TTP, a transient temperature-aware power
budgeting strategy for 3D-stacked systems, designed to curtail the frequent
activation of Dynamic Thermal Management (DTM). This technique leverages a fine-grained grid-level RC-thermal model, facilitating precise power
budgeting. In our third initiative, we present 3QUTM, the first method tailored to 3D-stacked processor-memory systems: unifying core DVFS and memory-bank Low Power Mode (LPM) through a learning-based algorithm, it minimizes response time within a defined thermal threshold. These cumulative efforts represent a significant stride toward
performance maximization and thermal safety in the evolving landscape of
2D/3D multi-/many-core processor-memory systems.
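The power-budgeting idea behind the second initiative can be sketched in steady state. The two-node resistance matrix and all numbers below are hypothetical, and 3D-TTP's actual formulation is transient and grid-level: here we simply scale nominal per-node power so the hottest node of a small stack just reaches the thermal threshold, using T = T_amb + B @ P.

```python
# Hedged sketch of steady-state power budgeting on an RC-thermal model
# (hypothetical 2-node resistance matrix, not 3D-TTP's transient,
# grid-level formulation): scale nominal per-node power so that the
# hottest node exactly reaches the thermal threshold.

def power_budget(B, p_nominal, t_amb, t_limit):
    """B: thermal resistance matrix (K/W); returns scaled power budget."""
    n = len(p_nominal)
    # Steady-state temperature rise of each node at nominal power.
    rise = [sum(B[i][j] * p_nominal[j] for j in range(n)) for i in range(n)]
    scale = min((t_limit - t_amb) / r for r in rise)
    return [scale * p for p in p_nominal]

# Hypothetical couplings: node 0 sits closer to the heat sink; node 1,
# buried deeper in the stack, has higher resistance and runs hotter.
B = [[1.0, 0.3],
     [0.3, 1.8]]
budget = power_budget(B, p_nominal=[10.0, 10.0], t_amb=45.0, t_limit=80.0)
print([round(p, 2) for p in budget])
```

Note how the deeper node's higher thermal resistance caps the budget of both nodes, which is exactly why 3D stacks need per-layer-aware budgeting rather than a single chip-wide power cap.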