| Title | : | Harnessing Locality for Enhanced Application Performance |
| Speaker | : | S R Swamy Saranam Chongala (IIT Madras) |
| Details | : | Mon, 8 Dec, 2025 11:00 AM @ MR1 |
| Abstract | : | Modern applications depend on architectures that effectively exploit data locality, but growing working-set sizes have outpaced the scalability of SRAM-based caches. As a result, DRAM-based last-level caches (LLCs) are increasingly used, despite challenges such as large tag storage, higher access latency, and greater power consumption. This thesis addresses these limitations by proposing three techniques that leverage temporal, spatial, and algorithmic locality across the memory hierarchy and heterogeneous SoCs. First, we propose a tagless DRAM cache (TDC) to mitigate the overhead of the tag array in DRAM caches and reduce average access latency. This technique exploits the spatial locality exhibited by applications to design efficient large LLCs with minimal metadata overhead, while improving access latency and power consumption. Second, we propose novel cache management strategies that decouple L3-L4 replacement decisions to reduce inter-cache interference. We introduce two energy-efficient mechanisms tailored for tagless LLCs: restricted block caching (RBC) and victim tag buffer caching (VBC). These mechanisms incorporate L4 eviction penalties into L3 replacement logic with minimal overhead. Finally, building on the insights from locality exploitation in caches, we propose micro-architectural changes to efficiently harness algorithmic locality in SoCs with heterogeneous compute. This technique aims to optimize data movement between CPUs and NPUs. We introduce a hybrid processing unit (HPU), a novel architecture that tightly integrates NPUs and CPUs. The HPU combines the strengths of both processors to improve data movement and performance for AI applications. |
