Lluc Alvarez, Nikola Vujic, Lluis Vilanova, Ramon Bertran, Marc Gonzalez, Xavier Martorell, Nacho Navarro, Eduard Ayguade. Hardware-Software Coherence in Hybrid Memory Models. Research Report UPC-DAC-RR-CAP-2011-21 (Grup de Computació d'Altes Prestacions), Departament d'Arquitectura de Computadors (DAC), UPC, July 2011.



Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Current cache coherence protocols limit the scalability of chip multiprocessor (CMP) architectures. The expected increase in the number of cores in next-generation CMPs calls for an evolution of the memory subsystem. One solution is to introduce a local memory alongside the cache hierarchy, forming a hybrid memory model. On the one hand, local memories are more power-efficient than caches and do not generate coherence traffic. On the other hand, local memories suffer from poor programmability, so programmers rely on automatic code transformations to operate them. When memory access patterns are unpredictable, compilers fail to generate code that manages the local memory, because doing so would require complex memory aliasing analyses. This limitation stems from the lack of coherence between the local memory and the cache hierarchy. This paper proposes a coherence protocol for hybrid memory models that allows the compiler to generate code even in the presence of aliasing problems. Coherence is ensured by a simple software/hardware co-design that identifies potentially incoherent memory accesses and diverts them to the correct copy of the data. Because the protocol does not keep two copies of the data coherent, it generates no coherence traffic, and its overhead is negligible: 0.2% on average. Compared to traditional cache-based architectures, the hybrid memory model with the proposed coherence protocol achieves an average speedup of 1.5x.
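The idea of identifying potentially incoherent accesses and diverting them to the correct copy of the data can be illustrated with a small software model. The abstract does not describe the mechanism in detail, so the following is only a sketch under stated assumptions: the `HybridMemory` class, its `directory` of mapped ranges, and the `guarded_load` / `guarded_store` names are all hypothetical and are not taken from the report. The sketch assumes that chunks copied into the local memory are recorded in a lookup structure, and that any access the compiler cannot disambiguate consults that structure to decide which copy to use.

```python
class HybridMemory:
    """Toy model of a hybrid memory: main memory plus a software-managed local memory.

    All names here are hypothetical and exist only to illustrate the idea.
    """

    def __init__(self, main_size, local_size):
        self.main = [0] * main_size    # stands in for the cache hierarchy / main memory
        self.local = [0] * local_size  # stands in for the local memory
        self.directory = []            # assumed structure: (main_base, local_base, length)

    def map_in(self, main_base, local_base, length):
        """Copy a chunk into local memory and record it as the valid copy."""
        self.local[local_base:local_base + length] = self.main[main_base:main_base + length]
        self.directory.append((main_base, local_base, length))

    def write_back(self, main_base):
        """Copy a mapped chunk back to main memory and drop its directory entry."""
        for i, (mb, lb, n) in enumerate(self.directory):
            if mb == main_base:
                self.main[mb:mb + n] = self.local[lb:lb + n]
                del self.directory[i]
                return
        raise KeyError(main_base)

    def _divert(self, addr):
        """Return (memory, index) for the copy of addr that is currently valid."""
        for mb, lb, n in self.directory:
            if mb <= addr < mb + n:
                return self.local, lb + (addr - mb)
        return self.main, addr

    def guarded_load(self, addr):
        """Load that may alias a mapped chunk: always reads the valid copy."""
        mem, idx = self._divert(addr)
        return mem[idx]

    def guarded_store(self, addr, value):
        """Store that may alias a mapped chunk: always updates the valid copy."""
        mem, idx = self._divert(addr)
        mem[idx] = value


if __name__ == "__main__":
    mem = HybridMemory(main_size=16, local_size=4)
    mem.main[5] = 7
    mem.map_in(main_base=4, local_base=0, length=4)  # main[4:8] now lives in local[0:4]
    mem.local[1] = 42                # regular, compiler-managed local memory access
    print(mem.guarded_load(5))       # aliased access is diverted to the local copy
    mem.write_back(4)
    print(mem.main[4:8])
```

In this model only one copy of each datum is valid at any time, which mirrors the abstract's point that the protocol does not keep two copies coherent: accesses to unmapped addresses go straight to main memory, and only accesses that fall inside a mapped range are redirected.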

