Paper summary: DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism (PACT 2011)

Title

DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism

Authors

Byn Choi, Rakesh Komuravelli, Hyojin Sung, Robert Smolinski, Nima Honarmand, Sarita V. Adve, Vikram Sadanand Adve, Nicholas P. Carter, Ching-Tsun Chou

My personal takeaway from the paper: it exploits the properties of disciplined programs to redefine a lightweight coherence protocol for shared memory.

Motivation

  • Parallelism must become tractable for mainstream programmers
  • Existing shared-memory models are fundamentally broken for both hardware and software
  • The paper proposes a design that greatly simplifies cache coherence and consistency while enabling a more efficient communication and cache architecture

Contributions

  • Simplicity
    • Protocol complexity compared against the MESI protocol
    • 25x fewer reachable states than MESI under model checking
  • Extensible
    • Direct cache-to-cache transfer
    • Flexible communication granularity
  • Storage overhead
    • No storage overhead for directory information
    • Storage overhead drops below MESI's at tens of cores and continues to scale beyond that
  • Performance/Power
    • Up to 73% reduction in memory stall time
    • Up to 70% reduction in network traffic

Cache Coherence

  • Coherence Enforcement
    • Invalidate stale copies
    • Track up-to-date copy
  • Explicit Effects
    • Compiler knows all regions written in this parallel phase (DPJ)
    • Cache can self-invalidate before the next parallel phase
    • Each cache invalidates data in written regions that it did not access itself
  • Registration
    • Directory keeps track of one up-to-date copy
    • Writer updates the directory before the next parallel phase
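
The self-invalidation and registration steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not the paper's hardware design: the class names (`Registry`, `Core`, `Word`), the `touched` flag, and the rule that a previous registrant drops its copy are all hypothetical simplifications, and registration happens instantly instead of completing before the phase boundary.

```python
from dataclasses import dataclass

INVALID, VALID, REGISTERED = "Invalid", "Valid", "Registered"


@dataclass
class Word:
    value: int
    region: str
    state: str = VALID
    touched: bool = False  # accessed by this core during the current phase


class Registry:
    """Toy stand-in for the directory: tracks one up-to-date copy per address."""

    def __init__(self, memory, regions):
        self.data = dict(memory)      # addr -> value held at the shared level
        self.regions = dict(regions)  # addr -> region name
        self.owner = {}               # addr -> core registered as the writer

    def register(self, addr, core):
        prev = self.owner.get(addr)
        if prev is not None and prev is not core:
            prev.words[addr].state = INVALID  # assumption: old registrant drops its copy
        self.owner[addr] = core

    def fetch(self, addr):
        owner = self.owner.get(addr)
        return owner.words[addr].value if owner is not None else self.data[addr]


class Core:
    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self.words = {}  # addr -> Word (this core's private cache)

    def read(self, addr):
        w = self.words.get(addr)
        if w is None or w.state == INVALID:  # miss: get the up-to-date copy
            w = Word(self.registry.fetch(addr), self.registry.regions[addr])
            self.words[addr] = w
        w.touched = True
        return w.value

    def write(self, addr, value):
        self.words[addr] = Word(value, self.registry.regions[addr], REGISTERED, True)
        self.registry.register(addr, self)

    def end_phase(self, written_regions):
        """Self-invalidate words in written regions that this core did not access."""
        for w in self.words.values():
            if w.region in written_regions and not w.touched:
                w.state = INVALID
            w.touched = False  # reset for the next phase


def run_demo():
    registry = Registry(memory={0: 10, 1: 20}, regions={0: "R", 1: "S"})
    a, b = Core("A", registry), Core("B", registry)

    # Phase 0: read-only, so nothing needs invalidating.
    for core in (a, b):
        core.read(0)
        core.read(1)
        core.end_phase(written_regions=set())

    # Phase 1: only region R is written (by A); B touches only region S.
    a.write(0, 99)
    b.read(1)
    for core in (a, b):
        core.end_phase(written_regions={"R"})

    states = (a.words[0].state, b.words[0].state, b.words[1].state)
    return states, b.read(0)  # B misses on address 0 and fetches A's value
```

In this sketch no invalidation messages are sent to readers: B discards its stale copy of address 0 on its own at the phase boundary because region R was written and B did not touch that word, while its copy in the unwritten region S stays valid.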

Results

  • Simulation Environment
    • Wisconsin GEMS + Simics + Princeton Garnet network
  • System parameters
    • 64 cores
    • Private L1 (128 KB) and unified L2 (32 MB)
  • Simple core model
    • 5-stage, one-issue, in-order core
    • Results for only memory stall time
  • Benchmarks: FFT and LU from SPLASH-2
  • kdTree
    • kdFalse: false sharing in an auxiliary structure
    • kdPad: padding to eliminate false sharing
  • Dword (DeNovo) is comparable to Mword (MESI)
    • Simplicity doesn't compromise performance
  • Ddirect (direct cache-to-cache transfer) reduces remote L1 hit time
  • Dline is not susceptible to false sharing
    • Up to 66% reduction in total time
  • Application-dependent benefit
    • kdFalse: Mline worse by 12%
  • Dflex (flexible communication granularity) outperforms all systems
    • Up to 73% reduction over Mline
  • Dline and Dflex generate less network traffic than Mline
    • Up to 70% reduction
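
To make the kdFalse/kdPad distinction above concrete, here is a minimal sketch of how padding removes false sharing. The layout of kdTree's auxiliary structure is not given in this summary, so the setup is an assumption: one element per thread at offset `thread_id * stride_bytes`, and a 64-byte cache line. The function names are hypothetical.

```python
LINE_BYTES = 64  # assumed cache-line size, not taken from the paper


def lines_touched(num_threads, stride_bytes, line_bytes=LINE_BYTES):
    """Map each thread's element (at thread_id * stride_bytes) to its cache line."""
    return [(t * stride_bytes) // line_bytes for t in range(num_threads)]


def has_false_sharing(num_threads, stride_bytes, line_bytes=LINE_BYTES):
    """True if two threads' distinct elements fall in the same cache line."""
    lines = lines_touched(num_threads, stride_bytes, line_bytes)
    return len(set(lines)) < len(lines)


# Unpadded (kdFalse-like): 8-byte elements packed back to back.
unpadded = lines_touched(4, stride_bytes=8)
# Padded (kdPad-like): each element padded out to a full line.
padded = lines_touched(4, stride_bytes=64)
```

With 8-byte elements all four threads land in line 0, so a line-granularity protocol treats their independent writes as conflicts. Padding each element to 64 bytes gives every thread its own line, which is the kdPad fix.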

References

https://ieeexplore.ieee.org/document/6113797
https://slideplayer.com/slide/3715109/
