On Cache Sizing in Multi-Core Cluster Architectures




Abstract


With multi-core products reaching every market segment, we focus on a 16-core cluster-based architecture composed of 4 clusters. Assuming private L1 caches, we conduct SPEC92 simulations to study the effect of block size, associativity, and cache size on the L1 cache hit rate. We observed significant drops in hit rate as the number of processors per cluster increased from 1 to 4, with the largest drop occurring when going from 1 to 2 processors. This result discourages the design of small clusters with fewer than 4 cores per cluster and confirms that 4-core clusters are properly sized. In caches of at least 16KB with 1024 blocks, we observed no significant benefit from increasing the associativity beyond 8, and identified the best block sizes as 256B and 128B. Furthermore, hit rate improvements from quadrupling the number of blocks, and consequently the cache size, were insignificant for block sizes of 64B or higher in caches with at least 256 blocks. For the SPEC92 benchmark, a 128KB 8-way set-associative L1 cache with a block size of 64B is adequate.
Copyright © 2016 Praise Worthy Prize - All rights reserved.
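
The abstract relates cache size, number of blocks, block size, and associativity. As a minimal sketch of that arithmetic (not the authors' simulator), the Python snippet below derives the block count, set count, and address bit widths for the recommended 128KB, 8-way, 64B-block configuration. The class name `CacheGeometry` and its fields are hypothetical, and the sketch assumes all sizes are powers of two.

```python
from dataclasses import dataclass

KB = 1024


@dataclass(frozen=True)
class CacheGeometry:
    """Illustrative helper (hypothetical name); assumes power-of-two sizes."""
    size_bytes: int   # total data capacity of the cache
    block_bytes: int  # block (line) size
    ways: int         # associativity (blocks per set)

    @property
    def num_blocks(self) -> int:
        # cache size = number of blocks * block size
        return self.size_bytes // self.block_bytes

    @property
    def num_sets(self) -> int:
        # each set holds `ways` blocks
        return self.num_blocks // self.ways

    @property
    def offset_bits(self) -> int:
        # address bits that select a byte within a block
        return (self.block_bytes - 1).bit_length()

    @property
    def index_bits(self) -> int:
        # address bits that select a set
        return (self.num_sets - 1).bit_length()


# Configuration named as adequate in the abstract.
l1 = CacheGeometry(size_bytes=128 * KB, block_bytes=64, ways=8)
print(l1.num_blocks, l1.num_sets, l1.offset_bits, l1.index_bits)

# Quadrupling the number of blocks at a fixed block size quadruples the size:
small = CacheGeometry(size_bytes=256 * 64, block_bytes=64, ways=8)
large = CacheGeometry(size_bytes=4 * 256 * 64, block_bytes=64, ways=8)
print(small.size_bytes // KB, large.size_bytes // KB)
```

Under these assumptions the recommended cache has 2048 blocks in 256 sets, and a 256-block cache with 64B blocks grows from 16KB to 64KB when its block count is quadrupled.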

Keywords


Multi-Core Architectures; 4-Way CMP Clusters; Cache Performance


