Boosting the Performance of Shared Memory Multiprocessors

Document type: Journal Articles
Article type: Original article
Peer reviewed: Yes
Author(s): Per Stenström, Mats Brorsson, Fredrik Dahlgren, Håkan Grahn, Michel Dubois
Title: Boosting the Performance of Shared Memory Multiprocessors
Journal: Computer : a publication of the IEEE Computer Society
Year: 1997
Volume: 30
Issue: 7
Pagination: 63-70
ISSN: 0018-9162
Publisher: IEEE Computer Society
City: Long Beach, Calif.
Organization: Blekinge Institute of Technology
Department: Dept. of Computer Science and Business Administration (Institutionen för datavetenskap och ekonomi)
S-372 25 Ronneby
+46 455 780 00
Language: English
Abstract: Shared memory multiprocessors make it practical to convert sequential programs to parallel ones in a variety of applications. An emerging class of shared memory multiprocessors is the cache-coherent nonuniform memory access (CC-NUMA) machine, which has private caches and a cache coherence protocol. Proposed hardware optimizations to CC-NUMA machines can shorten the time processors lose because of cache misses and invalidations. The authors look at cost-performance trade-offs for each of four proposed optimizations: release consistency, adaptive sequential prefetching, migratory sharing detection, and hybrid update/invalidate with a write cache. The four optimizations differ in which application features they attack, what hardware resources they require, and what constraints they impose on the application software. The authors measured the performance improvement from the four optimizations in isolation and in combination, weighing the trade-offs in hardware and programming complexity. Although one combination of the proposed optimizations (prefetching and migratory sharing detection) can boost a sequentially consistent machine to perform as well as a machine with release consistency, release consistency models offer significant performance improvements across a broad application domain at little extra complexity in the machine design. Moreover, a combination of sequential prefetching and hybrid update/invalidate with a write cache cuts the execution time of a sequentially consistent machine by half with fairly modest changes to the second-level cache and the cache protocol. The authors expect that designers will begin to turn more to the release consistency model.
Subject: Computer Science\Computer Systems