Comparing the optimal performance of parallel architectures

Document type: Journal Articles
Article type: Original article
Peer reviewed: Yes
Author(s): Kamilla Klonowska, Lars Lundberg, Håkan Lennerstad, Magnus Broberg
Title: Comparing the optimal performance of parallel architectures
Translated title: Jämförelser av optimala prestanda för parallella arkitekturer
Journal: Computer Journal
Year: 2004
Volume: 47
Issue: 5
Pagination: 527-544
ISSN: 0010-4620
Publisher: Oxford University Press
City: Oxford
URI/DOI: 10.1093/comjnl/47.5.527
ISI number: 000223426300002
Organization: Blekinge Institute of Technology
Department: School of Engineering - Dept. of Systems and Software Engineering (Sektionen för teknik – avd. för programvarusystem)
School of Engineering, S-372 25 Ronneby
+46 455 38 50 00
http://www.tek.bth.se/
Authors e-mail: kkl@bth.se, llu@bth.se, hln@bth.se, mbr@bth.se
Language: English
Abstract: Consider a parallel program with n processes and a synchronization granularity z. Consider also two parallel architectures: an SMP with q processors and run-time reallocation of processes to processors, and a distributed system (or cluster) with k processors and no run-time reallocation. There is an inter-processor communication delay of t time units for the system with no run-time reallocation. In this paper we define a function H(n,k,q,t,z) such that the minimum completion time for all programs with n processes and a granularity z is at most H(n,k,q,t,z) times longer using the system with no reallocation and k processors compared to using the system with q processors and run-time reallocation. We assume optimal allocation and scheduling of processes to processors. The function H(n,k,q,t,z) is optimal in the sense that there is at least one program, with n processes and a granularity z, such that the ratio is exactly H(n,k,q,t,z). We also validate our results using measurements on distributed and multiprocessor Sun/Solaris environments. The function H(n,k,q,t,z) provides important insights regarding the performance implications of the fundamental design decision of whether to allow run-time reallocation of processes or not. These insights can be used when doing the proper cost/benefit trade-offs when designing parallel execution platforms.
Summary in Swedish (translated): We consider a parallel program with n processes and synchronization granularity z, together with two parallel architectures. The first has q processors and allows full allocation of the processes, and the second has k processors and no reallocation during execution. Each reallocation takes t seconds. We define a function H(n,k,q,t,z) such that the execution time of a program with n processes and granularity z is at most a factor H(n,k,q,t,z) longer on the system without reallocation than on the system with it. We assume optimal allocation of processes in both systems. The function is optimal: there are programs for which the execution time is exactly H(n,k,q,t,z) times longer. The results are validated with measurements on multiprocessors in a Sun/Solaris environment.
Subject: Computer Science\Distributed Computing
Mathematics\Discrete Mathematics
Computer Science\Computer Systems
Keywords: multiprocessor, parallel computing, allocation, performance, granularity, synchronization
Note: Computer Journal, 47(5): 527-544 (2004), http://www.informatik.uni-trier.de/~ley/db/journals/cj/cj47.html#KlonowskaLB04