Using Modulo Rulers for Optimal Recovery Schemes in Distributed Computing

Document type: Conference Papers
Peer reviewed: Yes
Author(s): Kamilla Klonowska, Lars Lundberg, Håkan Lennerstad, Charlie Svahnberg
Title: Using Modulo Rulers for Optimal Recovery Schemes in Distributed Computing
Conference name: 10th International Symposium, Pacific Rim Dependable Computing (PRDC 2004)
Year: 2004
Pagination: 133-142
ISBN: 0-7695-2076-6
Publisher: IEEE
City: Papeete, Tahiti, French Polynesia
ISI number: 000189450600015
Organization: Blekinge Institute of Technology
Department: School of Engineering - Dept. of Systems and Software Engineering (Sektionen för teknik – avd. för programvarusystem)
School of Engineering, S-372 25 Ronneby
+46 455 38 50 00
http://www.tek.bth.se/
Authors' e-mail: kamilla.klonowska@bth.se, lars.lundberg@bth.se, hakan.lennerstad@bth.se, charlie.svahnberg@bth.se
Language: English
Abstract: Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e., we want to optimize the worst-case behavior. We define recovery schemes that are optimal for a larger number of computers down than in previous results. We also show that the problem of finding optimal recovery schemes for a cluster with n computers corresponds to the mathematical problem of finding the longest sequence of positive integers for which the sum of the sequence and the sums of all subsequences modulo n are unique.
Subject: Computer Science\Computer systems
Keywords: distributed processing, resource allocation, software fault tolerance, system recovery, workstation clusters
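
The uniqueness condition stated in the abstract can be illustrated with a small check. The sketch below is not taken from the paper. It assumes that "subsequences" means contiguous runs of the sequence, that the whole sequence counts as one such run, and that step values can be restricted to 1..n-1 because larger values repeat modulo n. The function names has_unique_run_sums and longest_sequence are hypothetical, and the exhaustive search is only practical for small n.

```python
from itertools import accumulate


def has_unique_run_sums(seq, n):
    """Return True if the sums of all contiguous runs of seq are distinct modulo n.

    Assumption: "subsequence" is read as a contiguous run, including the
    full sequence and each single element.
    """
    prefix = [0] + list(accumulate(seq))  # prefix[j] - prefix[i] = sum(seq[i:j])
    seen = set()
    for i in range(len(seq)):
        for j in range(i + 1, len(seq) + 1):
            residue = (prefix[j] - prefix[i]) % n
            if residue in seen:
                return False
            seen.add(residue)
    return True


def longest_sequence(n):
    """Exhaustively search for a longest sequence with unique run sums modulo n.

    Assumption: steps are drawn from 1..n-1. Exponential time; small n only.
    """
    best = []

    def extend(seq):
        nonlocal best
        if len(seq) > len(best):
            best = list(seq)
        for step in range(1, n):
            seq.append(step)
            if has_unique_run_sums(seq, n):
                extend(seq)
            seq.pop()

    extend([])
    return best
```

For example, with n = 7 the sequence 1, 3, 2 passes the check: its run sums are 1, 4, 6, 3, 5 and 2, all distinct modulo 7. A sequence of length k has k(k+1)/2 runs, so under this reading no sequence longer than 3 can pass for n = 7.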