Scheduling Policies for Processor Coallocation in Multicluster Systems
Authors: Anca I.D. Bucur and Dick H.J. Epema - The Netherlands
Complete Citation
- Anca I.D. Bucur and Dick H.J. Epema. Scheduling Policies for Processor Coallocation in Multicluster Systems. IEEE Transactions on Parallel and Distributed Systems, 18(7), 2007.
Abstract
Building multicluster systems out of multiple, geographically distributed clusters interconnected by high-speed wide-area networks can provide access to a larger computational power and to a wider range of resources. Jobs running on multiclusters and, more generally, in grids, may require (processor) coallocation, i.e., the simultaneous allocation of resources (processors) in different clusters or subsystems of a grid. In this paper, we propose four scheduling policies for processor coallocation in multiclusters, and we assess with simulations their performance under a wide variety of parameter settings. In particular, in our simulations we use synthetic workloads and workloads derived from the logs of actual systems and from runtime measurements. We conclude that although coallocation makes scheduling more difficult and the wide-area communication critically impacts the performance, there is a wide range of realistic applications that may benefit from coallocation. However, unrestricted coallocation is not recommended: Limiting the total job size or the number or the sizes of their components improves performance.
Annotations
The authors analyze several scheduling policies for multicluster systems. The work is motivated by the Distributed ASCI Supercomputer (DAS), where ASCI is the Advanced School for Computing and Imaging. The policies are evaluated by simulation, using both idealized (synthetic) job models and workloads derived from DAS traces. Scheduling policies include:
- GS: Unified global job queue.
- LS: Distributed local job queues.
- LS-??: The order in which the local queues are enabled must be determined; the paper describes this poorly.
- GP: Global and local queues. Local queues are enabled only when the global queue is empty.
- LP: Global and local queues. The variants differ in which queues are enabled first:
  - LP-LF: The local queues are enabled first.
  - LP-GF: The global queue is enabled first.
  - LP-RD: The order is chosen randomly.
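The queue-enabling rules listed above can be sketched as a function that returns the order in which queues are served at a scheduling point. This is a minimal sketch under the annotation's reading of the policies, not the paper's code: the helper name `enabled_queues`, the queue labels, and the coin flip used for LP-RD are hypothetical assumptions.

```python
import random
from collections import deque


def enabled_queues(policy, global_queue, local_queues, rng=random):
    """Return queue labels in the order they are enabled at one scheduling point.

    Hypothetical helper: the semantics follow the annotation's summary,
    not code from the paper.
    """
    locals_ = [f"local{i}" for i in range(len(local_queues))]
    if policy == "GP":
        # Local queues are enabled only when the global queue is empty.
        return ["global"] if global_queue else locals_
    if policy == "LP-LF":
        # Local queues first, global queue last.
        return locals_ + ["global"]
    if policy == "LP-GF":
        # Global queue first, then the local queues.
        return ["global"] + locals_
    if policy == "LP-RD":
        # Assumption: a fair coin chooses between the GF and LF orders.
        if rng.random() < 0.5:
            return ["global"] + locals_
        return locals_ + ["global"]
    raise ValueError(f"unknown policy: {policy}")


g = deque(["J1"])
locs = [deque(), deque(["J2"])]
print(enabled_queues("GP", g, locs))     # ['global']
print(enabled_queues("LP-LF", g, locs))  # ['local0', 'local1', 'global']
```

Returning an ordering rather than a single queue keeps the policy decision separate from job placement, so the same placement routine can be reused for every policy.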
When a queue is enabled, it schedules its jobs using Worst Fit (WF) placement.
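Worst Fit is commonly described as placing each job component on the cluster with the most idle processors. The sketch below assumes that reading, and additionally assumes that the components of one job must go to distinct clusters and that a job is placed only if all of its components fit at once (coallocation is all-or-nothing). The function name `worst_fit` and its inputs are hypothetical illustrations, not the paper's implementation.

```python
def worst_fit(components, idle):
    """Place job components on clusters with Worst Fit (illustrative sketch).

    components: processors requested by each job component.
    idle: idle processors per cluster (not modified).
    Returns one cluster index per component (in input order), or None if the
    job cannot be coallocated right now.
    Assumption: each component of a job goes to a different cluster.
    """
    free = list(idle)  # work on a copy so a failed attempt changes nothing
    used = set()
    placement = [None] * len(components)
    # Consider the largest components first so they get the emptiest clusters.
    for i in sorted(range(len(components)), key=lambda k: -components[k]):
        candidates = [c for c in range(len(free)) if c not in used]
        if not candidates:
            return None  # more components than clusters
        best = max(candidates, key=lambda c: free[c])  # cluster with most idle processors
        if free[best] < components[i]:
            return None  # if the emptiest cluster is too small, no cluster fits
        free[best] -= components[i]
        used.add(best)
        placement[i] = best
    return placement


print(worst_fit([8, 16], [10, 20, 12]))   # [2, 1]
print(worst_fit([16, 16], [10, 20, 12]))  # None
```

Checking only the emptiest remaining cluster is enough: if it cannot hold the current component, no other candidate can either.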
Conclusions: LP-LF starves the global queue. LP-GF performs well without degrading local-queue performance much.
Tags: Multicluster systems, coallocation, scheduling policies, simulation.
Related Work
- H. Bal et al. The Distributed ASCI Supercomputer Project. ACM Operating Systems Rev., 34(4), 2000.
- C. Ernemann, V. Hamscher, U. Schwiegelshohn, R. Yahyapour, and A. Streit. On Advantages of Grid Computing for Parallel Job Scheduling. CCGrid 2002. (When to use processor coallocation.)
- To read: DUROC (the Globus component for coallocation) and KOALA (the scheduler that grew out of this work).
--
JustinWozniak - 13 Jun 2007