Scheduling Policies for Processor Coallocation in Multicluster Systems

Authors: Anca I.D. Bucur and Dick H.J. Epema - The Netherlands

Complete Citation

  • Anca I.D. Bucur and Dick H.J. Epema. Scheduling Policies for Processor Coallocation in Multicluster Systems. IEEE Transactions on Parallel and Distributed Systems, 18(7), 2007.

Abstract

Building multicluster systems out of multiple, geographically distributed clusters interconnected by high-speed wide-area networks can provide access to a larger computational power and to a wider range of resources. Jobs running on multiclusters and, more generally, in grids, may require (processor) coallocation, i.e., the simultaneous allocation of resources (processors) in different clusters or subsystems of a grid. In this paper, we propose four scheduling policies for processor coallocation in multiclusters, and we assess with simulations their performance under a wide variety of parameter settings. In particular, in our simulations we use synthetic workloads and workloads derived from the logs of actual systems and from runtime measurements. We conclude that although coallocation makes scheduling more difficult and the wide-area communication critically impacts the performance, there is a wide range of realistic applications that may benefit from coallocation. However, unrestricted coallocation is not recommended: Limiting the total job size or the number or the sizes of their components improves performance.

Annotations

The authors analyze several scheduling policies for multicluster systems. The work is motivated by the Distributed ASCI Supercomputer (DAS), where ASCI stands for the Advanced School for Computing and Imaging. The policies are evaluated by simulation, using idealized job models and traces from the DAS. The scheduling policies are:

  • GS: Unified global job queue.
  • LS: Distributed local job queues.
    • LS-??: The order in which the local queues are enabled must be determined. (Poorly described.)
  • GP: Global and local queues. Local queues enabled iff the global queue is empty.
  • LP: Global and local queues.
    • LP-LF: The local queues are enabled first.
    • LP-GF: The global queue is enabled first.
    • LP-RD: Choose randomly.
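
The queue-enabling rules in the list above can be sketched in a few lines of Python. This is only a reading of the summary in these notes, not the paper's full definitions: the function name `enable_order`, the string labels, and the `global_queue_empty` flag are all hypothetical, and the order among the individual local queues (the open LS question) is deliberately left out.

```python
import random


def enable_order(policy, global_queue_empty, rng=random):
    """Return the kinds of queue that get enabled, in order, for a policy.

    Hypothetical helper that follows the bullet list in these notes only.
    """
    if policy == "GS":
        return ["global"]              # single unified global queue
    if policy == "LS":
        return ["local"]               # order among local queues not specified here
    if policy == "GP":
        # Local queues are enabled iff the global queue is empty.
        return ["local"] if global_queue_empty else ["global"]
    if policy == "LP-LF":
        return ["local", "global"]     # local queues first
    if policy == "LP-GF":
        return ["global", "local"]     # global queue first
    if policy == "LP-RD":
        order = ["local", "global"]
        rng.shuffle(order)             # choose the order at random
        return order
    raise ValueError(f"unknown policy: {policy}")
```

For example, `enable_order("GP", global_queue_empty=False)` yields only the global queue, while `enable_order("LP-GF", True)` yields the global queue followed by the local queues.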

When a queue is enabled, it places the job at its head according to Worst Fit (WF), i.e., job components go to the clusters with the most idle processors.
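
As a rough illustration of Worst Fit placement for a coallocated job, the sketch below assigns each job component to the cluster that currently has the most idle processors. It makes several assumptions that these notes do not state: components are placed largest first, each component goes to a distinct cluster, and ties are broken by the lowest cluster index. The function name `worst_fit` and its arguments are hypothetical.

```python
def worst_fit(components, idle):
    """Try to place job components on distinct clusters using Worst Fit.

    components: processor counts, one per job component.
    idle: idle processor counts, one per cluster.
    Returns {component index: cluster index}, or None if the job does not fit.
    """
    free = set(range(len(idle)))   # clusters not yet used by this job
    placement = {}
    # Assumption: handle the largest components first.
    order = sorted(range(len(components)),
                   key=lambda i: components[i], reverse=True)
    for i in order:
        if not free:
            return None            # more components than clusters
        # Worst Fit: pick the unused cluster with the most idle processors
        # (ties go to the lowest cluster index).
        c = max(free, key=lambda k: (idle[k], -k))
        if idle[c] < components[i]:
            return None            # even the emptiest cluster is too small
        placement[i] = c
        free.remove(c)
    return placement


# A job with components of 8 and 4 processors on three clusters.
print(worst_fit([8, 4], [10, 16, 6]))   # {0: 1, 1: 0}
```

If the function returns `None`, the job stays in its queue; what happens next (for instance, whether later jobs may be tried) depends on the queueing discipline and is outside this sketch.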

Conclusions: LP-LF starves the global queue. LP-GF works well and does not degrade local queue performance much.

Tags: Multicluster systems, coallocation, scheduling policies, simulation.

Related Work

  • H. Bal et al. The Distributed ASCI Supercomputer Project. ACM Operating Systems Rev., 34(4), 2000.
  • C. Ernemann, V. Hamscher, U. Schwiegelshohn, R. Yahyapour, and A. Streit. On Advantages of Grid Computing for Parallel Job Scheduling. CCGrid 2002. (When to use processor coallocation.)
  • Read up on DUROC (Globus component for coallocation) and KOALA (Scheduler component that arises from this work).

-- JustinWozniak - 13 Jun 2007

Topic revision: r2 - 19 Jun 2007 - 02:04:26 - JustinWozniak
 