Dynamic grid scheduling with job migration and rescheduling in the GridLab resource management system
Authors: K. Kurowski, B. Ludwiczak, J. Nabrzyski, A. Oleksiak and J. Pukacki - Poland
Complete Citation
- K. Kurowski, B. Ludwiczak, J. Nabrzyski, A. Oleksiak and J. Pukacki. Dynamic grid scheduling with job migration and rescheduling in the GridLab resource management system. Scientific Programming, 12(4), 2004.
Abstract
Grid computing has become one of the most important research topics that appeared in the field of computing in the last years. Simultaneously, we have noticed the growing popularity of new Web-based technologies which allow us to create application-oriented Grid middleware services providing capabilities required for dynamic resource and job management, monitoring, security, etc. Consequently, end users are able to get easier access to geographically distributed resources. In this paper we present the results of our experiments with the Grid(Lab) Resource Management System (GRMS), which acts on behalf of end users and controls their computations efficiently using distributed heterogeneous resources. We show how resource matching techniques used within GRMS can be improved by the use of a job migration based rescheduling policy. The main aim of this policy is to shorten job pending times and reduce machine overloads. The influence of this method on application performance and resource utilization is studied in detail and compared with two other simple policies.
Annotations
The authors describe a new metascheduler called the Grid(Lab) Resource Management System (GRMS). The system functions as a feedback controller for job scheduling. A central component is the Broker Module, which may be modified by inserting new policy plug-ins. The paper's main subject is the Reschedule plug-in, which relaxes job requirements in an attempt to fit more jobs onto limited resources. Application-level checkpointing is used.
The authors compare three policies against each other:
- Wait - Queue extra jobs (a control case).
- Overload - Submit more jobs than resources can normally handle.
- Reschedule - Apply the adaptive rescheduling policy.
Reschedule narrowly beat Wait in the makespan experiments. Overload gave predictably bad results.
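To make the difference between the three policies concrete, here is a minimal Python sketch of the decision each one might take when a new job arrives and its acceptable machines are full. This is a toy model, not GRMS code: the `Job`, `Machine` and `place` names, the slot-based notion of capacity, and the "first checkpointable job that fits elsewhere" migration rule are all assumptions made for illustration, loosely based on the descriptions above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Job:
    name: str
    allowed: Set[str]             # machines meeting the job's requirements (assumed model)
    checkpointable: bool = False  # can be stopped and resumed on another machine


@dataclass
class Machine:
    name: str
    slots: int                    # jobs it can run without being overloaded
    running: List[Job] = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.running) < self.slots


Step = Tuple[str, str, str]  # (action, job name, machine name or "")


def place(job: Job, machines: Dict[str, Machine], policy: str) -> List[Step]:
    """Decide what to do with one incoming job under a hypothetical policy."""
    candidates = [machines[n] for n in sorted(job.allowed)]

    # Every policy starts the job right away if an acceptable machine has room.
    for m in candidates:
        if m.has_room():
            m.running.append(job)
            return [("start", job.name, m.name)]

    if policy == "overload" and candidates:
        # Overload: start anyway on the least loaded acceptable machine.
        target = min(candidates, key=lambda m: len(m.running))
        target.running.append(job)
        return [("start", job.name, target.name)]

    if policy == "reschedule":
        # Reschedule: migrate a checkpointable job to another machine it
        # accepts, then start the new job in the slot that was freed.
        for m in candidates:
            for victim in m.running:
                if not victim.checkpointable:
                    continue
                for n in sorted(victim.allowed - {m.name}):
                    dest = machines[n]
                    if dest.has_room():
                        m.running.remove(victim)
                        dest.running.append(victim)
                        m.running.append(job)
                        return [("migrate", victim.name, dest.name),
                                ("start", job.name, m.name)]

    # Wait (and the fallback for the other policies): leave the job pending.
    return [("queue", job.name, "")]
```

With two single-slot machines `a` and `b`, a checkpointable job `j1` running on `a`, and a new job `j2` that only accepts `a`, this sketch queues `j2` under `"wait"`, starts it on the already full `a` under `"overload"`, and under `"reschedule"` first migrates `j1` to `b` and then starts `j2` on `a`.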
--
JustinWozniak - 14 Nov 2007