Batch jobs on this system are scheduled by SLURM, which is responsible for resource management, job scheduling, and monitoring.

Fairshare Scheduling Policy

We use the SLURM Fairshare feature to distribute the available resources fairly. Fairshare incorporates historical resource utilization into job feasibility and priority decisions. The fairshare factor is normally the most significant component of a job's priority, which ultimately determines the job's position in the queue. We do not use a FIFO (First-In-First-Out) scheduler. Your jobs' priority is determined by your utilization over the past seven days (a sliding window): the higher your recent utilization, the lower the priority of your new jobs.
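
The sliding-window idea above can be sketched as follows. This is a simplified illustration, not the calculation SLURM performs on this system; the actual formula depends on the site's SLURM configuration. The function names, the use of core-hours as the usage unit, and the `share` parameter are all assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # sliding window described above


def windowed_usage(records, now):
    """Sum the core-hours of jobs that ended within the last seven days.

    records: iterable of (end_time, core_hours) pairs (hypothetical format).
    """
    cutoff = now - WINDOW
    return sum(core_hours for end_time, core_hours in records if end_time > cutoff)


def fairshare_factor(user_usage, total_usage, share):
    """Return a value in (0, 1]; higher recent usage gives a lower factor.

    Illustrative form 2 ** (-usage_fraction / share), where `share` is the
    fraction of the machine the user is nominally entitled to (assumed).
    """
    if total_usage == 0:
        return 1.0  # nobody has used anything, so nobody is penalized
    usage_fraction = user_usage / total_usage
    return 2.0 ** (-usage_fraction / share)
```

In this sketch, a user who consumed exactly their nominal share in the window gets a factor of 0.5, an idle user gets 1.0, and a heavy user approaches 0, so new jobs from the heavy user sort lower in the queue.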

Backfill Scheduling Policy

Backfill is a scheduling optimization that lets SLURM make better use of available resources by running jobs out of order. Using job data such as the requested walltime and resources, the scheduler can start lower-priority jobs as long as they do not delay the highest-priority jobs. Because backfill works by filling holes in node space, it tends to favor smaller, shorter-running jobs over larger, longer-running ones.
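
The backfill decision can be sketched with a toy model. The code below is a simplification that protects only the single highest-priority waiting job, whereas SLURM's backfill scheduler takes more into account. The `Job` class, the `backfill_candidates` function, and the node counts are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    walltime: int  # requested walltime in minutes


def backfill_candidates(total_nodes, running, queue):
    """Return names of lower-priority jobs that may start now.

    running: list of (nodes, minutes_remaining) for jobs already running.
    queue:   waiting jobs in priority order; queue[0] is the job to protect.
    """
    free = total_nodes - sum(nodes for nodes, _ in running)
    head = queue[0]

    # Find the earliest time the head job could start by releasing the
    # nodes of running jobs in order of their expected finish time.
    available = free
    start_time = 0
    for nodes, remaining in sorted(running, key=lambda r: r[1]):
        if available >= head.nodes:
            break
        available += nodes
        start_time = remaining
    spare = available - head.nodes  # nodes the head job will not need

    started = []
    for job in queue[1:]:
        if job.nodes > free:
            continue  # does not fit in the current hole
        if job.walltime <= start_time:
            free -= job.nodes  # finishes before the head job starts
            started.append(job.name)
        elif job.nodes <= spare:
            free -= job.nodes  # runs longer, but only on nodes the head job leaves over
            spare -= job.nodes
            started.append(job.name)
    return started
```

With 10 nodes, 2 of them free, and an 8-node job that must wait 60 minutes, a 2-node job requesting 30 minutes can be backfilled, while a 2-node job requesting 90 minutes cannot. The requested walltime, not the actual runtime, decides the outcome.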

Accurate Walltime Improves Scheduling of Jobs

It is important to specify an accurate walltime for your job in your SLURM submission script. If you accept the default walltime for a job that you know will finish sooner, the scheduler overestimates the time the job needs and may delay it as a result.
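
One way to arrive at a walltime request is to pad the longest runtime you have observed for similar jobs. The helper below is a minimal sketch: the function name and the 20% safety margin are assumptions, not site policy. It formats the result as `days-hours:minutes:seconds`, one of the forms accepted by the `--time` option.

```python
import math


def suggest_walltime(observed_seconds, margin=0.2):
    """Return a --time value covering the longest observed run plus a margin.

    observed_seconds: runtimes of earlier, comparable jobs, in seconds.
    margin:           fractional safety margin (0.2 = 20%, an assumed default).
    """
    longest = max(observed_seconds)
    # Pad the longest run, then round up to a whole number of minutes.
    padded = math.ceil(longest * (1 + margin) / 60) * 60
    days, rest = divmod(padded, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, if earlier runs took 1500, 1710, and 1620 seconds, the helper returns `0-00:35:00`, which can be used in the submission script as `#SBATCH --time=0-00:35:00`. A job that exceeds its requested walltime is terminated, so keep the margin generous enough to cover normal variation.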

Automatic Queue Routing

Each of our compute resources has a pre-defined default queue. If you submit a job without specifying a queue, the job is automatically routed to the default queue. Decide which queue you intend the job to run in, and specify that queue explicitly in your SLURM batch script.
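
SLURM refers to queues as partitions, and a partition is selected with the `--partition` option, either as an `#SBATCH` directive in the batch script or on the `sbatch` command line. The sketch below builds a submission command that always names the partition. The function name, the script name `job.sh`, and the partition name `short` are hypothetical; substitute the queue names defined on the resource you are using.

```python
import shlex


def sbatch_command(script, partition, walltime):
    """Build an sbatch command line that names the partition explicitly."""
    if not partition:
        raise ValueError("name the queue explicitly instead of relying on the default")
    return ["sbatch", f"--partition={partition}", f"--time={walltime}", script]


# Print the command for review; pass the list to subprocess.run to submit.
print(shlex.join(sbatch_command("job.sh", "short", "00:30:00")))
```

Options given on the `sbatch` command line take precedence over `#SBATCH` directives in the script, so a wrapper like this can also override a script's settings for a single submission.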