It is sometimes necessary to submit large numbers of jobs, especially serial (single-core) jobs for High Throughput Computing (HTC). One way to do this is to prepare many job submission scripts and submit them with sbatch one at a time, which can be cumbersome even when scripted. Our resource manager, Slurm, supports Job Arrays, a mechanism for submitting many jobs with a single job script and a single job submission.
Using Job Arrays
When you invoke sbatch with a Job Array, each job in the array has access to its own index number, stored in the $SLURM_ARRAY_TASK_ID environment variable. This allows a single job script to be used for multiple jobs. Here is a sample script:
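A minimal sketch of such a script follows; the program name (my_app) and the input/output file naming scheme are illustrative placeholders, not fixed conventions:

```shell
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Each job in the array receives its own index in $SLURM_ARRAY_TASK_ID.
# Here the index selects a per-job input file and names a per-job output file.
./my_app < input_${SLURM_ARRAY_TASK_ID}.dat > output_${SLURM_ARRAY_TASK_ID}.txt
```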
This example demonstrates how the input files can be a function of the job index number. To submit the above job as an array of 100 jobs, for example, you would invoke sbatch as follows.
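Assuming the job script is saved as array_job.sh (the filename is a placeholder), the submission would look like:

```shell
# Submit 100 jobs, with array indices 0 through 99
sbatch --array=0-99 array_job.sh
```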
In this example, $SLURM_ARRAY_TASK_ID takes the values 0 through 99, resulting in 100 jobs in the queue waiting to run. The jobs would appear in the queue in a fashion similar to the following example, as shown by squeue:
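A rough sketch of the squeue listing; the job ID (1234), partition, and user names are illustrative, not actual output:

```
  JOBID    PARTITION  NAME      USER   ST  TIME  NODES  NODELIST(REASON)
  1234_0   general    array_ex  user1  R   0:10  1      node001
  1234_1   general    array_ex  user1  R   0:10  1      node002
  ...
  1234_99  general    array_ex  user1  PD  0:00  1      (Resources)
```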
The output of each job within the array is therefore written to a unique file, using Slurm's templated output file names for arrays, i.e. slurm-JobID_TaskID.out.
It is also possible to use Job Arrays with a non-contiguous numbering scheme. For example:
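The --array option accepts comma-separated values and ranges; a sketch matching the 12 jobs described below (array_job.sh is again a placeholder script name):

```shell
# Indices 0 through 10, plus a single job with index 50
sbatch --array=0-10,50 array_job.sh
```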
This job submission would result in 12 jobs: indices 0 through 10, plus a job with an index of 50.
Additionally, the array specification can be placed inside the SLURM batch script itself via an #SBATCH --array directive.
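For instance, the 100-job array above could be requested from within the script, so that a plain sbatch invocation suffices (a sketch; the directive values mirror the earlier command-line example):

```shell
#!/bin/bash
#SBATCH --job-name=array_example
# Request the array here instead of on the sbatch command line
#SBATCH --array=0-99

./my_app < input_${SLURM_ARRAY_TASK_ID}.dat > output_${SLURM_ARRAY_TASK_ID}.txt
```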
Managing Job Arrays
Jobs within an array are removed with the scancel command.
If you want to delete all jobs in the array, use scancel as follows:
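A sketch of the invocation, where JobID stands in for the array's actual job ID:

```shell
# Cancel every job in the array
scancel JobID
```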
Here, JobID is the job ID assigned to the array when it was submitted.
If you want to remove only the job with index 5, the command would be invoked as:
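Using the same JobID placeholder, an individual array task is addressed as JobID_index:

```shell
# Cancel only the job with array index 5
scancel JobID_5
```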