The old environment modules tool is being replaced by a new implementation named Lmod, and the modulefile hierarchy is also changing. Most of the old commands will still work as before, but there are a few new features to be aware of.
'module avail' no longer lists all available modules; it lists only the modules that can be loaded given the currently loaded modules. To see all available applications, use the spider subcommand:
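For example, on a login node where Lmod is set up (the listing itself depends on what the site has installed):

```shell
# List every module known to Lmod, including those not loadable
# with the currently loaded compiler/MPI modules.
module spider
```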
To see a description of a specific package, use the spider subcommand again, followed by the package name:
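For example, assuming the package of interest is published under the module name OpenMPI (module names and versions are site-specific):

```shell
# Describe one package and list its available versions.
module spider OpenMPI
```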
To load the module for OpenMPI built with the GCC compilers, for example, use the load subcommand:
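A minimal sketch, assuming the compiler and MPI modules are named GCC and OpenMPI on this system (check the exact names with the spider subcommand):

```shell
# Load the compiler first so that the matching OpenMPI build becomes visible.
module load GCC OpenMPI
```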
To see a list of modules that you have loaded, use this command:
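For example:

```shell
# Show the modules loaded in the current shell session.
module list
```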
To change to the Intel compiler build of OpenMPI, use the swap subcommand:
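A sketch, assuming the loaded compiler module is named GCC and the Intel compiler module is named icc (both names are hypothetical here and may differ on this system). In a hierarchical Lmod setup, swapping the compiler normally causes dependent modules such as OpenMPI to be reloaded against the new compiler:

```shell
# Unload GCC and load the Intel compiler in its place.
module swap GCC icc
```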
To unload all of your modules, use this command:
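For example:

```shell
# Unload every currently loaded module.
module purge
```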
To create a default module list and load it automatically every time you log in, use the module save command:
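For example, after loading the modules you want to keep:

```shell
# Save the currently loaded modules as your default collection.
module save
```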
The saved modules will be loaded automatically every time you log in or when you run the 'module restore' command.
The torque/moab resource manager/job scheduler combination is being replaced by SLURM.
The batch job scheduling system implemented on this system uses SLURM. SLURM is responsible for resource management, job scheduling, and monitoring.
Fairshare Scheduling Policy
We implement the SLURM Fairshare feature to provide fair utilization of the available resources. This is accomplished by incorporating historical resource utilization into job feasibility and priority decisions. Fairshare is normally the most significant component of a job's priority, which ultimately determines the job's position in the queue. We do not use a FIFO (First-In-First-Out) scheduler. Your jobs' priority is determined by your utilization over the past seven days (a sliding window): high utilization results in lower priority for new jobs.
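To see how past usage currently feeds into priority, the standard SLURM tools sshare and sprio can be queried. This is only a sketch: it assumes both commands are enabled for regular users on this system, which the text above does not state.

```shell
# Show your recorded usage and the resulting fairshare factor.
sshare -u "$USER"

# Show the priority components of your pending jobs.
sprio -u "$USER"
```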
Backfill Scheduling Policy
This is a scheduling optimization which allows SLURM to make better use of available resources by running jobs out of order. Using job data such as walltime and resources requested, the scheduler can start other, lower-priority jobs so long as they do not delay the highest priority jobs. Because of the way it works, essentially filling in holes in node space, backfill tends to favor smaller and shorter running jobs more than larger and longer running ones.
Automatic Queue Routing
Each of our compute resources has a pre-defined default queue. If you submit your job without specifying a queue, your job will be automatically routed to the default queue. Therefore, be aware of which queue you intend for the job to run in and specify this queue in your SLURM batch script.
Available Partitions and System Load
The partitions are defined as follows:
commons - default partition, intended for all jobs.
serial_long - partition intended for serial jobs needing longer than 24 hours but less than 72 hours (3 days).
interactive - partition intended for interactive jobs or short running testing jobs.
Use the following command to determine the partitions to which you have access. Note that the value in the Account column of the output must be provided in your batch script in addition to the partition name.
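The same query is named later in this document; netID is a placeholder for your own user name:

```shell
# List the account/partition associations for one user (replace netID).
sacctmgr show assoc user=netID
```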
Determining Partition Status
A good way to obtain the status of all partitions and their current usage is to run the following SLURM command:
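For example, with no arguments sinfo prints one line per partition and node state:

```shell
# Summarize all partitions and the state of their nodes.
sinfo
```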
Here is a brief description of the relevant fields:
PARTITION: Name of a partition. Note that the suffix "*" identifies the default partition.
AVAIL: Partition state: up or down.
TIMELIMIT: Maximum time limit for a user job in days-hours:minutes:seconds.
NODES: Count of nodes with this particular configuration by node state, in the form "[A]llocated/[I]dle/[O]ther/[T]otal".
STATE: State of the nodes.
NODELIST: Names of nodes associated with this configuration/partition.
See the manpage for sinfo for more information.
Once you have an executable program and are ready to run it on the compute nodes, you must create a job script that performs the following functions:
- Use job batch options to request the resources that will be needed (e.g. number of processors, run time, etc.), and
- Use commands to prepare for execution of the executable (e.g. cd to the working directory, source shell environment files, copy input data to a scratch location, copy needed output off of the scratch location, clean up scratch files, etc.).
After the job script has been constructed you must submit it to the job scheduler for execution. The remainder of this section will describe the anatomy of a job script and how to submit and monitor jobs.
SLURM Batch Script Options
All jobs must be submitted via a SLURM batch script or by invoking sbatch at the command line. See the table below for SLURM submission options.
Recommended: Assigns a job name. The default is the name of SLURM job script.
Recommended: Specify the name of the partition (queue) to use. Use this to specify the default partition or a special partition, e.g. a non-condo partition, to which you have access.
Recommended: The number of tasks per job. Usually used for MPI jobs.
|#SBATCH --nodes=2||Recommended: The number of nodes requested.|
|#SBATCH --ntasks-per-node=12||Recommended: The number of tasks per node. Usually used in combination with --nodes for MPI jobs.|
|#SBATCH --cpus-per-task=16||Recommended: The number of CPU cores per task. Usually used for OpenMP or multi-threaded jobs.|
Required: The maximum run time needed for this job to run, in days-hh:mm:ss.
Recommended: The maximum amount of physical memory used by any single process of the job in megabytes.
See our FAQ for more details.
|#SBATCH --mail-user=YourEmailAddress||Recommended: Email address for job status messages.|
|#SBATCH --mail-type=ALL||Recommended: SLURM will notify the user via email when the job reaches any of the following states: BEGIN, END, FAIL or REQUEUE.|
|#SBATCH --gres=gpu:1||Optional: Request the number of GPUs per node|
|#SBATCH --nodes=1 --exclusive||Optional: Using both of these options will give your job exclusive access to a node such that no other jobs can share the node. This combination of arguments will assign all tasks to your job and will give it exclusive access to all of the resources (e.g. memory) of the entire node without interference from other jobs. Please see our FAQ for more details on exclusive access.|
Optional: The full path for the standard output (stdout) and standard error (stderr) "slurm-%j.out" file, where the "%j" is replaced by the job ID. Current working directory is the default.
Optional: The full path for the standard error (stderr) "slurm-%j.out" file. Use this only when you want to separate stderr from stdout. Current working directory is the default.
|Optional: Exports all environment variables to the job. See our FAQ for details.|
You need to specify both the name of the condo account and partition to use a condo on the cluster.
Use the command sacctmgr show assoc user=netID to show the accounts and partitions to which you have access.
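Several rows of the table above describe an option without showing it. As a hedged illustration, the header below uses the standard sbatch option names that match those descriptions. Every value is a made-up placeholder, and MyCondoAccount and MyCondoPartition are hypothetical names, not accounts or partitions known to exist on this system:

```shell
#!/bin/bash
# Illustrative values only; adjust every line to your own job.
#SBATCH --job-name=myjob
#SBATCH --account=MyCondoAccount
#SBATCH --partition=MyCondoPartition
#SBATCH --ntasks=2
#SBATCH --time=0-08:00:00
#SBATCH --mem-per-cpu=1024
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err
#SBATCH --export=ALL
```

Here --mem-per-cpu is given in megabytes, which is the default unit, and --time uses the days-hh:mm:ss form described in the table.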
Serial Job Script
A job script may consist of SLURM directives, comments and executable statements. A SLURM directive provides a way of specifying job attributes in addition to the command line options. For example, we could create a myjob.slurm script this way:
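A minimal sketch of such a script follows. The heredoc wrapper is only a convenient way to create the file from a shell prompt; the job name and the executable myprogram are hypothetical, and no partition is named, so the default partition applies:

```shell
# Write an illustrative serial job script to myjob.slurm.
cat > myjob.slurm <<'EOF'
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=myjob
# One task, 1 GB of memory per core, at most 30 minutes of run time.
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=00:30:00

echo "Job running on: $SLURM_JOB_NODELIST"
# Work in the directory the job was submitted from.
cd "$SLURM_SUBMIT_DIR"
# myprogram is a hypothetical executable in that directory.
srun ./myprogram
EOF
```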
This example script will submit a job to the default partition using 1 task, 1GB of memory per processor core, with a maximum run time of 30 minutes.
If you need to debug your program and want to run in interactive mode, the same request above could be constructed like this (via the srun command):
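A sketch of such a request, assuming the interactive partition listed earlier is the one you want; the resource values mirror the batch example (1 task, 1 GB per core, 30 minutes):

```shell
# Start an interactive shell on a compute node.
srun --pty --partition=interactive --ntasks=1 --mem-per-cpu=1G --time=00:30:00 "$SHELL"
```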
For more details on interactive jobs, please see our FAQ on this topic.
SLURM Environment Variables in Job Scripts
When you submit a job, it will inherit several environment variables that are automatically set by SLURM. These environment variables can be useful in your job submission scripts as seen in the examples above. A summary of the most important variables is presented in the table below.
Location of shared scratch space. See our FAQ for more details.
|$LOCAL_SCRATCH||Location of local scratch space on each node.|
Environment variable containing a list of all nodes assigned to the job.
Path from where the job was submitted.
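As a hedged illustration, the snippet below reads $LOCAL_SCRATCH from the table together with two standard SLURM variables, SLURM_JOB_NODELIST and SLURM_SUBMIT_DIR, which are assumed here to be the ones the last two rows describe. The fallback values exist only so the snippet also runs outside a job:

```shell
# Inside a job SLURM sets these variables; outside a job they are unset,
# so fall back to local stand-ins (for illustration only).
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
node_list="${SLURM_JOB_NODELIST:-${HOSTNAME:-localhost}}"
local_scratch="${LOCAL_SCRATCH:-/tmp}"

echo "Submitted from:  $submit_dir"
echo "Allocated nodes: $node_list"
echo "Local scratch:   $local_scratch"
```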
Job Launchers (srun)
For all jobs run on the cluster, we require that you use srun to launch your job. The job launcher's purpose is to spawn copies of your executable across the resources allocated to your job. By default, srun needs only your executable; the rest of the information is extracted from SLURM.
The following is an example of how to use srun inside your SLURM batch script. This example will run myMPIprogram as a parallel MPI code on all of the processors allocated to your job by SLURM:
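A minimal sketch of such a batch script. The heredoc wrapper is only a convenient way to create the file, and the file name mpijob.slurm and the job name are hypothetical. With no task count given, srun starts one copy of the program per task allocated to the job:

```shell
# Write an illustrative MPI job script to mpijob.slurm.
cat > mpijob.slurm <<'EOF'
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=mympijob
# 24 tasks, 1 GB of memory per core, at most 30 minutes of run time.
#SBATCH --ntasks=24
#SBATCH --mem-per-cpu=1G
#SBATCH --time=00:30:00

echo "Nodes allocated: $SLURM_JOB_NODELIST"
# Launch myMPIprogram on every task allocated to the job.
srun myMPIprogram
EOF
```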
This example script will submit a job to the default partition using 24 processor cores and 1GB of memory per processor core, with a maximum run time of 30 minutes.
The following example will run myMPIprogram on only four processors even if your batch script requested more than four.
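For example, inside the batch script the launch line could be written as follows, where -n is the srun option that sets the number of tasks:

```shell
# Run myMPIprogram on four tasks only, regardless of the size of the allocation.
srun -n 4 myMPIprogram
```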
Submitting and Monitoring Jobs
Once your job script is ready, use sbatch to submit it as follows:
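For example, using the script name myjob.slurm mentioned earlier:

```shell
# Submit the batch script to the scheduler.
sbatch myjob.slurm
```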
This will return a jobID number. The output and error streams of the job will be saved to a single file in the directory from which the job was submitted, unless you specified otherwise.
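If the jobID is needed in a script, it can be pulled out of the confirmation line that sbatch normally prints. The sketch below uses a sample line in place of real sbatch output, and the job number in it is made up:

```shell
# sbatch normally prints a line of the form "Submitted batch job <jobID>".
# The sample line below stands in for real sbatch output.
submit_output="Submitted batch job 123456"

# Keep only the text after the last space, which is the jobID.
job_id="${submit_output##* }"
echo "$job_id"   # prints 123456
```

The extracted value can then be passed to the monitoring commands in place of jobID.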
The status of the job can be obtained using SLURM commands. See the table below for a list of commands:
Show a detailed list of all submitted jobs.
squeue -j jobID
Show a detailed description of the job given by jobID.
squeue --start -j jobID
Gives an estimate of the expected start time of the job given by jobID.
There are variations to these commands that can also be useful. They are described below:
Show a list of all running jobs.
squeue -u username
Show a list of all jobs in queue owned by the user specified by username.
scontrol show job jobID
Show a verbose description of the job given by jobID. The output can be used as a template when you are attempting to modify a job.
There are many different states that a job can be in after submission: BOOT_FAIL (BF), CANCELLED (CA), COMPLETED (CD), CONFIGURING (CF), COMPLETING (CG), FAILED (F), NODE_FAIL (NF), PENDING (PD), PREEMPTED (PR), RUNNING (R), SUSPENDED (S), TIMEOUT (TO), or SPECIAL_EXIT (SE). The squeue command with no arguments will list all jobs in their current state. The most common states are described below.
Running (R): These are jobs that are running.
Pending (PD): These jobs are eligible to run, but there are simply not enough resources to allocate to them at this time.
A job can be deleted by using the scancel command as follows:
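For example, where jobID is the number returned by sbatch:

```shell
# Cancel a pending or running job (replace jobID with the actual number).
scancel jobID
```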