This document is still under development and is being written/edited by RCSG and MathWorks staff.  Feedback is welcome and encouraged!

Introduction

There are several ways to submit MATLAB jobs to a cluster.  This document covers the ways to run MATLAB compute jobs on the Shared Research Compute clusters, including using the Parallel Computing Toolbox (PCT) and the MATLAB Distributed Computing Server (MDCS) to submit many independent tasks as well as a single task with parallel components.  Examples are included.

MATLAB is not available for the POWER7 architecture and is therefore not installed on BlueBioU.

Definitions

Task Parallel Application - The same application that runs independently on several nodes, possibly with different input parameters.  There is no communication, shared data, or synchronization points between the nodes.

Data Parallel Application - The same application that runs on several labs simultaneously, with communication, shared data, or synchronization points between the labs.

Lab - A MATLAB worker in a multicore (Data Parallel) job.  One lab is assigned to one worker (core).  Thus, a job with eight labs has eight processor cores allocated to it and will have eight workers each working together as peers.

MDCS - MATLAB Distributed Computing Server.  This is a component of MATLAB that allows our clusters to run MATLAB jobs that exceed the size of a single compute node (multinode parallel jobs).  It also allows jobs to run even when there are not enough toolbox licenses available for a particular toolbox, as long as the university owns at least one license for that toolbox.

PCT - Parallel Computing Toolbox.

MATLAB Task - One segment of a job to be evaluated by a worker.

MATLAB Job - The complete large-scale operation to perform in MATLAB, composed of a set of tasks.

MATLAB Worker - The MATLAB session that performs the task computations.  If a job needs eight processor cores, then it must have eight workers.

Job - Job submitted via the PBS job scheduler (also called PBS Job).

Data Location

This document assumes that all of your input data files, output files, and MATLAB code files are stored on the cluster's filesystem, not on your desktop.  When you launch the MATLAB GUI and look at your home folder, you will be looking at your home folder on the cluster, not on your desktop.  If your data or code files are stored on your desktop, you will need to transfer them to the cluster first.

Toolbox Licenses

The version of MATLAB installed on our clusters shares a site license with the rest of the campus.  To see which Toolboxes our campus is licensed to use, launch MATLAB interactively on your desktop or on one of our clusters and run the ver command.

The ver command will show you which Toolboxes we are licensed to use.  However, it will not show you the number of licenses (seats) for each toolbox.  In some cases there might be only a single seat for a given toolbox.  This means only one user can use the toolbox at any given time.
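You can also check license availability from within MATLAB itself. The sketch below is illustrative; the feature name 'Statistics_Toolbox' is only an example, and exact feature names vary by product (run license('inuse') or consult MathWorks documentation to find them).

```matlab
% Check whether this installation is licensed for a given toolbox.
% 'Statistics_Toolbox' is an illustrative feature name.
if license('test', 'Statistics_Toolbox')
    % Attempt to actually check out a seat; returns 1 on success,
    % 0 if all seats are currently in use by other users.
    ok = license('checkout', 'Statistics_Toolbox');
    if ~ok
        disp('All seats for this toolbox are currently in use.');
    end
end
```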

Running MATLAB with the GUI for Code Development

If you need to run MATLAB on one of the clusters to develop your code and run short (30 minutes or less) tests of the code, please follow these instructions.

1.  Login to the Cluster

Login to the cluster using our published instructions.

Windows users will need to use Xming in order to run the MATLAB GUI on the cluster and have it displayed on their desktop. 

2.  Load the MATLAB Environment

Load the MATLAB module with the following command:

module load matlab/2011a

3.  Run MATLAB

Run MATLAB with or without the GUI as follows:

matlab

matlab -nodisplay


This will start MATLAB with and without the GUI, respectively.

At this point MATLAB will be running interactively on one of the login nodes, not one of the compute nodes. When you get the MATLAB prompt, start writing your code as you normally would. This method of running MATLAB is intended for code development and for executing short test runs of your code (30 minutes or less).

Do not run full length compute jobs on the login nodes.  Use one of the methods below for submitting a compute job to a job queue.

Submitting Interactive Jobs

If you need to run a MATLAB compute job interactively, please follow these instructions.

1.  Login to the Cluster

Login to the cluster using our published instructions.

Windows users will need to use Xming in order to run the MATLAB GUI on the cluster and have it displayed on their desktop. 

2.  Load the MATLAB Environment

Load the MATLAB module with the following command:

module load matlab/2011a

3.  Submit an Interactive PBS Job

Submit an interactive PBS job as follows:

[user@login1 ~]$  qsub -I -X -V -l nodes=1:ppn=8,walltime=1:00:00 -W x=NACCESSPOLICY:SINGLEJOB

Change the ppn and walltime values to suit your job.

When this job starts executing you will receive a prompt on a compute node. The output will look something like the following:

[user@login1 ~]$ qsub -I -X -V -l nodes=1:ppn=8,walltime=1:00:00 -W x=NACCESSPOLICY:SINGLEJOB
qsub: waiting for job 33933.host.rcsg.rice.edu to start
qsub: job 33933.host.rcsg.rice.edu ready
[user@compute98 ~]$

You will notice that the command prompt has changed from a login node designation to a compute node designation.

At present this type of job submission will compete with all other jobs on the system for run time. If no nodes are available then the job submission will pause and a command prompt on a compute node will not be presented until an idle node is available. This should not be a problem during periods when cluster utilization is low. When cluster utilization is high, this wait time could be lengthy.

Job submissions of this type must satisfy the queue policy on each cluster or the job will not execute.  For example, STIC requires that jobs submitted to the default queue request two or more nodes while jobs on Sugar must be one node only.  Consult the documentation for each cluster for the queue policy.

4.  Launch MATLAB

After you have obtained a command line prompt on a compute node, launch MATLAB with or without the GUI. An example without the GUI is shown below.

[user@compute98 ~]$ matlab -nodisplay
                < M A T L A B >
       Copyright 1984-2010 The MathWorks, Inc.
            Version 7.12.0.635 (R2011a) 64-bit (glnxa64)
                  March 18, 2011
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>>

From the prompt you can run any MATLAB command that you wish.

Your MATLAB session will be terminated when its run time (walltime) limit has been reached.

You will need to remain logged into the cluster for this interactive session to execute. If you terminate your login, MATLAB will also terminate. Therefore, using the interactive method is not a good choice for a job that is going to run for many hours.

To run MATLAB with the GUI, simply run the matlab command on the compute node without the -nodisplay option.  The GUI will perform more slowly than if you were running it on your desktop, since the graphics are transmitted from the compute node to your desktop.

Submitting Task Parallel and Data Parallel Jobs with PCT

The Parallel Computing Toolbox (PCT) provides an API that allows you to submit a job that has multiple independent tasks (Task Parallel) or a job that has a single task which is a multiprocessor, and possibly multinode, task (Data Parallel).  In order to run this type of job you must first configure MATLAB for this type of job submission by following these steps:

These steps need to be performed only once.  Subsequent sessions of MATLAB do not need to be configured.

Configuring MATLAB

1.  Enable passwordless ssh for your cluster account.

After you have logged into the cluster, follow these instructions to enable passwordless ssh.

2.  In your home directory, create the MdcsDataLocation subdirectory:

mkdir ~/MdcsDataLocation

3.  Load the MATLAB 2011a environment:

module load matlab/2011a

4.  Run MATLAB on the login node:

matlab

5.  Import the cluster configuration for the cluster you are running on:

  1. On the MATLAB Desktop menu bar, click Parallel > Manage Configurations.
  2. Click File > Import
  3. In the Import Configuration dialog box, browse to find the MAT-file for the configuration that you want to import.  Navigate to /opt/apps/matlab/2011a/mdcs and select the configuration for the system you are using, such as sugar.mat, davinci.mat, stic.mat, and so forth.  Select the file and click Import.
  4. (Optional) Select the configuration you have imported and click Start Validation.
    1. The four validation stages are Find Resources, Distributed Job, Parallel Job, and Matlabpool.  All four should pass, except as noted below.

      Validation is expected to fail if any of the following conditions occur:
      1.  If the cluster is busy enough that a job submission must wait before it runs, the validation steps will fail.
      2.  Distributed job validation on STIC will always fail due to the system queue policy.
      3.  Parallel job validation may appear to take a long time because it tries to run with the maximum number of workers (currently 128).  A 128-core validation job takes a long time to execute but should not fail unless condition #1 above exists.
      4.  There are insufficient licenses available to run the validation job.  

      A successful validation means that your configuration is correct.  However, validation failure does not mean that your configuration is incorrect.  Most likely one of the above conditions has been encountered.  Either try validating later when the system is less busy, or assume that your configuration is correct and proceed to submit one of the sample jobs below.

  5. Select this configuration to be the default configuration and close the Configuration Manager window.

You are ready to submit jobs to MDCS.

Submitting Task Parallel Jobs

The following is an example of a Task Parallel job.  The task-parallel example code, frontDemo, calculates the risk and return based on historical data from a collection of stock prices. The core of the code, calcFrontier, minimizes the equations for a set of returns. In order to parallelize the code, the for loop is converted into a parfor loop with each iteration of the loop becoming its own independent task. 
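The for-to-parfor conversion described above might look like the following sketch. The loop bound, output variables, and the calcFrontier call signature are illustrative only; the actual frontDemo code may differ.

```matlab
% Illustrative sketch of converting a serial loop to parfor.
% The calcFrontier signature shown here is a guess for illustration.
nPortfolios = 20;
ret = zeros(nPortfolios, 1);
risk = zeros(nPortfolios, 1);
parfor i = 1:nPortfolios
    % Each iteration is independent of the others, so PCT can run
    % them as separate tasks on different workers.
    [ret(i), risk(i)] = calcFrontier(i);   % hypothetical call
end
```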

To submit the job, copy submitParJobToCluster.m into your working directory, make the necessary modifications for your job environment, and then run the code from within MATLAB. This will submit the job.  An explanation of the code follows:

function job = submitParJobToCluster(sz)

if nargin==0, sz = 3; end

% Set the walltime to 5 minutes
ClusterInfo.setWallTime('00:05:00');                  % change this to the actual walltime that you need.
ClusterInfo.setEmailAddress('YourEmailAddressHere')   % include your email address here.

job = batch(@frontDemo,2,{},'Matlabpool',sz,'CaptureDiary',true);    % this submits the frontDemo.m job.
           % @frontDemo is the function to submit.
           % 2 is the number of output arguments
           % {} is an empty cell array of input arguments
           % sz is the number of processor cores (workers)
           % CaptureDiary is set to true
job.wait                                                             % the MATLAB GUI will pause here until the job finishes
try
   error(job.Tasks(1).ErrorMessage)                                  % no-op if the task finished without error
   out = job.getAllOutputArguments();                                % get all output arguments from the completed job
   r = out{1}; v = out{2};
   plot(r,v)                                                         % plot the results
   job.diary
catch ME
   error(ME.message)
end

if nargout==0
   job.destroy
   clear job
end


When you run submitParJobToCluster within MATLAB, the frontDemo code will be submitted to the PBS job scheduler.  Use the showq command from a cluster terminal window to look for your job in the job queue.

The above is only an example used to illustrate how to submit a job and retrieve the results from within the same MATLAB session.  In most cases using job.wait will not be desirable because of the length of time it might take for a job to start running (perhaps several hours).  The best practice is to have your MATLAB code write its results to a file rather than waiting for the job to finish and retrieving the results with job.getAllOutputArguments().  The job.wait call and all of the lines below it should not be used unless you intend to keep the MATLAB session open while the job runs.

By default, your job will be allocated an entire compute node (all processor cores on the node) regardless of the sz value that you select.  This is to ensure that you will have access to all of the memory on the node.  If your job uses very little memory and you want to share the node with the jobs of other users, then include this line in submitParJobToCluster.m:

ClusterInfo.setUserDefinedOptions('-l nodes=X:ppn=Y')

where X is the number of nodes you want, and Y is the number of cores per node.

Optionally, if you use the above option but only want to share the node with other jobs that you own, then you should also specify this line in submitParJobToCluster.m:

ClusterInfo.setUserDefinedOptions('-W x=NACCESSPOLICY:SINGLEUSER')

This option will ensure that only jobs you own will share nodes with each other.   Jobs of other users will not be able to share the nodes that have been allocated to you.
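Putting the two directives together, a submission script might contain something like the following. The node and core counts are illustrative, and combining both options in a single call is an assumption; a later call to setUserDefinedOptions may overwrite an earlier one, so verify the behavior on your cluster.

```matlab
% Illustrative: request 1 node with 2 cores, and restrict node
% sharing to your own jobs. Both PBS directives are passed in one
% string on the assumption that separate calls may not accumulate.
ClusterInfo.setUserDefinedOptions('-l nodes=1:ppn=2 -W x=NACCESSPOLICY:SINGLEUSER');
```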

Jobs may wait in the queue even if there appear to be enough processors available to run the job.  This is likely because there are insufficient licenses available to run the job.  The PCT job submission methods documented here have the ability to check for MDCS licenses before the job executes.  This prevents the job from failing due to a lack of licenses.

Jobs on SUGAR do not have the ability to check for MDCS licenses in advance.  If you submit a job on SUGAR and it tries to check out an MDCS license, but no licenses are available, then the job will fail.  If this happens, please consider moving your jobs to DAVinCI.

The maximum and minimum number of workers per job submission is constrained by the queue policy on each cluster, with one worker per processor core.  For example, Sugar will not accept a compute job of more than 8 cores, so your MATLAB job must use 8 workers or fewer.  Further, the number of workers should be n-1, where n is the number of cores you want to use: for a single node (8 core) job, set sz to 7.  MATLAB itself runs on one core and the workers run on the other 7 cores, for a total of 8 cores, so you can run 7 workers on a single 8-core node.  Finally, the maximum number of workers is also constrained by our MATLAB MDCS license.  Consult our documentation for a listing of these constraints.

If you have one or more single core jobs to run then you would simply set the size parameter, sz, to be zero in the above example.  This simply allows the job to offload the work to a single MDCS worker and does not enable parallel capabilities.
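A single-core submission might therefore look like the sketch below. The function name mySerialTask and its single output argument are hypothetical placeholders.

```matlab
% Illustrative single-core submission: 'Matlabpool',0 offloads the
% work to one MDCS worker without opening a parallel pool.
job = batch(@mySerialTask, 1, {}, 'Matlabpool', 0);  % mySerialTask is hypothetical
```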

For more information on the batch() command, all of its input arguments, and how to use the diary, please see the MATLAB online help or the MathWorks website (http://www.mathworks.com/help/toolbox/distcomp/batch.html).

The above example has the CaptureDiary parameter set to true.   This will allow the job to capture or display the Command Window output of the batch job.  See the diary() function for more information.

Be sure to destroy your job after the job has finished.

Submitting Data Parallel Jobs

The data-parallel example code approximates pi by calculating the area under a curve. The non-parallel version, calcPiSerial, uses a for loop to step through discrete points. The parallel version, calcPiSpmd, uses the spmd construct to evaluate a portion of the curve on each MATLAB instance. Each instance uses its labindex (i.e., rank) to determine which portion of the curve to calculate. The partial results are then globally summed and broadcast back out. The code uses higher-level routines rather than lower-level MPI calls. Once the summation has been calculated, it is indexed into and communicated back to the local MATLAB client to produce the total area.
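The spmd approach described above can be sketched as follows. This is a hedged illustration, not the actual calcPiSpmd source: it integrates 4/(1+x^2) over [0,1] (which equals pi), splitting the interval across labs by labindex.

```matlab
function approxPi = calcPiSketch()
% Illustrative sketch of an spmd pi calculation; the real
% calcPiSpmd code may differ. Integral of 4/(1+x^2) on [0,1] = pi.
spmd
    % Each lab integrates its own sub-interval, chosen by labindex.
    a = (labindex - 1) / numlabs;    % left edge of this lab's slice
    b = labindex / numlabs;          % right edge
    localArea = quadl(@(x) 4 ./ (1 + x.^2), a, b);
    % Globally sum the partial areas; gplus leaves the total on
    % every lab.
    totalArea = gplus(localArea);
end
% Back on the client, totalArea is a Composite; index into it.
approxPi = totalArea{1};
end
```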

To submit the job, copy submitSpmdJobToCluster.m into your working directory, make the necessary modifications for your job environment, and then run the code from within MATLAB.  This will submit the job.   An explanation of the code follows:

function job = submitSpmdJobToCluster(sz)

if nargin==0, sz = 3; end

% Set the walltime to 5 minutes
ClusterInfo.setWallTime('00:05:00');                  % change this to the actual walltime that you need.
ClusterInfo.setEmailAddress('YourEmailAddressHere')   % include your email address here.

job = batch(@calcPiSpmd,1,{sz},'Matlabpool',sz);      % this will submit the calcPiSpmd function
           % @calcPiSpmd is the function to submit.
           % 1 is the number of output arguments
           % {sz} is an array of input arguments
           % sz is the number of processor cores (workers)

job.wait                                              % the MATLAB GUI will pause here until the job finishes
try
    error(job.Tasks(1).ErrorMessage)                  % no-op if the task finished without error
    out = job.getAllOutputArguments();                % get all output arguments from the completed job
    p = out{1}
catch ME
    error(ME.message)
end

if nargout==0
   job.destroy
   clear job
end

 
When you run submitSpmdJobToCluster within MATLAB, the calcPiSpmd code will be submitted to the PBS job scheduler.  Use the showq command from a cluster terminal window to look for your job in the job queue.


The above is only an example used to illustrate how to submit a job and retrieve the results from within the same MATLAB session.  In most cases using job.wait will not be desirable because of the length of time it might take for a job to start running (perhaps several hours).  The best practice is to have your MATLAB code write its results to a file rather than waiting for the job to finish and retrieving the results with job.getAllOutputArguments().

By default, your job will be allocated an entire compute node (all processor cores on the node) regardless of the sz value that you select.  This is to ensure that you will have access to all of the memory on the node.  If your job uses very little memory and you want to share the node with the jobs of other users, then include this line in submitSpmdJobToCluster.m:

ClusterInfo.setUserDefinedOptions('-l nodes=X:ppn=Y')

where X is the number of nodes you want, and Y is the number of cores per node.

Optionally, if you use the above option but only want to share the node with other jobs that you own, then you should also specify this line in submitSpmdJobToCluster.m:

ClusterInfo.setUserDefinedOptions('-W x=NACCESSPOLICY:SINGLEUSER')

This option will ensure that only jobs you own will share nodes with each other.   Jobs of other users will not be able to share the nodes that have been allocated to you.

Jobs may wait in the queue even if there appear to be enough processors available to run the job.  This is likely because there are insufficient licenses available to run the job.  The PCT job submission methods documented here have the ability to check for MDCS licenses before the job executes.  This prevents the job from failing due to a lack of licenses.

Jobs on SUGAR do not have the ability to check for MDCS licenses in advance.  If you submit a job on SUGAR and it tries to check out an MDCS license, but no licenses are available, then the job will fail.  If this happens, please consider moving your jobs to DAVinCI.

The maximum and minimum number of workers per job submission is constrained by the queue policy on each cluster, with one worker per processor core.  For example, Sugar will not accept a compute job of more than 8 cores, so your MATLAB job must use 8 workers or fewer.  Further, the number of workers should be n-1, where n is the number of cores you want to use: for a single node (8 core) job, set sz to 7.  MATLAB itself runs on one core and the workers run on the other 7 cores, for a total of 8 cores, so you can run 7 workers on a single 8-core node.  Finally, the maximum number of workers is also constrained by our MATLAB MDCS license.  Consult our documentation for a listing of these constraints.

For more information on the batch() command and all of its input arguments and how to use the diary, please see MATLAB's online help or the MathWorks website.

Be sure to destroy your job after the job has finished.

Job Dependencies

In order to run code on the cluster, a job may depend on several MATLAB or data files. The batch() function takes two relevant parameters: PathDependencies and FileDependencies. Both can be assigned a cell array of file names and/or folder names. If the MATLAB client shares a file system with the compute nodes, then typically the user will specify the dependencies on the local path (i.e., PathDependencies). For example:

job = batch(@myrand,1,{2,3}, ... 'PathDependencies',{'/home/mathworks/code'});

If the MATLAB client does not share a file system with the compute nodes, then the user will specify the dependencies on the path (i.e., FileDependencies), which in turn will be zipped up as part of the job. For example:

job = batch(@myrand,1,{2,3}, ... 'FileDependencies',{'/home/mathworks/code/random.m'});

In this example, the files myrand.m and random.m will be zipped up (the caller function, myrand, is assumed to be needed and is always included in the ZIP file).

Configuring Cluster Parameters with ClusterInfo

As previously mentioned, configurations are written to describe the scheme of the cluster. However, there are some properties that may need to be set often or at runtime. ClusterInfo provides a mechanism for the user to set additional properties. For example, the user may want to specify that a job should last no longer than 30 minutes.

ClusterInfo.setWallTime('00:30:00')

Or set the email address at which to be notified when a job is running:

ClusterInfo.setEmailAddress('YourEmailAddressHere')

There might be circumstances in which the user wishes to change the default queue that the job will run in, as follows:

ClusterInfo.setQueueName('QueueNameHere')

The entire collection of properties can be displayed with the state method:

ClusterInfo.state

The values persist between jobs as well as between MATLAB sessions, until cleared. They can be cleared by setting a single property to empty:

ClusterInfo.setWallTime('')

Or by clearing the entire set

ClusterInfo.clear

It is very important to note that properties set with ClusterInfo will persist between MATLAB sessions.
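Because of this persistence, a reasonable practice is to set every property you rely on explicitly at the top of each submission script, so stale values from a previous session cannot affect a new job. The values below are illustrative.

```matlab
% Start from a clean slate, then set the properties this job needs.
ClusterInfo.clear
ClusterInfo.setWallTime('02:00:00');                  % illustrative walltime
ClusterInfo.setEmailAddress('YourEmailAddressHere')   % your email address
ClusterInfo.state                                     % confirm before submitting
```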

Listing Your Jobs

When you submit a job with batch, you will notice that each submission is labeled Job1, Job2, and so forth.  Temporary directories associated with each job can be found in ~/MdcsDataLocation as the jobs are running.  To view a listing of your jobs, their JobIDs, and status, use the following commands:

sched = findResource();        		% build a list of the jobs that you have executed
sched.findJob();               		% display the jobs

Destroying a Job

When you submit a job with batch, you will notice that each submission is labeled Job1, Job2, and so forth.  Temporary directories associated with each job can be found in ~/MdcsDataLocation as the jobs are running.  When job.destroy is called, these temporary directories are deleted.  The above examples call job.destroy.  If you close your MATLAB session before executing job.destroy, which is likely unless you are using the full example with job.wait, you will need to perform extra steps to destroy the job later.  Follow these steps to destroy a single job:

sched = findResource();        		% build a list of the jobs that you have executed
sched.findJob();               		% display the jobs
job = sched.findJob('ID',JOB-ID);	% assign the variable 'job' to the job identified by JOB-ID
job.destroy;				% destroy this job

To destroy all of your jobs at once:

sched = findResource();        		% build a list of the jobs that you have executed
sched.findJob();               		% display the jobs
jobs = sched.Jobs;			% assign the variable 'jobs' to a list of all of your jobs
jobs.destroy;				% destroy all jobs

It is good practice to destroy jobs once you have finished with them. Failure to do so can clutter your MdcsDataLocation directory, count against your disk quota, and will cause findResource() to run slowly.

 

Running a Job on a GPGPU

MATLAB 2011a and higher versions are capable of running jobs on GPGPUs as well as on CPUs.  See our FAQ for details.

Running Locally on a Desktop

In order to run MATLAB code with a parallel component locally on your desktop, you must first start a MATLAB pool, as follows:

matlabpool open local 8

where 8 is the number of MATLAB processes to attach to the job.  At this point you will have access to eight MATLAB workers for use with parallel code, such as code with parfor loops, and so on.

This should not be more than nc-1, where nc is the number of cores on the local machine.  The maximum number of workers you can use is 8 with version 2011a, and 12 with version 2011b.  Your ability to run parallel jobs on your desktop will also be constrained by the number of PCT licenses available for checkout on the campus license server.

After running the code, close the MATLAB Pool:

matlabpool close

Calls to matlabpool should not be embedded in the MATLAB code, but rather issued at the MATLAB command prompt.

PCT Resources and Demos

Here are a few resources for getting started with the Parallel Computing Toolbox:

Here is the Tips and Tricks document provided by MathWorks and presented at the Parallel Computing Workshop at Rice University on September 5, 2012: