MATLAB is, at its heart, a large desktop application, built in part on a Java virtual machine, that you interact with through MathWorks' statement-based language. Because MATLAB is first and foremost a desktop application, you need to change your workflow somewhat to run it non-interactively on a batch scheduler like the ones our clusters use.
The Parallel Computing Toolbox (PCT) provides an API for submitting two kinds of job:

- a job made up of multiple independent tasks (task parallel);
- a job made up of a single task that runs across multiple processors, and possibly multiple nodes (data parallel).
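The following sketch shows both job types with the PCT job API (createJob, createCommunicatingJob, createTask, submit, wait, fetchOutputs). It makes several assumptions:

- A cluster profile is already set up, and parcluster() returns the cluster for your default profile.
- The built-in functions sum and labindex are trivial stand-ins for real work.
- The worker counts are arbitrary.

```matlab
% Sketch contrasting task-parallel and data-parallel jobs.
% Assumption: parcluster() returns the cluster defined by your default profile.
c = parcluster();

% Task parallel: one job holding several independent tasks that never communicate.
indJob = createJob(c);
for k = 1:4
    createTask(indJob, @sum, 1, {1:k});   % task k computes sum(1:k) on its own
end
submit(indJob);
wait(indJob);                             % block until every task has finished
indOut = fetchOutputs(indJob);            % 4x1 cell array, one row per task
delete(indJob);                           % remove the job's files from storage

% Data parallel: a communicating job whose single task runs on several workers at once.
commJob = createCommunicatingJob(c, 'Type', 'spmd');
commJob.NumWorkersRange = [4 4];          % request exactly 4 workers
createTask(commJob, @labindex, 1, {});    % each worker returns its own index
submit(commJob);
wait(commJob);
commOut = fetchOutputs(commJob);          % one row per worker
delete(commJob);
```

The task-parallel job suits parameter sweeps, where each run is independent. The communicating job suits code written with spmd or parfor, where workers cooperate on one problem. Newer releases also provide spmdIndex as the replacement for labindex.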
Prerequisite: Set up a Cluster Profile
Depending on your file I/O needs and the presence of the Parallel Computing Toolbox in your desktop copy of MATLAB, you can choose to either
Running a Batch Task Using the Cluster Configurations
For this task, you will need a script or function that contains parallel constructs, e.g. spmd blocks or parfor loops. Visit MathWorks' site for more information on adding parallelism to your existing code. For this example, we'll use a simple pi estimation function that MathWorks provides as a demo.
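If you don't have the MathWorks demo at hand, here is a minimal stand-in with the same shape: a function with a parfor loop that estimates pi. The name estimatePi and its single input are hypothetical, and this is not the MathWorks demo itself. The function applies the midpoint rule to the integral of 4/(1+x^2) over [0, 1], which equals pi. Save it as estimatePi.m on your MATLAB path.

```matlab
function piEst = estimatePi(nIntervals)
%ESTIMATEPI Estimate pi with the midpoint rule on 4/(1+x^2) over [0,1].
%   Hypothetical example function, not the MathWorks demo. When a parallel
%   pool is available, the parfor iterations are split across its workers;
%   without a pool, parfor runs serially.
h = 1 / nIntervals;              % width of each subinterval
total = 0;                       % reduction variable, summed across workers
parfor i = 1:nIntervals
    x = (i - 0.5) * h;           % midpoint of the i-th subinterval
    total = total + 4 / (1 + x^2);
end
piEst = h * total;               % scale the sum by the interval width
end
```

Because total is only ever updated as total = total + ..., parfor treats it as a reduction variable. Each worker accumulates a partial sum, and the partial sums are combined at the end.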
Some notes on running in batch:
- The batch function returns as soon as the job is queued. To block until the job finishes, call wait(myJob). Because a job may spend hours in the queue on a busy system, blocking this way is often not the best idea.
- The arguments 'matlabpool' and 11 form a name-value pair that sets the number of pool workers for the job. One additional worker is reserved for the control process, so on a single node with 12 cores you can run a matlabpool of at most 11 workers. This example comes from Davinci, which has 12 cores per node; adjust the worker count for clusters with a different number of cores per node. Newer MATLAB releases name this option 'Pool' instead of 'matlabpool'.
- See the online help for fetchOutputs for details on retrieving the outputs of a function job.
- If you ran a script instead of a function, you can load variables from the job's workspace directly with a command such as load(myJob, 'variablename').
- Once your job has finished and you have retrieved its output, delete the job with delete(myJob). This removes the job's output files from your data directory.
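The sketch below walks through the round trip the notes above describe: submit, wait, fetch, delete. It rests on three assumptions:

- The cluster profile from the prerequisite step is your default profile, so parcluster() returns it.
- A function file estimatePi.m (a hypothetical name) is on your path; it takes one numeric input and returns one output.
- The release is newer, so the worker count uses the 'Pool' option. Older releases spell this 'matlabpool'.

```matlab
% Minimal sketch: run a function as a batch job with an 11-worker pool.
% Assumptions: parcluster() returns your cluster profile, and estimatePi.m
% (hypothetical) is on the path, taking one input and returning one output.
c = parcluster();
nWorkers = 11;                             % 12 cores per node minus 1 for the control process
myJob = batch(c, @estimatePi, 1, {1e8}, 'Pool', nWorkers);
wait(myJob);                               % blocks until the job finishes; may take hours if queued
out = fetchOutputs(myJob);                 % 1x1 cell array holding the single output
piEst = out{1};
fprintf('Estimated pi = %.10f\n', piEst);
delete(myJob);                             % clean up the job's files once output is retrieved
```

For a script, the pattern is the same except for the submit and retrieve steps: submit with batch(c, 'myScript', 'Pool', nWorkers), where myScript is a hypothetical script name, and retrieve results with load(myJob, 'variablename') instead of fetchOutputs.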
How to Reconnect to Jobs From a Previous Session
Because a large job may take hours or days to run, it is impractical to keep a MATLAB session connected between your workstation and the cluster login node the whole time. The Parallel Computing Toolbox keeps track of running and completed jobs between sessions, so you can reconnect to them later.
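Here is a minimal sketch of reconnecting from a new session. It assumes you use the same cluster profile that submitted the job (here, the default profile) and that you noted the job's ID at submission time. The ID 42 below is purely hypothetical; c.Jobs lists the real ones. The sketch also assumes the job ran a function. For a script job, use load(myJob, 'variablename') instead of fetchOutputs.

```matlab
% Minimal sketch: reconnect to a job submitted in an earlier MATLAB session.
% Assumptions: same (default) cluster profile as at submission; jobID is hypothetical.
c = parcluster();                    % job records live in this profile's job storage location
disp(c.Jobs);                        % list every job this cluster knows about, with IDs and states
jobID = 42;                          % hypothetical ID; take the real one from c.Jobs
myJob = findJob(c, 'ID', jobID);     % empty if no job with that ID exists
if isempty(myJob)
    fprintf('No job with ID %d was found.\n', jobID);
elseif strcmp(myJob.State, 'finished')
    % Assumes a function job; for a script job use load(myJob, 'variablename').
    out = fetchOutputs(myJob);
    delete(myJob);                   % clean up once the output is safely retrieved
else
    fprintf('Job %d is still %s.\n', myJob.ID, myJob.State);
end
```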
MathWorks' Introduction to Parallel Solutions: http://www.mathworks.com/help/distcomp/introduction-to-parallel-solutions.html
Help for creating cluster profiles: http://www.mathworks.com/help/mdce/install-product-and-choose-cluster-configuration.html