
Document under development!

This document is still under development and is being written/edited by RCSG and MathWorks staff.


Introduction

There are several ways to submit MATLAB jobs to a cluster.  This document covers the ways to run MATLAB compute jobs on the Shared Research Compute clusters, including using the Parallel Computing Toolbox (PCT) and the MATLAB Distributed Computing Server (MDCS) to submit many independent tasks or a single task that has parallel components.  Examples are included.

Definitions

Task Parallel Application - The same application that runs independently on several nodes, possibly with different input parameters.  There is no communication, shared data, or synchronization points between the nodes.

Data Parallel Application - The same application that runs on several labs simultaneously, with communication, shared data, or synchronization points between the labs.

Lab - A MATLAB worker in a multicore (parallel) job.  One lab is assigned to one core.  Thus, a job with eight labs has eight processor cores allocated to it.

MDCS - MATLAB Distributed Computing Server.  This is a component of MATLAB that allows our clusters to run MATLAB jobs that exceed the size of a single compute node (multinode parallel jobs).  It also allows jobs to run even when not enough licenses are available for a particular toolbox, so long as the university owns at least one license for that toolbox.

MATLAB Task - One segment of a job to be evaluated by a worker.

MATLAB Job - The complete large-scale operation to perform in MATLAB, composed of a set of tasks.

MATLAB Worker - The MATLAB session that performs the task computations.  If a job needs eight processor cores, then it must have eight workers.

Job - Job submitted via the PBS job scheduler (also called PBS Job).

Interactive Jobs

TBA take from old FAQ

Running Jobs with PBS qsub

TBA Good for jobs that are single processor jobs and do not encounter any toolbox license issues.  Take from old FAQ.

Using PCT for Task Parallel and Data Parallel Jobs

PCT provides an API that allows you to submit many independent tasks with a single job submission (Task Parallel) or to submit a single task that runs as a multiprocessor (and possibly multinode) job (Data Parallel).  To run either type of job, you must first configure MATLAB for this kind of job submission by following these steps:

One-time Setup

These steps need to be performed only once.  Subsequent sessions of MATLAB do not need to be configured.

Configuring MATLAB

1.   In your home directory create the MdcsDataLocation subdirectory.

2.  Load the MATLAB 2011a environment:

3.  Run MATLAB on the login node:

4.  In MATLAB, add the /opt/apps/matlab/2011a-scripts folder to your MATLAB path so that MATLAB will be able to find the scripts necessary to submit and schedule jobs.

  1. Click File > Set Path
  2. Click Add Folder
  3. Specify the following folder:
    /opt/apps/matlab/2011a-scripts

    Error saving pathdef.m

    If MATLAB reports that it is unable to save pathdef.m in your current folder, then follow the prompts to select your home folder before saving the file.

5.  Import the cluster configuration for the cluster you are running on:

  1. On the MATLAB Desktop menu bar, click Parallel > Manage Configurations.
  2. Click File > Import
  3. In the Import Configuration dialog box, browse to find the MAT-file for the configuration that you want to import.  Navigate to /opt/apps/matlab/2011a-scripts and select the configuration for the system you are using, such as sugar.mat, davinci.mat, stic.mat, and so forth.  Select the file and click Import.
  4. Select the configuration you have imported and click Start Validation.
    1. All four stages should pass:  Find Resources, Distributed Job, Parallel Job, Matlabpool

      Validation will fail on a busy cluster

      If the cluster is busy and a submitted job has to wait in the queue before it runs, the validation steps will fail.

If all validation stages succeed, then you are ready to submit jobs to MDCS.
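The non-GUI parts of the setup above can also be done from the MATLAB command prompt. The following is a minimal sketch, not a replacement for the validation step, which still has to be run from the Manage Configurations dialog. It assumes that sugar.mat is the configuration you want, that saving pathdef.m to your home folder is acceptable, and that importParallelConfig is available in your MATLAB release; if it is not, use the File > Import menu described above.

```matlab
% Sketch of the one-time setup (assumptions: Sugar cluster, R2011a).

% Step 1: create the MdcsDataLocation subdirectory in the home directory.
dataDir = fullfile(getenv('HOME'), 'MdcsDataLocation');
if ~exist(dataDir, 'dir')
    mkdir(dataDir);
end

% Step 4: add the cluster scripts to the MATLAB path and save the path.
scriptDir = '/opt/apps/matlab/2011a-scripts';
addpath(scriptDir);
% Save pathdef.m in the home folder, since the default location
% is usually not writable on the cluster.
savepath(fullfile(getenv('HOME'), 'pathdef.m'));

% Step 5: import the cluster configuration.
% Assumption: importParallelConfig exists in this release.
configName = importParallelConfig(fullfile(scriptDir, 'sugar.mat'));
disp(configName);   % name under which the configuration was imported
```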

Submitting Task Parallel Jobs

The following is an example of a Task Parallel job.  The task-parallel example code, frontDemo, calculates the risk and return based on historical data from a collection of stock prices. The core of the code, calcFrontier, minimizes the equations for a set of returns. In order to parallelize the code, the for loop is converted into a parfor loop with each iteration of the loop becoming its own independent task. 
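The loop conversion described above can be illustrated with a small stand-alone sketch. This is not the frontDemo or calcFrontier code; parforSketch and evaluateOne are hypothetical names, and the placeholder computation only stands in for one independent piece of work.

```matlab
function results = parforSketch(targets)
% PARFORSKETCH  Hypothetical example of a for loop converted to parfor.
% Each iteration is independent of the others, so MATLAB is free to
% hand iterations to different workers in any order.
results = zeros(size(targets));
parfor k = 1:numel(targets)
    % results is indexed only by the loop variable, which lets
    % parfor treat it as a sliced output variable.
    results(k) = evaluateOne(targets(k));
end
end

function y = evaluateOne(t)
% Placeholder for the real per-iteration work (for example, one
% optimization for one target return).
y = t.^2;
end
```

If no MATLAB pool is open, the parfor loop simply runs serially in the current session, which makes it easy to check the code before submitting it to the cluster.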

To submit the job, copy submitParJobToCluster.m into your working directory, make the necessary modifications for your job environment, and then run the code from within MATLAB. This will submit the job.  An explanation of the code follows:

submitParJobToCluster.m

Code

When you run this code within MATLAB, the frontDemo code will be submitted to the PBS job scheduler.  Use the showq command from a cluster terminal window to look for your job in the job queue.
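The original submitParJobToCluster.m listing is not reproduced on this page. As a rough sketch of what a submission of this kind might look like, the following uses the batch command with an imported configuration. The configuration name 'sugar', the pool size, and the assumption that frontDemo can be called as a function with no inputs are all illustrative; if frontDemo is a script, pass its name as a string ('frontDemo') instead of a function handle and omit the output count and argument list.

```matlab
% Sketch only; not the original submitParJobToCluster.m.
configName = 'sugar';   % assumption: name of the imported configuration
poolSize   = 7;         % a pool of N uses N+1 workers in total

% Submit frontDemo with a MATLAB pool available for its parfor loop.
% Zero outputs are requested; command-window text is captured instead.
job = batch(@frontDemo, 0, {}, ...
            'Configuration', configName, ...
            'matlabpool', poolSize, ...
            'CaptureDiary', true);

wait(job);      % block until the job has finished (may take hours)
diary(job);     % show the captured command-window output
destroy(job);   % remove the job and its temporary files
```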

Job Submission Best Practice

The above is only an example used to illustrate how to submit a job and retrieve the results from within the same MATLAB session.  In most cases calling job.wait is not desirable, because a job may sit in the queue for a long time (perhaps several hours) before it starts running.  The best practice is to have your MATLAB code write its results to a file, rather than waiting for the job to finish and retrieving the results with job.getAllOutputArguments().
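One way to follow this practice is to wrap the computation in a small function that saves its own result. The sketch below is an assumption about how this could be arranged, not part of the cluster scripts; runAndSave is a hypothetical helper, and it assumes the function being run takes no inputs and returns one output.

```matlab
function runAndSave(fcn, outFile)
% RUNANDSAVE  Hypothetical helper: evaluate fcn and save the result.
%   fcn     - handle to a function with no inputs and one output
%   outFile - full path of the MAT-file to write
result = fcn();
% The worker writes the file itself, so the client session does not
% need to stay open until the job has finished.
save(outFile, 'result');
end
```

A job could then be submitted with something like batch(@runAndSave, 0, {@frontDemo, outFile}, ...), where outFile is a path that both the compute nodes and the login node can see, and the result loaded in a later MATLAB session with load(outFile).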

Maximum Number of Workers Per Submission

The maximum number of workers per job submission is constrained by the queue policy on each cluster, with one worker per processor core.  For example, Sugar will not accept more than 8 workers per submission. 

Help with batch() command

For more information on the batch() command, all of its input arguments, and how to use the diary, please see the MATLAB online help or the MathWorks website (http://www.mathworks.com/help/toolbox/distcomp/batch.html).

Clean up your job space after the job has finished

Be sure to destroy your job after the job has finished.

Submitting Data Parallel Jobs

The data-parallel example code approximates pi by calculating the area under a curve. The nonparallel version, calcPiSerial, calculates with a for loop, looping through discrete points. The parallel version, calcPiSpmd, uses the spmd construct to evaluate a part of the curve on each MATLAB instance. Each MATLAB instance uses its labindex (i.e. rank) to determine which portion of the curve to calculate. The partial results are then globally summed and broadcast back out to every instance. The code uses higher-level routines, rather than lower-level MPI calls. Once the summation has been calculated, the result is indexed into and communicated back to the local client MATLAB to calculate the total area.
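The pattern described above can be sketched as follows. This is not the original calcPiSpmd code; calcPiSketch is a hypothetical name, and the choice of integrand (4/(1+x^2) on the interval from 0 to 1, evaluated with the midpoint rule) is an assumption made for illustration.

```matlab
function piApprox = calcPiSketch(n)
% CALCPISKETCH  Hypothetical spmd example: approximate pi as the
% integral of 4/(1+x^2) over [0,1] using n midpoints.
spmd
    % Each lab takes every numlabs-th point, starting at its labindex.
    idx = labindex:numlabs:n;
    x   = (idx - 0.5) / n;
    localSum = sum(4 ./ (1 + x.^2)) / n;
    % gplus adds localSum across all labs and returns the total
    % on every lab (a global sum followed by a broadcast).
    total = gplus(localSum);
end
% On the client, total is a Composite; indexing it fetches the
% value held by lab 1.
piApprox = total{1};
end
```

Using gplus here takes the place of explicit send and receive calls between labs.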

To submit the job, copy submitSpmdJobToCluster.m into your working directory, make the necessary modifications for your job environment, and then run the code from within MATLAB.  This will submit the job.   An explanation of the code follows:

submitSpmdJobToCluster.m

 
When you run this code within MATLAB, the calcPiSpmd code will be submitted to the PBS job scheduler.  Use the showq command from a cluster terminal window to look for your job in the job queue.
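The original submitSpmdJobToCluster.m listing is not reproduced on this page. The following is a minimal sketch of how a function containing an spmd block might be submitted and its output retrieved. The configuration name, the number of labs, and the assumption that calcPiSpmd takes one input and returns one output are all illustrative and should be adjusted to the real function.

```matlab
% Sketch only; not the original submitSpmdJobToCluster.m.
configName = 'sugar';   % assumption: name of the imported configuration
numLabs    = 7;         % labs available to the spmd block
nPoints    = 1e6;       % hypothetical input argument

% Request one output from calcPiSpmd and give it a MATLAB pool,
% which the spmd block inside the function will use.
job = batch(@calcPiSpmd, 1, {nPoints}, ...
            'Configuration', configName, ...
            'matlabpool', numLabs);

wait(job);                         % block until the job has finished
out = getAllOutputArguments(job);  % cell array of returned values
piApprox = out{1};
destroy(job);                      % remove the job and its temporary files
```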

Code

Job Submission Best Practice

The above is only an example used to illustrate how to submit a job and retrieve the results from within the same MATLAB session.  In most cases calling job.wait is not desirable, because a job may sit in the queue for a long time (perhaps several hours) before it starts running.  The best practice is to have your MATLAB code write its results to a file, rather than waiting for the job to finish and retrieving the results with job.getAllOutputArguments().

Maximum Number of Workers Per Submission

The maximum number of workers per job submission is constrained by the queue policy on each cluster, with one worker per processor core.  For example, Sugar will not accept more than 8 workers per submission. 

Help with batch() command

For more information on the batch() command and all of its input arguments and how to use the diary, please see MATLAB's online help or the MathWorks website.

Clean up your job space after the job has finished

Be sure to destroy your job after the job has finished.

Job Dependencies

Configuring Cluster Parameters with ClusterInfo

Destroying a Job

When you submit a job with batch, you will notice that each submission is labeled Job1, Job2, and so forth.  While the jobs are running, the temporary directories associated with each job can be found in ~/MdcsDataLocation.  When job.destroy is called, these temporary directories are deleted.  The above examples call job.destroy.  If you close your MATLAB session before executing job.destroy, which is likely unless you are using the full example with job.wait, you will need to clean up the temporary directories in ~/MdcsDataLocation manually.
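Jobs left over from an earlier MATLAB session can also be found and destroyed from a new session, which removes their directories without deleting files by hand. The sketch below assumes the imported configuration is named 'sugar' and that any results you still need have already been retrieved or written to a file.

```matlab
% Sketch: destroy all finished jobs known to the scheduler.
sched = findResource('scheduler', 'configuration', 'sugar');

% Only select jobs that have finished, so queued and running
% jobs are left alone.
finishedJobs = findJob(sched, 'State', 'finished');

for k = 1:numel(finishedJobs)
    fprintf('Destroying %s\n', finishedJobs(k).Name);
    destroy(finishedJobs(k));   % also deletes the job's temporary files
end
```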

Running Locally on a Desktop

In order to run MATLAB code with a parallel component on your desktop locally, you must first start up a MATLAB Pool, as such:

where 8 is the number of MATLAB processes to attach to the job.  At this point you will have access to eight MATLAB workers for use with parallel code, such as code with parfor loops, and so on.

Maximum number of processors assigned to a job

The number of workers should not exceed nc-1, where nc is the number of cores on the local machine.

After running the code, close the MATLAB Pool:

Calls to matlabpool should not be embedded in the MATLAB code, but rather issued at the MATLAB command prompt.
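The command listings for this section are not reproduced on this page. A local session would typically look something like the sketch below, typed at the MATLAB command prompt. The use of the local configuration and of eight workers follows the text above; myParallelCode is a hypothetical placeholder for your own code.

```matlab
% Typed at the MATLAB command prompt, not placed inside your code.
matlabpool open local 8    % start 8 workers on the local machine

% Run code that contains parfor loops or spmd blocks.
% myParallelCode is a hypothetical placeholder for your own function.
myParallelCode();

matlabpool close           % release the workers when finished
```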
