
Introduction

Rice University and IBM partnered to bring the first BlueGene supercomputer to Texas. The Rice BlueGene/P was installed in 2012 and was upgraded to a BlueGene/Q around March 2015. The BlueGene/Q supercomputer is equipped with 16,384 user-accessible cores on 1,024 compute nodes and 120 TB of shared GPFS storage in total. Each core is a 64-bit PowerPC A2 processor running at 1.6 GHz and supports 4-way simultaneous multithreading (4 hardware threads per core); each core also has a SIMD (Single Instruction, Multiple Data) quad-vector double-precision floating-point unit (IBM QPX). Each node has 16 cores, supporting at most 64 (16x4) threads, and 16 GB of DDR3 DRAM (1 GB per core).

Like other BlueGene/Q supercomputers around the world (for example, Sequoia at LLNL and Mira at Argonne National Laboratory), the BlueGene/Q at Rice can perform both large-scale High Performance Computing (HPC) and High Throughput Computing (HTC, e.g. threading through OpenMP), and delivers a peak performance of 209 TFlop/s. The ratio of I/O nodes to compute nodes is 1:32, so the BlueGene/Q requires a minimum block size (bg_size) of 32 compute nodes (that is, 512 cores).

Essentials

Hostname:  bgq.rice.edu (aliased to bgq-fn.rcsg.rice.edu)

Login Authentication:  For those who have been granted access, the login user name and password will be your NetID user name and password. 

Prerequisites for Using This System

Unlike other CRC clusters, the BlueGene/Q compute nodes run a customized kernel developed by IBM called the Compute Node Kernel (CNK). The single login node runs a 64-bit version of Red Hat Linux 6.5. As with other CRC clusters, you must have basic Linux knowledge: how to navigate the filesystem; how to create, edit, rename, and delete files; and how to run basic commands and write small scripts. In addition, be aware that you must cross-compile your code, since the login node runs a different OS from the compute nodes. As always, if you need assistance you may contact the CRC by submitting a help request here.

Filesystems

Filesystem                               Environment Variable    Physical Path    Size Quota    Type    Purge Policy
Home directories                         $HOME                   /home            50 GB         NFS     none
Group project directories                $PROJECTS               /projects        none yet      NFS     none
Shared scratch (high-performance I/O)    $SCRATCH                /bgqscratch      none          GPFS    14 days

Data Backups Are the Responsibility of Each User

Backing up and archiving data remains the sole responsibility of the end user; the shared computing enterprise does not currently offer these services in any automated way. We strongly encourage all users to take full advantage of Central IT's storage services, departmental servers, and/or individual group resources to prevent accidental loss or deletion of critical data. Please feel free to contact the CRC for advice on best practices in data management (e.g. IT's Subversion (SVN) services provide a safe environment for maintaining and developing source code). We welcome any suggestions for offering a higher level of data security as we move forward with shared computing at Rice.


Research Data Compliance

Due to recent policy changes at NSF, NIH, DOD, and other government granting agencies, Research Data Management has become an important area of growth for Rice and is a critical factor in both conducting and funding research. The onus of maintaining and preserving research data generated by funded research falls squarely upon the research faculty, postdocs, and graduate students conducting the research. It is imperative that you are aware of your compliance responsibilities so as not to jeopardize the ability of Rice University to receive federal funding. We will help in any way possible to provide you the information and assistance you need, but the best place to start is the campus research data management website.

$SCRATCH is not permanent storage

$SCRATCH is to be used only for job I/O. At the end of a job, delete everything you do not need for another run, or move it to your home or project directory for analysis. Staff may periodically delete files from the $SCRATCH file system even if they are less than 14 days old; a full file system inhibits use of the system for everyone. Using programs or scripts to actively circumvent the file purge policy will not be tolerated.

Scratch directory access

If a scratch directory has not been created for you, make one with the following command.
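A minimal sketch, assuming per-user scratch directories sit directly under /bgqscratch and are named after your NetID (adjust the path if the local convention differs):

    mkdir -p /bgqscratch/$USER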

Submit ALL jobs from your scratch directory, and write job output only to the scratch directory. Executables may remain in your home directory, but do not write to your home directory during a job.

Purge Policies

Files in the scratch directory that are more than 2 weeks (14 days) old are removed automatically; this policy is now actively enforced.

Compilers and Build Environment

Two versions of MPI, provided by IBM, are installed on the system: mpi/xl and mpi/gcc. Both are available with the module command. The mpi/xl version was compiled with the IBM XL compilers; the mpi/gcc version was compiled with GCC 4.4.7-4.

ESSL 5.1.1 and the rest of IBM's BG/Q stack are installed under /bgsys; ESSL itself lives at /bgsys/ibm_essl/prod/opt/ibmmath/essl/5.1/. Other packages and libraries, such as FFTW, can be found under /opt/apps.
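As an illustration only, a link line against ESSL with the XL MPI wrapper might look like the following; the library name (esslbg) and the lib64 subdirectory are assumptions, so verify them against the actual installation layout before using this:

    # hedged sketch: check the ESSL install path for the real library directory and name
    mpixlc_r -O3 myprog.c -o myprog \
        -L/bgsys/ibm_essl/prod/opt/ibmmath/essl/5.1/lib64 -lesslbg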

Login nodes are different from compute nodes

Be aware that the login node runs Red Hat Linux while the compute nodes run an entirely different software stack. The GCC and IBM compilers used to cross-compile code for the compute nodes must be explicitly added via the environment modules. The default gcc and the xlc/xlf compilers available on the path compile code that is binary compatible with the login node only. Please see IBM's documentation for more details.

The IBM XL compilers currently provided, C/C++ version 12.1 and Fortran version 14.1, are available via environment modules.

Typical mpi/xl compiler commands are: mpixlf77_r, mpixlf90_r, mpixlf95_r, mpixlf2003_r, mpixlf2008_r, mpixlc_r, mpixlcxx_r, and corresponding non-thread-safe ones, e.g. mpixlf90.

Typical mpi/gcc compiler commands are: mpicc, mpicxx, mpic++, mpif77, mpif90.
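For example, a minimal sketch of cross-compiling an MPI program with each toolchain (source and executable names are placeholders; load one MPI module at a time):

    # IBM XL toolchain
    module load mpi/xl
    mpixlc_r -O3 hello_mpi.c -o hello_mpi

    # or the GCC toolchain
    module load mpi/gcc
    mpicc -O3 hello_mpi.c -o hello_mpi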

OpenMP

IBM XL: be sure to use a thread-safe compiler (its name ends with _r, underscore "r") and add the -qsmp=omp flag; see the IBM OpenMP documentation for details.

GNU: use the -fopenmp flag; see the GCC OpenMP documentation for details.

The appropriate optimization level depends on the specifics of the program code being compiled; the available levels range from -O0 through -O5. The -O3 flag selects QPX SIMD code generation automatically.
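For instance, hedged compile lines for an OpenMP code with each toolchain (file names are placeholders):

    mpixlc_r -O3 -qsmp=omp hello_omp.c -o hello_omp    # IBM XL, thread-safe wrapper
    mpicc -O3 -fopenmp hello_omp.c -o hello_omp        # GCC toolchain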

Module Command

Environment variables are set via the module command, which allows use of the applications installed under /opt/apps.

Frequently used module commands are:

Command example    Meaning
module avail       Show the list of all available applications
module load        Load the environment variables needed to use a specific application
module list        List the modules that are currently loaded
module purge       Unload all currently loaded modules
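A typical session might look like this (the module name comes from the MPI section above):

    module avail            # see what is installed
    module load mpi/xl      # pick up the XL MPI compiler wrappers
    module list             # confirm what is currently loaded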

For more information use the command: man module

Job Scheduling

Job scheduling is done via LoadLeveler.

Queues

Queue Name    Min. Nodes Per Job    Max. Nodes Per Job    Max. Walltime
usp           512                   Undef                 24:00:00
rice          512                   Undef                 24:00:00
devel         Undef                 Undef                 00:30:00

Cores are allocated in blocks of 512 (bg_size=32). If you request fewer cores than this, you will still be allocated 512, and the additional cores will sit idle. Whenever your core count is not an integer multiple of 512 there will be idle cores; for example, a 600-core job occupies two blocks (1,024 cores) and leaves 424 of them idle. Please ensure that you do not waste cycles unintentionally.

Here is a sample job submission script for MPI code for a "usp" user (please adjust the "usp" class/group if you are a Rice user or wish to use the "devel" queue):

jobscript.cmd
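A minimal sketch of a LoadLeveler job command file, assuming the common BlueGene/Q keywords (job_type, class, group, bg_size, wall_clock_limit); the job name, wall clock limit, executable path, and argument below are placeholders, and the full keyword list is in man llsubmit:

    #!/bin/bash
    # @ job_name         = my_mpi_job
    # @ job_type         = bluegene
    # @ class            = usp
    # @ group            = usp
    # @ bg_size          = 32
    # @ wall_clock_limit = 01:00:00
    # @ output           = $(job_name).$(jobid).out
    # @ error            = $(job_name).$(jobid).err
    # @ queue

    # 32 nodes x 16 ranks per node = 512 MPI ranks
    runjob --np 512 --ranks-per-node 16 --exe /bgqscratch/$USER/my_mpi_exe --args input.dat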

Notes on settings:

class: three options. devel is for testing/debugging runs, allows only 30 minutes, and is accessible to all users; usp is for users from USP; rice is for non-USP users, for example users from Rice.

group: two options, usp and rice.

bg_size: the requested number of nodes (not cores); the minimum is 32. The number of cores is bg_size*16.

--ranks-per-node: the number of cores on each node that will be used. The maximum is 16, and for a particular job it may be less than 16.

--np: this flag assigns the actual number of cores (MPI ranks) that will be used for this job.

--exe: this flag tells runjob which executable will be used.

--args: this flag passes arguments to the executable. For more than one argument, use the flag once per argument, e.g. "--args $ARG1 --args $ARG2 --args $ARG3".

For "MPI + OpenMP" hybrid codes, the command line that launches the job will look like this:
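A hedged sketch, assuming runjob's --envs flag is used to pass the thread count to the compute nodes (the executable path is a placeholder):

    runjob --np 512 --ranks-per-node 16 --envs OMP_NUM_THREADS=4 --exe /bgqscratch/$USER/my_hybrid_exe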

The "OMP_NUM_THREADS" value can be set to 1, 2, or 4, since each core supports up to 4 hardware threads.

For more options, type "runjob --help" or "man runjob".

To submit the above jobscript.cmd job script, use this command:
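    llsubmit jobscript.cmd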

To view the job in the queue, execute this command:
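    llq -u $USER        # list your own jobs; llq with no arguments lists all jobs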

To cancel a job, use this command:
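    llcancel jobID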

where jobID is the job identifier of the job you are trying to kill.

Use the online manuals for more information and examples for each of the above commands.  For example, use man llsubmit to see job submission help and examples.