Rice University and IBM partnered to bring the first Blue Gene supercomputer to Texas in 2012. The Rice Blue Gene/P was installed in 2012 and was upgraded to a Blue Gene/Q around March 2015. The Blue Gene/Q supercomputer is equipped with 16,384 user-accessible cores on 1,024 nodes and a total of 120 TB of GPFS shared storage. Each core is a 64-bit PowerPC A2 processor running at 1.6 GHz that supports 4-way simultaneous multithreading (i.e. four hardware threads per core) and includes a SIMD (Single Instruction, Multiple Data) quad-vector double-precision floating-point unit (IBM QPX). Each node has 16 cores, supporting at most 64 (16x4) threads, and 16 GB of DDR3 DRAM (i.e. 1 GB per core). Like other Blue Gene/Q supercomputers around the world (for example, Sequoia at LLNL and Mira at Argonne National Laboratory), the Blue Gene/Q at Rice supports both large-scale High Performance Computing (HPC) and High Throughput Computing (HTC, e.g. threading through OpenMP), and delivers a peak performance of 209 TFlops. The ratio of I/O nodes to compute nodes is 1:32, so the Blue Gene/Q requires a minimum block size (i.e. bg_size) of 32 compute nodes (that is, 512 cores).
Hostname: bgq.rice.edu (aliased to bgq-fn.rcsg.rice.edu)
Cluster access: This system is not open to the entire Rice campus; access must be granted.
Login Authentication: For those who have been granted access, the login user name and password will be your NetID user name and password.
Prerequisite for Using This System
Unlike other CRC clusters, the Blue Gene/Q compute nodes run a customized kernel developed by IBM called the Compute Node Kernel (CNK). The only login node into the system runs a 64-bit version of Red Hat Linux 6.5. As with other CRC clusters, you must have some basic knowledge of Linux and know how to navigate the filesystem; create, edit, rename, and delete files; run basic commands; and write small scripts. In addition, be aware that you must cross-compile your code, since the login node runs a different OS from the compute nodes. As always, if you need assistance you may contact the CRC by submitting a help request here.
|Filesystem||Accessed via Environment Variable||Physical Path||Quota||Type||Purge Policy|
|Home directories||$HOME||/home||50 GB||NFS||none|
|Group project directories||$PROJECTS||/projects||none yet||NFS||none|
|Shared scratch (high-performance I/O)||$SCRATCH||/bgqscratch||none||GPFS||14 days|
Research Data Compliance
Due to recent changes in NSF, NIH, DOD, and other government granting agencies, Research Data Management has become an important area of growth for Rice and is a critical factor in both conducting and funding research. The onus of maintaining and preserving research data generated by funded research is placed squarely upon the research faculty, post docs, and graduate students conducting the research. It is imperative that you are aware of your compliance responsibilities so as not to jeopardize the ability of Rice University to receive federal funding. We will help in any way possible to provide you the information and assistance you need, but the best place to start is the campus research data management website.
Scratch directory access
If a scratch directory has not been made for you, create one with the command
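The exact command is not reproduced here; a likely form, assuming per-user subdirectories under $SCRATCH named after your NetID (as is common on CRC clusters), is:

```shell
# Assumption: per-user scratch directories live directly under
# $SCRATCH (/bgqscratch) and are named after your NetID.
mkdir -p "$SCRATCH/$USER"
```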
Submit ALL jobs from your scratch directory, and job output should only be written to the scratch directory. Executables may remain in your home directory, but do not write to your home directory during a job.
Files in the scratch directory that are more than 2 weeks old are removed automatically; this purge policy is actively enforced.
Compilers and Build Environment
There are two versions of MPI installed on the system that were provided by IBM. They are mpi/xl and mpi/gcc and are available with the module command. The mpi/xl version was compiled with the IBM XL compilers. The mpi/gcc version was compiled with GCC 4.4.7-4.
ESSL 5.1.1 and the rest of IBM's BGQ stack are installed at /bgsys/ibm_essl/prod/opt/ibmmath/essl/5.1/. Other packages and libraries, such as FFTW, can be found at /opt/apps.
The IBM XL compilers currently provided, C/C++ version 12.1 and Fortran version 14.1, are available via environment modules.
Typical mpi/xl compiler commands are: mpixlf77_r, mpixlf90_r, mpixlf95_r, mpixlf2003_r, mpixlf2008_r, mpixlc_r, mpixlcxx_r, and corresponding non-thread-safe ones, e.g. mpixlf90.
Typical mpi/gcc compiler commands are: mpicc, mpicxx, mpic++, mpif77, mpif90.
IBM XL: for OpenMP code, be sure to use a thread-safe compiler command (its name ends in _r, underscore "r") together with the -qsmp=omp flag.
GNU: use the -fopenmp flag.
The appropriate optimization level depends on the specifics of the program code being compiled; the available levels range from -O0 through -O1, -O2, -O3, -O4, and -O5. The -O3 flag selects QPX SIMD vectorization automatically.
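Putting these flags together, typical compile lines might look like the following sketch (source and output file names are placeholders):

```shell
# IBM XL: thread-safe wrappers (_r suffix) with OpenMP enabled
mpixlc_r   -O3 -qsmp=omp -o mycode   mycode.c
mpixlf90_r -O3 -qsmp=omp -o mysolver mysolver.f90

# GCC wrappers: OpenMP via -fopenmp
mpicc -O3 -fopenmp -o mycode mycode.c
```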
Environment variables are set via the module command, which allows use of the installed applications under /opt/apps.
Frequently used module commands are:
|module avail||Show the list of all the available applications|
|module load||Load environment variables to use a specific application|
|module list||List all the environment variables loaded|
|module purge||Purge loaded environment variables|
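For example, a typical session using these commands might look like the following (mpi/xl is one of the MPI modules described above):

```shell
module avail        # see what is installed under /opt/apps
module load mpi/xl  # load the IBM XL MPI compiler wrappers
module list         # confirm what is currently loaded
```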
Job scheduling is done via IBM LoadLeveler.
|Queue Name||Min. Nodes Per Job||Max. Nodes Per Job||Max. Walltime|
Here is a sample job submission script for MPI code for a "usp" user (adjust the "usp" class and group to "rice" if you are a Rice user, or use the "devel" class if you wish to use the devel queue):
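The original sample script is not shown here; the following is a minimal sketch assuming the standard LoadLeveler keywords for Blue Gene systems (job_type, bg_size, class, group) and the runjob flags described in the notes below. The job name, executable, and wall-clock limit are placeholders:

```shell
#!/bin/bash
# Minimal LoadLeveler script sketch for a pure-MPI job.
# Adjust class/group to your allocation; names are placeholders.
# @ job_name         = mpi_test
# @ job_type         = bluegene
# @ class            = usp
# @ group            = usp
# @ bg_size          = 32
# @ wall_clock_limit = 01:00:00
# @ output           = $(job_name).$(jobid).out
# @ error            = $(job_name).$(jobid).err
# @ queue

# 32 nodes x 16 ranks per node = 512 MPI ranks
runjob --ranks-per-node 16 --np 512 --exe ./a.out
```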
Notes on settings:
class: three options. devel is for testing/debugging runs, allows only 30 minutes of walltime, and is accessible to all users; usp is for users from USP; rice is for non-USP users, for example, users from Rice.
group: two options, usp, and rice.
bg_size: the requested number of nodes (not cores), minimum is 32. The number of cores is bg_size*16.
--ranks-per-node: the number of cores on each node that will be used. The maximum is 16; for a particular job, this may be less than 16.
--np: this flag assigns the total number of cores (MPI ranks) that will be used for this job.
--exe: this flag tells the "runjob" what executable will be used.
--args: this flag passes arguments to the executable. For more than one argument, repeat the flag: "--args $ARG1 --args $ARG2 --args $ARG3 ..."
For "MPI + OpenMP" hybrid mode codes, the command line to launch the job will be like this:
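The original command line is not reproduced here; a sketch, assuming a placeholder executable ./hybrid.x on a 32-node block with 4 ranks per node (runjob's --envs flag forwards environment variables to the compute nodes), is:

```shell
# 32 nodes x 4 MPI ranks per node = 128 ranks,
# each rank running 4 OpenMP threads
runjob --ranks-per-node 4 --np 128 \
       --envs OMP_NUM_THREADS=4 --exe ./hybrid.x
```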
The "OMP_NUM_THREADS" environment variable can be set to 1, 2, or 4, as each core supports up to 4 hardware threads.
To submit the above jobsubmit.cmd job script, use this command:
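The standard LoadLeveler submission command, assuming the script is named jobsubmit.cmd as above, is:

```shell
llsubmit jobsubmit.cmd
```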
To view the job in the queue, execute this command:
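LoadLeveler's queue listing command serves this purpose; restricting the output to your own jobs with -u is optional:

```shell
llq            # show all jobs in the queue
llq -u $USER   # show only your jobs
```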
To cancel a job, use this command:
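The standard LoadLeveler cancellation command is:

```shell
llcancel jobID
```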
where jobID is the job identifier of the job you want to cancel.