PowerOmics's compute cores support 8-way simultaneous multithreading (SMT), and using more than one thread per core can improve the performance of certain applications. We recommend that users test their codes with different thread counts to find the configuration that works best for them.
One thread per core:
If you use OpenMPI/1.8.8 or newer, the cores you request in your job script are presented to your program, and by-core pinning takes effect automatically.
Two or more threads per core:
Again, with OpenMPI/1.8.8 or newer, core pinning is enabled automatically. Replace the 2 in the following script with 4 or 8 to increase the number of threads per core.
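As a rough illustration of how the thread count per core feeds into a launch line, here is a minimal sketch. It only prints an mpirun command so the result can be inspected before it goes into a job script. The core count, the application name ./my_app, and the helper build_mpirun_cmd are hypothetical. The explicit --map-by and --bind-to options are an assumption as well, since pinning is described above as automatic on PowerOmics.

```shell
#!/bin/bash
# Sketch only: prints an mpirun command line instead of running it.
# The core count and ./my_app are hypothetical placeholders, not PowerOmics defaults.

# Build a launch line that places ranks_per_core MPI ranks on each core.
build_mpirun_cmd() {
    local cores=$1 ranks_per_core=$2 app=$3
    local np=$(( cores * ranks_per_core ))
    # --map-by ppr:N:core  places N processes on each core
    # --bind-to hwthread   pins each rank to one hardware thread
    # --report-bindings    prints the resulting pinning at startup
    echo "mpirun -np ${np} --map-by ppr:${ranks_per_core}:core --bind-to hwthread --report-bindings ${app}"
}

# 20 cores at 2 threads per core gives 40 ranks; pass 4 or 8 for higher SMT levels.
build_mpirun_cmd 20 2 ./my_app
```

Running the printed command with --report-bindings is one way to confirm that the pinning matches the intended number of threads per core.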
There are two OpenMP runtimes available on PowerOmics, one for the IBM XL compilers and one for GCC. They use slightly different environment variables to control thread placement. Run with the -smt1, -smt2, -smt4, or -smt8 flag, the openmp_affinity.sh script generates an affinity map for 1, 2, 4, or 8 threads per core, respectively. Make sure the OMP_NUM_THREADS setting matches the -smt flag you choose.
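To make the idea of an affinity map concrete, the sketch below builds a CPU list by hand for the GCC runtime, which reads it from GOMP_CPU_AFFINITY. It is not the output of openmp_affinity.sh. It assumes that logical CPU ids are numbered core*8 + thread, that the first hardware threads of each core are the ones to use, and that OMP_NUM_THREADS should equal cores times threads per core. The helper build_cpu_list and the values of CORES and SMT are hypothetical. The IBM XL runtime uses its own variables (for example XLSMPOPTS), which this sketch does not cover.

```shell
#!/bin/bash
# Sketch only: build a CPU list using the first SMT hardware threads of each core.
# Assumes logical CPU ids are numbered core*8 + thread (8 hardware threads per core).
build_cpu_list() {
    local cores=$1 smt=$2
    local list="" core=0 thread
    while [ "$core" -lt "$cores" ]; do
        thread=0
        while [ "$thread" -lt "$smt" ]; do
            # Append the next CPU id, space-separated
            list="${list:+$list }$(( core * 8 + thread ))"
            thread=$(( thread + 1 ))
        done
        core=$(( core + 1 ))
    done
    echo "$list"
}

# Hypothetical example: 4 cores with 2 threads per core
CORES=4
SMT=2
export OMP_NUM_THREADS=$(( CORES * SMT ))
export GOMP_CPU_AFFINITY="$(build_cpu_list "$CORES" "$SMT")"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS GOMP_CPU_AFFINITY=$GOMP_CPU_AFFINITY"
```

Changing SMT to 1, 4, or 8 changes both the CPU list and OMP_NUM_THREADS together, which is the consistency the -smt flag and the OMP_NUM_THREADS setting need to have.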
For codes that combine MPI and OpenMP, use the OMP_NUM_THREADS environment variable to control the number of threads per core. In this case, OpenMPI takes care of pinning the OpenMP threads to the cores.
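A hybrid launch might look like the following sketch, which again only prints the command rather than running it. The rank count, the thread count, the application name ./my_hybrid_app, and the helper hybrid_cmd are hypothetical. The -x option asks mpirun to forward OMP_NUM_THREADS to every rank.

```shell
#!/bin/bash
# Sketch only: prints the launch line for a hybrid MPI + OpenMP run.
# The rank count, thread count and ./my_hybrid_app are hypothetical placeholders.
hybrid_cmd() {
    local ranks=$1 threads=$2 app=$3
    # -x forwards the named environment variable to every MPI rank
    echo "OMP_NUM_THREADS=${threads} mpirun -np ${ranks} -x OMP_NUM_THREADS --report-bindings ${app}"
}

# 4 MPI ranks, each running 8 OpenMP threads
hybrid_cmd 4 8 ./my_hybrid_app
```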