Exclusive process stride – EXCL_STRIDE

HPC compute nodes often have several GPUs per node. A common scenario is for some MPI tasks, on a subset of the CPU cores, to drive the GPUs while other MPI tasks run on the remaining CPU cores using main (DRAM) memory. This page describes how to set up this MPI task-to-core assignment.

Suppose a node has 42 CPU cores, 6 GPU devices, and the run needs 100 nodes. We use ATM as the component driving the GPUs in the examples below.

Process stride – PSTRID

To set up 6 MPI tasks per node for ATM and 42 MPI tasks per node for all other components:

  1. ./xmlchange MAX_TASKS_PER_NODE=42

  2. ./xmlchange MAX_MPITASKS_PER_NODE=42 # same as MAX_TASKS_PER_NODE, i.e. one thread per task (NTHRDS is 1)

  3. ./xmlchange NTASKS=4200 # 42*100: all components stacked on the same 100 nodes

  4. ./xmlchange PSTRID_ATM=7 # ATM runs on every 7th MPI task (42 cores / 6 GPUs = 7), giving 6 ATM tasks per node

  5. ./xmlchange NTASKS_ATM=600 # 6*100: one ATM task per GPU on each of the 100 nodes
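The five settings above follow directly from the node shape. As a hedged sketch, the hypothetical bash function `pstrid_cmds` below (not part of CIME) derives them from the cores per node, GPUs per node, and node count, and prints the corresponding `./xmlchange` commands so they can be reviewed before being run. It assumes one thread per task and a core count that divides evenly by the GPU count.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CIME): print the xmlchange commands for
# one ATM task per GPU, with all other components on every core.
# Usage: pstrid_cmds CORES_PER_NODE GPUS_PER_NODE NODES
pstrid_cmds() {
  local ncpu=$1 ngpu=$2 nnodes=$3
  # The stride only lines up with the GPUs if cores divide evenly among them.
  if (( ngpu <= 0 || ncpu % ngpu != 0 )); then
    echo "error: $ncpu cores do not divide evenly among $ngpu GPUs" >&2
    return 1
  fi
  local stride=$(( ncpu / ngpu ))
  echo "./xmlchange MAX_TASKS_PER_NODE=$ncpu"
  echo "./xmlchange MAX_MPITASKS_PER_NODE=$ncpu"      # one thread per task
  echo "./xmlchange NTASKS=$(( ncpu * nnodes ))"      # all components, all nodes
  echo "./xmlchange PSTRID_ATM=$stride"               # ATM on every stride-th task
  echo "./xmlchange NTASKS_ATM=$(( ngpu * nnodes ))"  # one ATM task per GPU
}

pstrid_cmds 42 6 100
```

For this example it prints NTASKS=4200, PSTRID_ATM=7, and NTASKS_ATM=600. From the case directory, `pstrid_cmds 42 6 100 | sh` would run the commands; printing them first makes a mistake in the node shape easy to spot.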

Process striding helps to

  • assign one MPI task per GPU;

  • place ATM tasks on the CPU cores nearest the GPUs when a node's NUMA layout makes some cores closer to the GPUs than others (ROOTPE_ATM, default 0, may need adjusting if core 0 is not the nearest to a GPU);

  • keep all CPU cores busy with other components.
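To see which node-local cores ATM lands on for a given ROOTPE_ATM, the hypothetical bash function below enumerates ATM's global ranks (ROOTPE_ATM + i * PSTRID_ATM) and maps each one to a node and a local core. It assumes ranks fill nodes in order, so with 42 tasks per node rank r sits on node r / 42 at local core r % 42; the actual rank-to-core binding depends on the MPI launcher and the machine configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the node-local cores that hold ATM tasks on NODE.
# Assumes block placement: global rank r runs on node r/TPN, local core r%TPN.
# Usage: atm_local_cores ROOTPE PSTRID NTASKS_ATM TASKS_PER_NODE NODE
atm_local_cores() {
  local rootpe=$1 pstrid=$2 ntasks=$3 tpn=$4 node=$5
  local i rank out=""
  for (( i = 0; i < ntasks; i++ )); do
    rank=$(( rootpe + i * pstrid ))   # global rank of the i-th ATM task
    if (( rank / tpn == node )); then
      out+="${out:+ }$(( rank % tpn ))"
    fi
  done
  echo "$out"
}

atm_local_cores 0 7 600 42 0   # default ROOTPE_ATM
atm_local_cores 1 7 600 42 0   # ROOTPE_ATM shifted by one core
```

Under this placement model the first call prints `0 7 14 21 28 35` and the second `1 8 15 22 29 36`, so raising ROOTPE_ATM by one moves every ATM task one core over on every node.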

With process striding alone, the CPU cores used by ATM are also shared with non-ATM components, because NTASKS spans all 42 cores on every node. This sharing can cause out-of-memory errors or degrade ATM's performance.

Exclusive process stride – EXCL_STRIDE

In addition to steps 1-5 above:

  1. ./xmlchange EXCL_STRIDE_ATM=7 # only ATM runs on the stride-7 tasks; other components run on the remaining tasks

Exclusive striding makes ATM and the non-ATM components mutually exclusive: ATM runs only on its strided cores, and no other component runs on them. This minimizes core-sharing side effects.
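Under the same block-placement assumption, the bash sketch below lists the node-local cores left to non-ATM components once EXCL_STRIDE_ATM=7 is set, reading "the remaining procs" as every task outside ATM's strided set. With 42 cores and 6 ATM tasks per node, that leaves 36 cores per node (3600 tasks over 100 nodes) for the other components. The helper is hypothetical and the real remapping is done inside E3SM, so the actual placement should still be confirmed from the run logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list node-local cores left for non-ATM components on
# NODE when ATM's strided ranks are exclusive (assumed semantics).
# Assumes block placement: global rank r runs on node r/TPN, local core r%TPN.
# Usage: other_local_cores ROOTPE STRIDE NTASKS_ATM TASKS_PER_NODE NODE
other_local_cores() {
  local rootpe=$1 stride=$2 ntasks=$3 tpn=$4 node=$5
  local c rank off out=""
  for (( c = 0; c < tpn; c++ )); do
    rank=$(( node * tpn + c ))
    off=$(( rank - rootpe ))
    # Skip ranks that belong to ATM's strided set.
    if (( off >= 0 && off % stride == 0 && off / stride < ntasks )); then
      continue
    fi
    out+="${out:+ }$c"
  done
  echo "$out"
}

cores=$(other_local_cores 0 7 600 42 0)
echo "$cores"
set -- $cores
echo "non-ATM cores per node: $#"
```

For the default ROOTPE_ATM this prints cores 1-6, 8-13, and so on up to 41, followed by a count of 36 (42 cores minus 6 ATM cores).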

How to verify component-to-task placement

  1. ./xmlchange INFO_MPROF=2 # to log memory usage profiling from every MPI process

  2. zgrep "comps" run/e3sm.log*.gz # and look for "(pe= $task_id comps= cpl ICE GLC WAV IAC ESP)"

The resulting log lines record which components execute in each MPI task.
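To turn those log lines into a placement check, the bash sketch below extracts the rank and component list from each `(pe= N comps= ...)` record, assuming the exact format shown in the example above. The hypothetical `ranks_running` lists the ranks that run a given component, and `ranks_shared` lists the ranks where that component runs in the same MPI task as another component; with EXCL_STRIDE_ATM set, `ranks_shared ATM` would be expected to print nothing. The component labels (for example, whether ATM appears as `ATM`) should be checked against the actual log.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for "(pe= N comps= ...)" log records; the record
# format is assumed from the documented example line.
# Emit one "rank comp comp ..." line per record read from stdin.
pe_records() {
  grep -oE '\(pe= [0-9]+ comps=[^)]*\)' |
    sed -E 's/^\(pe= ([0-9]+) comps= ?(.*)\)$/\1 \2/'
}

# Ranks on which component $1 runs.
ranks_running() {
  pe_records | awk -v c="$1" '{
    for (i = 2; i <= NF; i++) if ($i == c) { print $1; break }
  }'
}

# Ranks on which component $1 shares its MPI task with another component.
ranks_shared() {
  pe_records | awk -v c="$1" '{
    for (i = 2; i <= NF; i++) if ($i == c && NF > 2) { print $1; break }
  }'
}
```

A possible invocation from the case directory is `gzip -dc run/e3sm.log*.gz | ranks_shared ATM`.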

Error checking is minimal, so take care when setting up striding and exclusive striding.
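Because error checking is minimal, a quick arithmetic check before building the case can catch an inconsistent layout. The hypothetical bash function below (not part of CIME) tests four assumptions behind the example on this page: NTASKS fills whole nodes, PSTRID_ATM times the GPU count equals the tasks per node, NTASKS_ATM gives one ATM task per GPU on every node, and the last ATM rank (ROOTPE_ATM + (NTASKS_ATM - 1) * PSTRID_ATM) still falls inside NTASKS. Layouts that intentionally differ from one task per GPU would need different checks.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of CIME) for a one-task-per-GPU
# strided ATM layout. Returns 0 if all checks pass, 1 otherwise.
# Usage: check_stride ROOTPE PSTRID NTASKS_ATM NTASKS TASKS_PER_NODE NGPU
check_stride() {
  local rootpe=$1 pstrid=$2 ntasks_atm=$3 ntasks=$4 tpn=$5 ngpu=$6
  local ok=0
  if (( ntasks % tpn != 0 )); then
    echo "NTASKS=$ntasks is not a whole number of nodes" >&2; ok=1
  fi
  if (( pstrid * ngpu != tpn )); then
    echo "PSTRID=$pstrid x $ngpu GPUs != $tpn tasks per node" >&2; ok=1
  fi
  if (( ntasks_atm != ngpu * (ntasks / tpn) )); then
    echo "NTASKS_ATM=$ntasks_atm is not one task per GPU per node" >&2; ok=1
  fi
  local last=$(( rootpe + (ntasks_atm - 1) * pstrid ))  # highest ATM rank
  if (( last >= ntasks )); then
    echo "ATM's last rank $last is outside NTASKS=$ntasks" >&2; ok=1
  fi
  return "$ok"
}
```

For the layout on this page, `check_stride 0 7 600 4200 42 6` returns 0, while `check_stride 7 7 600 4200 42 6` reports that ATM's last rank (4200) is outside NTASKS.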

Additional details are in the following PR: https://github.com/E3SM-Project/E3SM/pull/4859