...

Serial mode, as its name implies, uses one core at a time for climos, and computes the monthly, then seasonal, then annual climatologies sequentially. Only the climos are performed serially: regridding will employ OMP threading on platforms that support it, and use up to 16 cores.

By design each month and each season is independent of the others, so all months can be computed in parallel, then each season can be computed in parallel (using monthly climatologies), then the annual average can be computed. Background parallelization mode exploits this parallelism and executes the climos in parallel as background processes on a single node, so that twelve cores are simultaneously employed for monthly climatologies, four for seasonal, and one for annual. The optional regridding will employ up to two cores per process. MPI parallelism executes the climatologies on different nodes so that up to (optimally) twelve nodes are employed performing monthly climos. The full memory of each node is available for each individual climo. The optional regridding will employ up to eight cores per node.

MPI mode, or Background mode on a big-memory queue, must be used to process ne30L72 and ne120L30 climos on some, but not all, DOE computers. For example, attempting an ne120L30 climo in background mode on rhea (i.e., on one 128 GB compute node) will fail with an out-of-memory (OOM) error. (OOM errors do not produce useful return codes, so if your climo processes die without printing useful information, the cause may be OOM.) However, the same climo will succeed if executed on a single big-memory (1 TB) node on rhea (use -lpartition=gpu, as shown below). Alternatively, MPI mode can be used for any climatology. The same ne120L30 climo will also finish blazingly fast in background mode on cooley (i.e., on one 384 GB compute node), so MPI mode is unnecessary on cooley. In general, the more memory available per node, the better the performance.
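The dependency order described above (all months in parallel, then all seasons in parallel from the monthly results, then the annual average) can be illustrated with a minimal Python sketch. This is not how the climo script itself is implemented. The names `climo_stages`, `SEASONS`, and `max_workers` are hypothetical, the input is a toy dictionary of numbers rather than model output, and the averages are unweighted (a real climatology would weight by the number of days in each month).

```python
from concurrent.futures import ThreadPoolExecutor

MONTHS = list(range(1, 13))
# Hypothetical season definitions, used only for this illustration.
SEASONS = {"MAM": [3, 4, 5], "JJA": [6, 7, 8], "SON": [9, 10, 11], "DJF": [12, 1, 2]}


def mean(values):
    """Unweighted arithmetic mean (a simplification for this sketch)."""
    values = list(values)
    return sum(values) / len(values)


def climo_stages(series, max_workers=12):
    """Compute toy monthly, seasonal, and annual means in dependency order.

    series maps month number (1..12) to that month's values across years.
    max_workers=1 mimics serial mode; max_workers=12 mimics the twelve
    simultaneous monthly tasks of background mode.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Stage 1: the twelve months are independent of one another.
        monthly_values = pool.map(lambda m: mean(series[m]), MONTHS)
        monthly = dict(zip(MONTHS, monthly_values))
        # Stage 2: the four seasons are independent, but each needs stage 1.
        seasonal_values = pool.map(
            lambda s: mean(monthly[m] for m in SEASONS[s]), SEASONS
        )
        seasonal = dict(zip(SEASONS, seasonal_values))
    # Stage 3: a single annual task, here taken from the monthly means.
    annual = mean(monthly.values())
    return monthly, seasonal, annual
```

Setting `max_workers=1` reproduces the serial ordering, while larger values let the independent tasks within each stage overlap. The stage boundaries are the only synchronization points.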

This implementation of parallelism for climatology generation has relatively poor granularity, meaning that nodes using background or MPI mode always compute 12 monthly climatologies simultaneously, and nodes using serial mode always compute only 1 climatology at a time. Some nodes, e.g., your personal workstation, are underpowered for 12 yet overpowered for 1, and so would benefit from improved granularity. The '-j job_nbr' option in splitter mode (and in ncremap) already allows the user to specify the exact granularity to match the node's resources. A goal of ours is to implement a job_nbr parallelization algorithm for climo generation. This will enable most personal workstations to do better than serial mode.
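To make the granularity idea concrete, here is a minimal Python sketch of one way a job_nbr-style limit could divide the twelve monthly tasks into batches that run one after another. It borrows only the name `job_nbr` from the '-j' option. The function `batches` is hypothetical, and batching is an assumed strategy, not a description of how any existing tool schedules its work.

```python
def batches(tasks, job_nbr):
    """Split tasks into consecutive batches of at most job_nbr items.

    Each batch would run simultaneously, and the next batch would start
    only after the current one finishes (an assumed strategy).
    """
    if job_nbr < 1:
        raise ValueError("job_nbr must be at least 1")
    tasks = list(tasks)
    for start in range(0, len(tasks), job_nbr):
        yield tasks[start:start + job_nbr]


months = range(1, 13)
# A four-core workstation would process the months in three batches.
for batch in batches(months, job_nbr=4):
    print(batch)
```

With `job_nbr=1` this degenerates to serial mode (twelve batches of one), and with `job_nbr=12` it matches the single batch of twelve used by background mode. Intermediate values cover the workstation case described above.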

For a Single, Dedicated Node on LCFs:

...