Running the ACME model requires a PE layout: a specification of which cores are assigned to which model components. Currently only a few people know how to create one, and the process is undocumented. This is a major bottleneck that slows down both running on a new machine and finding an efficient layout for a new compset. The goal of this page is to provide the information anyone on the project needs to create their own PE layouts (or at least to recognize when a layout is bad).

...

  1. Choose a total number of tasks that is evenly divisible by the number of cores/node for your machine (e.g. asking for 14 total cores on a machine with 12 cores/node is wasteful because you will be charged for 24 cores and 10 of them will sit idle).
  2. Atmosphere:
    1. Choose NTASKS_ATM so it evenly divides the number of spectral elements in your atmos grid. For a cubed-sphere grid, the number of elements is N = 6*NE^2 and the number of physics columns is 9N+2. For RRM grids, the number of elements can be determined from the grid template file. Having an uneven number of elements per task, or using more tasks than there are elements, is possible and will speed up the physics but not the dynamics, and is thus less efficient.
    2. For Linux clusters and low node counts (fewer than 1000), it is typically best to use NTHREADS_ATM=1. On Titan, Mira, and KNL systems, threads should be used. On Edison, small gains can be achieved by turning on hyperthreading and using 24 MPI tasks per node with 2 threads per MPI task.
    3. When using threads, there are several additional considerations. The number of MPI tasks per node times the number of threads per MPI task should equal the number of cores on the node. The physics can make use of up to NTASKS_ATM*NTHREADS_ATM = # physics columns. By default, the dynamics can only make use of up to NTASKS_ATM*NTHREADS_ATM = N (extra threads are OK; they just will not improve dynamics performance). The new "nested OpenMP" feature allows the dynamics to use more threads, but this compile-time option is not yet enabled by default.
    4. The table below shows the # elements and acceptable core counts for ACME atm resolutions:
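
| atm res | # elements | # physics columns | acceptable core counts |
|---------|------------|-------------------|------------------------|
| ne30 | 5400 | 48602 | 5400, 2700, 1800, 1350, 1080, 900, 675, 600, 540, 450, 360, 300, 270, ... |
| ne120 | 86400 | 777602 | 86400, 43200, 28800, 21600, ... |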
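
The sizing rules above can be sketched in a few lines of Python. This is only an illustration, not part of the ACME scripts: the function names are hypothetical, it covers cubed-sphere grids only (RRM element counts come from the grid template file), and the last helper assumes the default build without nested OpenMP, so the dynamics is capped at one core per element.

```python
def num_elements(ne):
    """Spectral elements on a cubed-sphere grid: N = 6 * NE^2."""
    return 6 * ne * ne


def num_physics_columns(ne):
    """Physics columns on a cubed-sphere grid: 9N + 2."""
    return 9 * num_elements(ne) + 2


def acceptable_task_counts(ne):
    """All NTASKS_ATM values that evenly divide N, largest first."""
    n = num_elements(ne)
    return [t for t in range(n, 0, -1) if n % t == 0]


def charged_cores(total_tasks, cores_per_node):
    """Cores billed when only whole nodes can be allocated."""
    nodes = -(-total_tasks // cores_per_node)  # ceiling division
    return nodes * cores_per_node


def usable_cores(ntasks, nthreads, ne):
    """Cores the dynamics and the physics can each keep busy.

    Assumes the default (non-nested-OpenMP) build: the dynamics is
    limited to N cores, the physics to the number of physics columns.
    """
    cores = ntasks * nthreads
    return {
        "dynamics": min(cores, num_elements(ne)),
        "physics": min(cores, num_physics_columns(ne)),
    }


if __name__ == "__main__":
    print(num_elements(30), num_physics_columns(30))  # 5400 48602
    print(acceptable_task_counts(30)[:10])
    print(charged_cores(14, 12))                      # 24
    print(usable_cores(5400, 2, 30))
```

For example, `charged_cores(14, 12)` returns 24, which reproduces the case in step 1 where 10 of the cores you pay for sit idle. `usable_cores(5400, 2, 30)` reports 5400 cores for the dynamics and 10800 for the physics, showing that the extra threads help only the physics.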

...