Running the ACME model requires a specification of which processor cores are assigned to which model components, called the PE layout. Currently only a few people know how to construct one, and the process is not documented. This is a major bottleneck that makes running on a new machine, or coming up with an efficient layout for a new compset, slow. The goal of this page is to provide the information needed for anyone on the project to create their own PE layouts (or at least to recognize when a layout is bad).

...

  1. Choose a total number of tasks that is evenly divisible by the number of cores/node for your machine (e.g. asking for 14 total cores on a machine with 12 cores/node is wasteful: you will be charged for 24 cores and 10 of them will sit idle).
  2. Atmosphere:
    1. Choose NTASKS_ATM so that it evenly divides the number of spectral elements in your atmosphere grid. For a cubed-sphere grid, the number of elements is N = 6*NE^2 and the number of physics columns is 9N+2. For RRM grids, the number of elements can be determined from the grid template file. Having uneven numbers of elements per task, or using more tasks than there are elements, is possible and will speed up the physics but not the dynamics, and is thus less efficient. (The sketch after the table below shows how these counts are computed.)
    2. For Linux clusters and low node counts (less than 1000 nodes) it is typically best to use NTHREADS_ATM=1. On Titan, Mira, and KNL systems, threads should be used.
    3. When using threads, there are several additional considerations:
      1. The number of MPI tasks times the number of threads per MPI task should equal the number of cores on the node. The physics can make use of up to NTASKS_ATM*NTHREADS_ATM = # physics columns. The dynamics by default can only make use of NTASKS_ATM*NTHREADS_ATM = N (extra threads are ok; they just will not improve dynamics performance). The new "nested OpenMP" feature can be used to allow the dynamics to use more threads, but this compile-time option is not yet enabled by default.
      2. The table below shows the # elements and acceptable core counts for ACME atm resolutions:
atm res | # elements | # physics columns | acceptable core counts
ne30    | 5400       | 48602             | 5400, 2700, 1800, 1350, 1080, 900, 675, 600, 540, 450, 360, 300, 270, ...
ne120   | 86400      | 777602            | 86400, 43200, 28800, 21600, ...
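
The counts in this table follow directly from the formulas above. As a rough bash sketch (not part of the ACME scripts), you can reproduce the element count, physics-column count, and acceptable core counts for any ne value; NE=30 below is just an example, and per item 1 you would then pick a divisor that is also a multiple of your machine's cores/node:

    NE=30                                 # example cubed-sphere resolution
    N=$(( 6 * NE * NE ))                  # number of spectral elements
    echo "ne$NE: $N elements, $(( 9 * N + 2 )) physics columns"
    for (( t = N; t >= 1; t-- )); do      # divisors of N = acceptable NTASKS_ATM values
        (( N % t == 0 )) && printf "%d " "$t"
    done
    echo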


  3. The MPAS components work well at any core count but require mapping files of the form mpas-cice.graph.info.<ICE_CORE_COUNT>.nc and mpas-o.graph.info.<OCN_CORE_COUNT>.nc. These files are automatically downloaded by the ACME model from https://acme-svn2.ornl.gov/acme-repo/acme/inputdata/ocn/mpas-o/ and https://acme-svn2.ornl.gov/acme-repo/acme/inputdata/ice/mpas-cice/, but may not exist if nobody has used those core counts before. It is trivial to generate these files yourself, though. On Edison, you can type

    module load metis/5.1.0
    gpmetis <graph_file>   <# pes>

    where graph_file is something like https://acme-svn2.ornl.gov/acme-repo/acme/inputdata/ocn/mpas-o/oRRS15to5/mpas-o.graph.info.151209 and # pes is the number of cores you want to use for that component.
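
    If you need partition files for several candidate core counts at once, a small loop around gpmetis will do. The sketch below is only an example (the graph file name and core counts are placeholders, and METIS must already be loaded as above); gpmetis writes each partition to <graph_file>.part.<# pes>, which you may then need to rename to match the file name pattern the model expects (see above).

    graph=mpas-o.graph.info.151209        # example graph file, downloaded as above
    for n in 480 960 1920; do             # example candidate core counts
        gpmetis "$graph" "$n"             # writes $graph.part.$n
    done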

  4. For single-component compsets (e.g. CORE-forced ocean, AMIP atmosphere), you might as well use a serial configuration, because only the active component will consume appreciable time.



...