How To Run EAMxx (SCREAMv1)

EAMxx is not yet officially supported. Use these pages at your own risk.

This page documents how to run SCREAMv1 on supported machines.

Step 1: Environment

The SCREAMv1 build requires a handful of python libraries. It also requires many other dependencies, but CIME handles those automatically. Here's what you need to do on each machine:

  • NERSC (cori-knl, pm-cpu, pm-gpu): nothing to do - everything is available by default

  • Summit: just module load python

  • LLNL machines (quartz, syrah): Create a conda environment with the needed packages:

conda create -n scream_v1_build pyyaml pylint psutil

Once this is done (one time for a given machine) you can activate the environment:

conda activate scream_v1_build

Step 2: Define convenience variables

In order to provide commands that work for everyone on all supported machines, we first define some user-specific variables. Change these as needed:

export CODE_ROOT=~/gitwork/scream/                    # or wherever you cloned the scream repo
export COMPSET=F2010-SCREAMv1                         # or whatever compset you want
export RES=ne4_ne4                                    # or whatever resolution you want
export PECOUNT=16x1                                   # number of MPIs x number of threads; should be divisible by node size
export CASE_NAME=${COMPSET}.${RES}.${PECOUNT}.test1   # name for your simulation
export QUEUE=pdebug                                   # whatever the name of your debug or batch queue is
export WALLTIME=00:30                                 # HH:MM on Summit, HH:MM:SS on other machines
export COMPILER=intel                                 # which compiler to use (can be omitted on some machines)

Grid options (just showing the best, most supported versions here):

| Resolution | Grid name (aka $RES) | Notes |
|---|---|---|
| ne4 | ne4_ne4 | not sure about pg2? |
| ne30 | ne30pg2_ne30pg2 | |
| ne120 | ne120pg2_ne120pg2 | |
| ne256 | ne256pg2_ne256pg2 | |
| ne512 | ne512_r0125_oRRS18to6v3 | we don't have a pg2 grid yet; this one is sketchy |
| ne1024 | ne1024pg2_ne1024pg2 | |

Suggested PECOUNTs (not necessarily performant, just something to get started). Note that EAMv1 currently uses roughly 0.04 to 0.07 GB per element, so make sure you don't put more elements on a node than you have memory for. For example, ne30 has 5,400 elements, which at ~0.07 GB/element is roughly 380 GB total, so it cannot fit on a single 128 GB node no matter how you choose PECOUNT.

 

| Machine | ne4 (max = 96) | ne30 (max = 5,400) | ne120 (max = 86,400) | ne256 (max = 393,216) | ne512 (max = 1,572,864) | ne1024 (max = 6,291,456) |
|---|---|---|---|---|---|---|
| cori-knl (68 cores/node; 96+16 GB/node) | 16x1 | 675x1 | 4096x1 | 3072x1 (MAX_MPITASKS_PER_NODE = 8) | | |
| pm-gpu (64 cores/node; 4 GPUs/node; 256 GB/node) | | 16x1 | | | | |
| pm-cpu (128 cores/node; 512 GB/node) | 96x1 | 256x1 | | | | |
| syrah (16 cores/node; 64 GB/node) | 32x1 | 160x1 | 1600x1 | | | |
| quartz (36 cores/node; 128 GB/node) | 72x1 | 180x1 | 1800x1 | | | |
| summit (8 cores/node?; 6 GPUs/node; 512+96 GB/node) | | 256x1 | 4096x1 | | | |

Available compilers are listed in $CODE_ROOT/cime_config/machines/config_compilers.xml. Options for the various machines are listed below. CIME also puts simulations in a different scratch directory on each machine. Figuring out where output will go can be confusing, so defaults are listed below.

 

| Machine | Location where CIME puts the run | Available compilers |
|---|---|---|
| cori-knl | /global/cscratch1/sd/${USER}/e3sm_scratch/cori-knl/ | intel, gnu |
| pm-gpu | /pscratch/sd/{first-letter-of-username}/${USER}/e3sm_scratch/pm-gpu/ | gnugpu, nvidiagpu, gnu, nvidia |
| pm-cpu | /pscratch/sd/{first-letter-of-username}/${USER}/e3sm_scratch/pm-cpu/ | gnu, nvidia |
| syrah | /p/lustre2/${USER}/e3sm_scratch/syrah/ | intel |
| quartz | /p/lustre2/${USER}/e3sm_scratch/quartz/ | intel |
| summit | /gpfs/alpine/cli115/proj-shared/${USER}/e3sm_scratch/ (for run output) | gnugpu, ibmgpu, pgigpu, gnu, ibm, pgi |
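For example, the run directory for a case on pm-cpu should end up at a path like the one below, assuming the default CIME layout of scratch-root/case-name/run (the ${USER:0:1} expansion just extracts the first letter of your username):

ls /pscratch/sd/${USER:0:1}/${USER}/e3sm_scratch/pm-cpu/${CASE_NAME}/run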

Step 3: Create the Case

From the location you want to build+run the model, issue:
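Assuming the standard CIME workflow, a create_newcase call built from the Step 2 variables would look roughly like the sketch below (add --machine, or a project/allocation flag, if your site requires them):

${CODE_ROOT}/cime/scripts/create_newcase --case ${CASE_NAME} --compset ${COMPSET} \
    --res ${RES} --pecount ${PECOUNT} --compiler ${COMPILER} \
    --queue ${QUEUE} --walltime ${WALLTIME}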

Then cd ${CASE_NAME}

Step 4: Change CIME settings (if desired)

You shouldn’t need to change anything to run, but some things you may want to change are:
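These would typically be ./xmlchange calls along the lines of the sketch below. The variable names are standard CIME/SCREAM ones, but the values are only illustrative, and the full SCREAM_CMAKE_OPTIONS string should be copied from env_build.xml before editing:

./xmlchange ATM_NCPL=288                                        # 1. atm timestep: 288 steps/day = 300 sec
./xmlchange DEBUG=TRUE                                          # 2. compile in debug mode
./xmlchange JOB_QUEUE=${QUEUE},JOB_WALLCLOCK_TIME=${WALLTIME}   # 3. debug queue and shorter walltime
./xmlchange STOP_OPTION=ndays,STOP_N=1                          # 4. run length (repeated in env_test.xml, next line)
./xmlchange --file env_test.xml STOP_OPTION=ndays,STOP_N=1
./xmlchange HIST_OPTION=ndays,HIST_N=1                          # 5. coupler snapshot (cpl.hi) frequency
./xmlchange NTASKS=16                                           # 6. number of MPI tasks
./xmlchange PIO_NETCDF_FORMAT=64bit_data                        # 7. only needed at very high resolution
./xmlchange SCREAM_CMAKE_OPTIONS="SCREAM_NP 4 SCREAM_NUM_VERTICAL_LEV 128"   # 8. vertical levels (keep the other default entries too)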

The purpose of each change is (respectively):

  1. Change the atm timestep to 288 steps per day (300 sec). This needs to be done via ATM_NCPL, or else the land model will get confused about how frequently it is coupling with the atmosphere.

  2. Compile in debug mode. The model runs roughly 10x slower but gives much better error messages, and on some machines it currently doesn't run in any other mode.

  3. Change the default queue and wallclock limit from the standard queue with a 1 hr walltime to the debug queue and its 30 min maximum, to get through the queue faster. Note that the format for Summit wallclock limits is hh:mm, rather than the hh:mm:ss used on other machines.

  4. Change the default run length from just a few steps to 1 day (or whatever you choose). This change is made in both env_run.xml and env_test.xml because case.submit seems to read one or the other according to confusing rules; it is easier to just change both.

  5. HIST_OPTION and HIST_N set the frequency of coupler snapshot (cpl.hi) files, which are useful for figuring out whether SCREAM is getting or giving bad data from/to the surface models.

  6. NTASKS is the number of MPI tasks to use (which determines how many nodes the job requests). You can also set this via the --pecount argument to create_newcase.

  7. Changing PIO_NETCDF_FORMAT to 64bit_data is needed at very high resolutions to avoid exceeding the maximum variable size limits.

  8. Low resolutions use 72 layers by default. If you want to use 128 vertical layers (or 72), you need to change the number of levels. Note that this line completely replaces SCREAM_CMAKE_OPTIONS, so you also need to include its other entries. SCREAM_NP is the number of GLL points per side of an element, which is always set to 4. The atm initial condition file keys off the compset AND the vertical grid, so you don't need to worry about changing that.

Step 5: Configure the case

You need to run the case setup step to create namelist_scream.xml, where most SCREAM settings are set.
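This is presumably the standard CIME setup command, run from the case directory:

./case.setup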

Step 6: Change SCREAM settings

This is done by modifying namelist_scream.xml, either by hand or by using the atmchange script that is now bundled into the case directory when you create a case. Explore namelist_scream.xml for variables you might want to change (though you shouldn't have to change anything to run).

  • If you want to run with the non-hydrostatic dycore:

    • Change to tstep_type=9 (or run ./atmchange tstep_type=9 in the case directory)

    • Change to theta_hydrostatic_mode=False (or run ./atmchange theta_hydrostatic_mode=False)

  • Some bugs are affected by chunk length for vectorization, which is handled by “pack” size in v1. Pack size can be tweaked by editing the cmake machine file for the current machine (components/scream/cmake/machine-files/$machine.cmake).
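For example, one way to override the pack size is to append a CMake cache setting to that machine file, as in the one-line sketch below; the variable name SCREAM_PACK_SIZE is an assumption here, so check the existing machine files for the spelling your checkout uses (editing the file by hand works just as well):

echo 'set(SCREAM_PACK_SIZE 1 CACHE STRING "Vectorization pack size")' >> ${CODE_ROOT}/components/scream/cmake/machine-files/$machine.cmake   # hypothetical variable name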

Step 7: Change output settings

Changing EAMxx output is still kind of a hack. From your case directory, edit run/data/scream_default_output.yaml. You can add any variable stored in the field manager or added as a special diagnostic. You can get a list of available variables by printing out the DAG for your run: issue ./atmchange atmosphere_dag_verbosity_level=4 (atmchange is described in Step 6). Available verbosity levels are: <=0 produces no DAG, 1 prints only the atm processes DAG, 2 adds field names to the DAG nodes, 3 adds field layout info, and 4 also adds the name of the grid each field is defined on. The DAG file will be named scream_atm_dag.dot; it is a Graphviz file, so any dot viewer does a nice job of displaying it.
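As a concrete sketch (the location of the .dot file under the run directory is an assumption; adjust the path to wherever your run writes it), the DAG can be rendered to an image with Graphviz:

./atmchange atmosphere_dag_verbosity_level=4     # from the case directory, before running
dot -Tpng run/scream_atm_dag.dot -o scream_atm_dag.png   # requires Graphviz to be installed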

Step 8: Config/Compile/Run

Now set up, build, and run. You can set up and build before doing most or all of the above customization.
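Assuming the standard CIME workflow, that means running the following from the case directory:

./case.setup    # configure the case (also generates namelist_scream.xml)
./case.build    # compile; can take a while, especially in debug mode
./case.submit   # submit the job to the batch queue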

You can check its progress via squeue -u <username> on LLNL and NERSC systems. Use jobstat -u <username> or bjobs on Summit. Kill jobs with bkill <jobid> on Summit. Model output will be in the run subdirectory.

Rules for changing output mid-run

In CAM and the Fortran-based EAM, changing your output at all mid-run requires doing a "branch run". In EAMxx, users are welcome to add new output streams or stop existing streams any time they restart. This is possible because EAMxx includes the output frequency and averaging type in the name of all output files, so if you modify either of these, you create a new and unique output stream.

You may need to add Perform Restart: false to any new output stream you create mid-run (there’s still some debate about this).

Bonus Content: How to run at an unsupported resolution!

First, make the following CIME-level changes:

  1. If you want to construct your own grid (e.g. for RRM), see https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/872579110

  2. ./xmlchange ATM_NCPL=<steps per day>. Note that ATM_NCPL is the number of atm timesteps per day, so the model timestep (commonly known as dtime) is 86400 sec/ATM_NCPL; e.g. ATM_NCPL=288 gives dtime = 300 sec. See the dycore settings link at the end of this section for guidance on choosing dtime.

In namelist_scream.xml, make the following changes (a partial atmchange sketch follows this list):

  1. Change Vertical__Coordinate__Filename to use the initial condition file for your new resolution

  2. Change Filename under Initial__Conditions → Physics__GLL subsection to also use that new initial condition file

  3. Change SPA__Remap__File to use one appropriate to map ne30 to your new resolution

  4. Change se_ne as appropriate

  5. Change se_tstep and nu_top (recommended defaults for these and dtime are given in the awesome table on EAM's HOMME Dycore Recommended Settings (THETA) page)
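A partial sketch of these namelist edits using atmchange is shown below; the resolution, timestep, and sponge-layer values are purely illustrative placeholders (use the recommendations from the dycore settings page referenced in item 5), and editing namelist_scream.xml by hand works just as well:

./atmchange se_ne=45       # illustrative new resolution
./atmchange se_tstep=300   # illustrative; use the recommended value for your grid
./atmchange nu_top=1e5     # illustrative; use the recommended value for your grid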