EAM and SCREAM use the HOMME dycore package.  HOMME has both Fortran and C++ versions, with different options for the prognostic thermodynamic variable (preqx uses temperature, while theta-l uses potential temperature), and those in turn have some additional capabilities.

...

Summary of HOMME capabilities available for different versions.


|                       | SL-transport (C++ only)          | phys grid | non-hydrostatic option | GPU option                              | Allows RRM grids | E3SM version used                  |
|-----------------------|----------------------------------|-----------|------------------------|-----------------------------------------|------------------|------------------------------------|
| HOMME preqx Fortran   | no                               | yes       | no                     | yes (OpenACC)                           | yes              | v1                                 |
| HOMME preqx C++       | no                               | yes       | no                     | yes (Kokkos)                            | no               | only available in standalone HOMME |
| HOMME theta-l Fortran | yes (Fortran calling C++/Kokkos) | yes       | yes                    | yes (SL only via Kokkos, in a branch) [1] | yes            | v2                                 |
| HOMME theta-l C++     | yes (in a branch)                | not yet   | yes                    | yes (Kokkos)                            | not yet          | v3, SCREAM                         |

  [1] When building Kokkos for theta-l Fortran, the "cpu" target is used by default because running only the SL on the GPU would not give good performance.

New features that did not make the V2 cutoff:

These features will hopefully be turned on for use in V3 development,  after the V2 maintenance branch is created.

  1. Physics mass adjustment due to phase change: apply locally to dp3d instead of to surface pressure. (reduces magnitude of artificial pressure work term)

  2. Adopt tensorHV for NE30 grids. (it is the default for all other grids)

  3. Turn on topographic improvements and switch to rougher topography

Using the THETA dycore in E3SM:     

...

dt_dyn:  The CFL condition for dt_dyn with uniform resolutions is well understood and depends on the horizontal resolution and the speed of the Lamb wave (~340 m/s).  This assumes the use of the KG5 RK method (tstep_type=5) for hydrostatic runs and the related IMEX methods (tstep_type=9 or 10) for NH.  A sketch of the resulting scaling with resolution is given after the list below.

  • NE30:    dt_dyn = 300   (probably stable up to 360)

  • NE60:    dt_dyn = 150

  • NE120:   dt_dyn = 75

  • NE240:   dt_dyn = 40

  • NE256:   dt_dyn <= 37.5

  • NE512:   dt_dyn <= 18.75

  • NE1024:  dt_dyn <= 9.375
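
The dt_dyn values above scale roughly inversely with NE.  A minimal sketch of that scaling (the reference point NE=240 / 40 s and the inverse-NE rule are read off the list above; this is a ballpark estimate, not a formula used by HOMME itself):

```python
# Ballpark estimate of the dt_dyn upper bound from the gravity-wave CFL,
# assuming dt_dyn scales inversely with the cubed-sphere resolution NE.
# The reference point (NE=240, dt_dyn=40 s) is taken from the list above;
# this is not a formula from the HOMME source code.
def dt_dyn_estimate(ne, ne_ref=240, dt_ref=40.0):
    """Return an estimated dt_dyn upper bound (seconds) for resolution NE."""
    return dt_ref * ne_ref / ne

for ne in (240, 256, 512, 1024):
    print(f"NE{ne}: dt_dyn <= {dt_dyn_estimate(ne):.3f} s")
# NE240: 40.000, NE256: 37.500, NE512: 18.750, NE1024: 9.375
```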

dt_tracer:  The tracer CFL condition is controlled by the maximum advective velocity (~200 m/s), which usually occurs near the model top.  For Eulerian advection HOMME uses an RK3 SSP scheme, and the CFL limit also depends on the mesh spacing.  For SL tracers, the limit is governed by the size of the halo exchange.
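
As a back-of-the-envelope illustration of the Eulerian advective limit (everything in this sketch is a placeholder: the function name, the example dx_min, and the allowable Courant number are not HOMME defaults):

```python
# Rough advective-CFL estimate for the Eulerian tracer step:
#   dt_tracer <~ C * dx_min / u_max
# u_max ~ 200 m/s near the model top (see text).  dx_min and the allowable
# Courant number C of the RK3-SSP scheme must be supplied by the user;
# the values used here are placeholders for illustration only.
def dt_tracer_eulerian(dx_min_m, u_max=200.0, courant=1.0):
    """Return a rough upper bound (seconds) on the Eulerian tracer timestep."""
    return courant * dx_min_m / u_max

print(dt_tracer_eulerian(dx_min_m=30.0e3))  # ~150 s for a 30 km node spacing
```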

...

For the THETA dycore, we need to determine the stable dt_remap, dt_dyn, dt_vis and dt_vis_tom.  We then take dt_vis_q=dt_vis, and with SL tracers we will assume that dt_tracer = 6*dt_dyn (see the sketch after the numbered list below).

  1. Run 10 days with very small, inefficient timesteps.  Take dt_remap=dt_dyn and use small values for dt_dyn, dt_vis and dt_vis_tom.  Create a new IC file (see INITHIST in EAM).  Use this new IC file for all future runs below.

  2. dt_vis_tom:  Use the CFL condition (take S=2) printed in atm.log to determine the largest safe value of dt_vis_tom.  In practice the code appears to be unstable around S=3.3, and it is good not to run right at the stability limit.

  3. dt_dyn:  Keeping all other timesteps below their known stable limits (which takes some care due to how they are all set), make a series of runs increasing just dt_dyn until the run crashes.  It may be necessary to make relatively long runs (several months) to ensure stability.

  4. dt_vis:  Keeping all other timesteps fixed, find the largest stable value.  For THETA, with the recommended viscosity coefficients, dt_vis = dt_dyn should be stable.  For some RRM grids, dt_vis might need to be slightly smaller due to mesh distortion.  When the viscous CFL is violated (dt_vis too large), the run usually crashes within a couple of steps.

  5. The procedure outlined above can find timesteps that are borderline unstable but don't blow up due to various dissipation mechanisms.  Hence it is a good idea to run 3 months and look at the monthly mean OMEGA500 from the 3rd month.  This field will be noisy, but there should not be any obvious grid artifacts.  Weak instabilities can be masked by the large transients in flow snapshots, so it is best to look at time averages.

  6. dt_remap:  Using 2*dt_dyn is a conservative choice that is well tested up to NE240.  Decreasing dt_remap results in more frequent vertical remaps and thus increased vertical dissipation.  If dt_remap is too large, the code may crash with negative layer thickness errors.  The code has a "dp3d" limiter that can prevent some of these crashes.  If this limiter is triggered (warnings in the e3sm.log file), that can mean either dt_remap is too large or one of the CFL conditions has been violated and the code is unstable.
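
Once a stable dt_dyn is known, the remaining timesteps follow from the relationships assumed above (dt_vis_q = dt_vis, dt_tracer = 6*dt_dyn with SL transport, dt_remap = 2*dt_dyn from step 6).  A minimal bookkeeping sketch (the helper name and layout are illustrative only):

```python
# Derive the full THETA timestep set from a stable dt_dyn, using the
# relationships described in the text: dt_tracer = 6*dt_dyn (SL transport),
# dt_remap = 2*dt_dyn (conservative choice from step 6), and
# dt_vis_q = dt_vis = dt_dyn (dt_vis and dt_vis_tom may still need to be
# reduced per steps 2 and 4 above).
def theta_timesteps(dt_dyn):
    return {
        "dt_dyn": dt_dyn,
        "dt_vis": dt_dyn,         # step 4: usually stable at dt_dyn
        "dt_vis_q": dt_dyn,       # taken equal to dt_vis
        "dt_remap": 2 * dt_dyn,   # step 6: well tested up to NE240
        "dt_tracer": 6 * dt_dyn,  # SL transport assumption
    }

print(theta_timesteps(300.0))  # NE30-like example: dt_remap=600, dt_tracer=1800
```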

During this tuning process, it is useful to compare the smallest ‘dx’ from the atmosphere log file to the smallest ‘dx’ from the global uniform high resolution run.  Use the ‘dx’ based on the singular values of Dinv, not the ‘dx’ based on element area.  If the ‘dx’ for newmesh.g is 20% smaller than the value from the global uniform grid, it suggests the advective timesteps might need to be 20% smaller.  The code prints out CFL estimates that are rough approximations and can be used to check whether you are in the correct ballpark.

...

With RRM grids, the timesteps will be controlled by the highest resolution region.  So with an RRM grid with refinement down to NE120, the timesteps should be close to what we run on a uniform cubed-sphere NE120 grid.  The timesteps may need to be slightly smaller because of the deformed elements in the transition region.  With a high quality RRM mesh (max Dinv-based element distortion metric <= 4; see Generate the Grid Mesh (Exodus) File for a new Regionally-Refined Grid), we can usually run with the expected dt_dyn and dt_tracer values, and only the viscosity timesteps need to be slightly reduced.

Spreadsheet for looking at scaling with resolution of constant and tensor coefficient HV:

https://docs.google.com/spreadsheets/d/1LHTl2_A065pfdWC69OHmXvNL_v1484cXg7ZowhyEbPU/edit?usp=sharing

...

WARNING on dtime: in the table below we give "dtime", the physics timestep.  This is also a namelist option in EAM, but it will be ignored.  The only way to set dtime is to set ATM_NCPL = (24*60*60 / dtime) in env_run.xml.
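
Since ATM_NCPL is the number of atmosphere coupling intervals per day, the conversion is just 86400/dtime.  A quick check for the dtime values used below (a trivial sketch, not part of the E3SM scripts):

```python
# ATM_NCPL is the number of atmosphere coupling intervals per day, so for a
# desired physics timestep dtime (seconds): ATM_NCPL = 86400 / dtime.
for dtime in (7200, 1800, 1200, 900, 600, 200, 100):
    print(f"dtime={dtime:5d} s  ->  ATM_NCPL={24 * 60 * 60 // dtime}")
# e.g. dtime=1800 -> ATM_NCPL=48, dtime=900 -> ATM_NCPL=96
```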


Each entry below gives the resolution, the recommended timesteps, the corresponding namelist settings, notes, and what configurations have been tested.

7.5 degree (NE4)
  Timesteps: dtime=7200; dt_tracer=3600; dt_remap=1200; dt_dyn=dt_vis=dt_vis_q=600; dt_vis_tom=600
  Namelist settings: nu_top=2.5e5; se_tstep=600
  Notes: Ultra low res, for regression testing only.


1 degree (NE30)
  Timesteps: dtime=1800; dt_tracer=1800; dt_remap=600; dt_dyn=dt_vis=dt_vis_q=300; dt_vis_tom=300
  Namelist settings: nu_top=2.5e5; se_tstep=300
  Notes: If backward compatibility is needed for the V2 water cycle, change the HV defaults to hypervis_scaling=0, nu=1e15.  E3SMv2 NE30 does not use tensorHV, to avoid having to retune; this should change in v3.  With dt_remap=1800 we see occasional (every 2-3 years) dp3d limiter activation, meaning that the model levels are approaching zero.  This appears to be due to strong divergence above tropical cyclones created by one of the parameterizations.
  Tested:
    • HS+topo(72L):  H and NH (H can run at 360s but not 400s with either tstep_type=4 or 5).  dt_remap=600 runs with no limiter warnings, while dt_remap=900 crashes with dp3d<0 at the surface.
    • F-EAMv1-AQP1:  H and NH
    • FAV1C-L:  H and NH


NE45
  Timesteps: dtime=1200; dt_tracer=1200; dt_remap=400; dt_dyn=dt_vis=dt_vis_q=200; dt_vis_tom=200
  Namelist settings: nu_top=2.5e5; se_tstep=200



1/4 degree (NE120)
  Timesteps: dtime=900; dt_remap=150; dt_tracer=450 (could be as large as 650, but it has to divide 900); dt_dyn=dt_vis=dt_vis_q=75; dt_vis_tom=75
  Namelist settings: nu_top=1e5; se_tstep=75
  Notes: CFL estimates suggest dt_vis_tom*nu_top <= 31*2.5e5; nu_top=2.5e5 would need hypervis_subcycle_tom=3.
  Tested:
    • HS+topo(72L):  H and NH (with dt_remap=75 and the theta limiter to handle an unphysical boundary layer)
    • F-EAMv1-AQP1:  H and NH, both 72 and 128 levels (1+ years)
    • FC5AV1C-H01B:  NH 72L runs several years.

12km (NE256)
  Timesteps: dtime=600; dt_tracer=200; dt_remap=200/3.; dt_dyn=dt_vis=dt_vis_q=200/6.; dt_vis_tom=200/6.
    NOTE: these defaults were updated 2021/9 based on SCREAM v0.1 3km runs, but NE256 is known to run stably at slightly larger timesteps: dt_tracer=300; dt_remap=75; dt_dyn=dt_vis=dt_vis_q=37.5; dt_vis_tom=37.5
  Namelist settings: nu_top=4e4; se_tstep=33.33333333333
  Notes: nu_top=4e4 is running at the code's estimate of the CFL limit with S=1.9.
  Tested:
    • F-EAMv1-AQP1:
      • H-128L running with default timesteps (1 month).
      • NH-128L:
        • tstep_type=9:  runs with dt=37.5/75/300/600; crashes with dt=40/40/240/480 after 6d, dp3d<0 around layer 30.
        • tstep_type=10:  dt=40/40/240/480 good (for 1 month).
    • FC5AV1C-H01B:  NH 128L run for several months with dt=37.5/75/300/600.  Occasional problems near coastlines; considering reducing dtime, increasing HV, tuning CLUBB.
  dtime=600 used in: https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021MS002805

6km (NE512)
  Timesteps: dtime=200; dt_tracer=100; dt_remap=100/3.; dt_dyn=dt_vis=dt_vis_q=100/6.; dt_vis_tom=100/6.
  Namelist settings: nu_top=2e4; se_tstep=16.6666666666666
  Notes: CFL estimates suggest dt_vis_tom*nu_top <= 1.7*2.5e5; nu_top=2.5e5 needs hypervis_subcycle_tom=13.
  Tested:
    • F-EAMv1-AQP1:
      • H-128L:  dt=20/40/120/240 good (ran 15d, but then died with "cloud cover" errors).
      • NH-128L:
        • tstep_type=9:  dt=20/40/120/240 crashes with a bad EOS at 2.4 days; dt=18.75/37.5/150/300 ran 20 days.
        • tstep_type=10:  dt=20/40/120/240 crashes at 3.1 days with bad dp3d at layer 75.
    • FC5AV1C-H01B:  NH 128L ran for 1 day with dt=18.75/37.5/150/300, then NaNs in microphysics (not yet debugged).

3km (NE1024)
  Timesteps: dtime=100; dt_tracer=50; dt_remap=16.6666; dt_dyn=dt_vis=dt_vis_q=8.3333; dt_vis_tom=8.3333
  Namelist settings: nu_top=1e4; se_tstep=8.3333333333333
  Notes: CFL estimates suggest dt_vis_tom*nu_top <= 0.43*2.5e5; with nu_top=2.5e5, hypervis_subcycle_tom=24.
  Tested:
    • F-EAMv1-AQP1:
      • H-128L:  dt=10/20/60/120 crashed with cloud_cover errors in < 1d.
      • NH-128L:  dt=9.375/18.75/75/150 ran 1d.
    • FC5AV1C-H01B:
    • SCREAM v0:  ran for 40 days with constantHV, dt=9.375/18.75/75/75.
    • SCREAM v0.1:  switched to tensorHV (slightly less diffusion); needs dt_dyn<=9s, dt_tracer<60.

RRM
  Timesteps: dtime=???; dycore timesteps should be set based on the finest region in the RRM.
  Namelist settings: nu_top is uncertain and needs more research; we should probably switch to a tensor laplacian.  For NE30→NE120 grids, start with the NE120 constant-coefficient value, 1e5.
  Notes: RRM uses a tensor HV formulation which scales with resolution as dx^3.0 (for preqx, we used a dx^3.2 scaling).  To determine the effective HV coefficient at a given resolution "dx", use:

    nu_tensor = nu_const * (2*rearth/((np-1)*dx))^hv_scaling * rearth^(-4.0)

  i.e. tensor nu=3.4e-8, when used at 1 degree resolution (dx=110,000m, np=4, rearth=6.376e6), is equivalent to 1e15 m^4/s.
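
A quick numerical check of that conversion (this just evaluates the formula above; the function and argument names are arbitrary):

```python
# Evaluate the tensor-HV conversion from the note above:
#   nu_tensor = nu_const * (2*rearth/((np-1)*dx))**hv_scaling * rearth**(-4)
# With nu_const = 1e15 m^4/s at 1 degree resolution this gives ~3.5e-8,
# consistent with the ~3.4e-8 quoted in the text.
def nu_tensor(nu_const, dx, np_gll=4, rearth=6.376e6, hv_scaling=3.0):
    return nu_const * (2.0 * rearth / ((np_gll - 1) * dx)) ** hv_scaling * rearth ** (-4.0)

print(nu_tensor(1e15, dx=110000.0))  # ~3.5e-8
```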

PREQX Default Settings (for reference)

...

se_tstep=-1             ( timesteps set via se_nsplit, rsplit,  qsplit)
dt_remap_factor=-1
dt_tracer_factor=-1
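
For orientation, a sketch of how these splitting factors translate into effective timesteps, assuming the usual HOMME convention of successive division of dtime (consistent with the NE30 and NE120 rows below, but not taken from the source code):

```python
# Sketch of how the preqx splitting parameters map onto effective timesteps,
# assuming the usual HOMME convention of successive division of dtime
# (consistent with the NE30 and NE120 rows below); for orientation only.
def preqx_timesteps(dtime, se_nsplit, rsplit, qsplit, hypervis_subcycle):
    dt_remap = dtime / se_nsplit          # vertical remap step
    dt_tracer = dt_remap / rsplit         # tracer advection step
    dt_dyn = dt_tracer / qsplit           # dynamics step
    dt_vis = dt_dyn / hypervis_subcycle   # hyperviscosity step
    return dt_remap, dt_tracer, dt_dyn, dt_vis

print(preqx_timesteps(1800, 2, 3, 1, 3))  # NE30:  (900, 300, 300, 100)
print(preqx_timesteps(900, 6, 2, 1, 4))   # NE120: (150, 75, 75, 18.75)
```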



Each entry below gives the resolution, the effective timesteps, the corresponding namelist settings, and notes.

1 degree (NE30)
  Timesteps: dtime=1800 (ATM_NCPL=48); dt_remap=900; dt_tracer=300; dt_dyn=300; dt_vis=100; dt_vis_q=300
  Namelist settings: nu=1e15; se_nsplit=2; rsplit=3; qsplit=1; hypervis_subcycle=3; hypervis_subcycle_tom=0 (not supported in PREQX); hypervis_subcycle_q=6
  Notes: Hypervis and the TOM sponge layer are done together.  With a time-split sponge layer, could we get dt_vis=dt_dyn=300?

1/4 degree (NE120)
  Timesteps: dtime=900 (ATM_NCPL=96); dt_remap=150; dt_tracer=75; dt_dyn=75; dt_vis=18.75; dt_vis_q=75
  Namelist settings: nu=1.5e13; se_nsplit=6; rsplit=2; qsplit=1; hypervis_subcycle=4; hypervis_subcycle_tom=0; hypervis_subcycle_q=1
  Notes: CFL estimates suggest hypervis_subcycle=3 should work.  If we time-split out the sponge layer, we would probably need hypervis_subcycle=2 and hypervis_subcycle_tom=3, which is unlikely to be more efficient.


1/8 degree (NE240)

Documentation on parameter tuning

...