EAM's HOMME dycore

EAM and SCREAM/EAMxx uses the HOMME dycore package.   HOMME has both Fortran and C++ versions with different options for the prognostic thermodynamic variable (preqx use temperature while theta-l uses potential temperature) and those in turn have some additional capabilities.

Summary of HOMME capabilities available for different versions.

 

SL-transport

phys grid

non-hydrostatic option

GPU option

Allows RRM grids

E3SM version used

HOMME preqx Fortran

no

yes

no

no

yes

v1

HOMME preqx C++

no

yes

no

yes (Kokkos)

no

only available in standalone HOMME

HOMME theta-l Fortran

yes (Fortran calling C++/Kokkos)

yes

yes

yes1

yes

v2,v3

HOMME theta-l C++

yes

yes

yes

yes (Kokkos)

not yet

v4, SCREAM

  1. When building Kokkos for theta-l Fortran, the "cpu" target is used by default because running only the SL on the GPU would not give good overall performance.

New features in EAMxx to be considered for EAM V3:

  1. Physics mass adjustment due to phase change: apply locally to dp3d instead of to surface pressure. (reduces magnitude of artificial pressure work term

  2. Turn on topographic improvements (pgrad_correction=1, hv_ref_profiles=6) and switch to rougher “x6t” topography

Using the THETA dycore in E3SM:     

As of 2020/12 in E3SM master (efb184d525), Theta dycore running in hydrostatic mode with SL transport is the default. 

To switch to the older Eulerian transport add to user_nl_eam

  • transport_alg=0, hypervis_subcycle_q=1, dt_tracer_factor=1

To switch to NH mode:

  • theta_hydrostatic_mode=.false., tstep_type=9.

In NH mode, tstep_type=9 is our current best recommendation.  Other options:  

  • tstep_type=5 is the 3rd order accurate KGU53 explicit method used for hydrostatic simulations.  It can be used with NH simulations with very small timestepes, ~0.5s

  • tstep_type=9 is an IMEX method, combining an explicit Butcher table (KGU53, used for most equations) and an implicit Butcher table (used for terms responsible for vertically propagating acoustic waves not present in hydrostatic models). 

  • tstep_type=10 is another IMEX method which uses KG52 for the explicit table.  It has a slightly larger stability region.  

Older preqx dycore is still available, but wont support newer namelist options:

  • ./xmlchange CAM_TARGET=preqx

 

Detailed description of timesteps and viscosity parameters   



The PREQX and tHETA dycores share much of the HOMME infrastructure and have several timesteps:  

  • dtime:           The physics timestep.  set in env_run.XML via ATM_NCPL.   Note: dtime can be entered in user_nl_cam, but it will be silently ignored.

  • dt_remap:     The Lagrangian vertical level remap timestep 

  • dt_tracer:      The tracer advection timestep

  • dt_dyn           Dynamics (u,v,temperature,mass) timestep

  • dt_vis             Hyperviscosity timestep for dynamics

  • dt_vis_tom     top-of-model Laplacian diffusion (sponge layer)

  • dt_vis_q         hyperviscosity timestep for tracers



These timesteps are controlled by various splitting parameters.  In EAM, "dtime" is set outside the dycore and used to define all other timesteps.  The splitting parameters enforce the fact that the various timesteps must evenly divide other timesteps.

See also Timestepping within the ACME v1 Atmosphere Model

THETA dycore:  (CAM_TARGET=theta-l)

EAM v2 is using the "theta" dycore, which supports both hydrostatic and nonhydrostatic dynamics and semi-Lagrangian tracer transport. The theta dycore also supports the ability to have dt_tracer > dt_remap, and introduces a separate timestep for the TOM sponge layer.  

New namelist variables:  EAMv2 now allows the user to directly set the main dycore timesteps with new namelist options replacing "nsplit, rsplit and qsplit".   When using the new namelist options, the old variables are set to:   nsplit=-1, rsplit=-1 and qsplit=-1.   The older namelist variables should be avoided, and they will removed when E3SM drops support for V1 compsets and the preqx dycore.   For information on the older namelist variables, see the PREQX section below.  

namelist inputs:    tstep, dt_remap_factor, dt_tracer_factor, hypervis_subcycle, hypervis_subcycle_q, hypervis_subcycle_tom

Resulting timesteps:  

  • dt_remap = tstep*dt_remap_factor             ( must evenly divide EAM's dtime, checked at runtime)

  • dt_tracer  = tstep*dt_tracer_factor              ( must evenly divide EAM's dtime, checked at runtime)

  • dt_dyn      = tstep

  • dt_vis        = dt_dyn/hypervis_subcycle

  • dt_vis_q     = dt_tracer/hypervis_subcycle_q

  • dt_vis_tom  = dt_dyn / max(hypervis_subcycle, hypervis_subcycle_tom)  

An additional constraint is that either dt_remap must evenly divide dt_tracer or dt_tracer must evenly divide dt_remap.

CFL Conditions

dt_dyn:  The CFL condition for dt_dyn with uniform resolutions is well understood and depends on the horizontal resolution and the speed of the Lamb wave (~340m/s).   This assumes the use of the KG5 RK method (tstep_type=5) for hydrostatic, and the related IMEX method (tstep_type=9 or 10) for NH.  

  • NE30:   dt_dyn = 300   

  • NE60:    dt_dyn = 150

  • NE120:   dt_dyn = 75

  • NE240:    dt_dyn = 40

  • NE256:    dt_dyn <= 36

  • NE512:    dt_dyn <= 18

  • NE1024    dt_dyn <= 9

The code prints an estimate of the viscous CFL condition in the log file, in the form "dt < S * max_eigen_value", where S depends on the timestepping method and for dt_dyn, assumes a gravity wave speed of 342 m/s. For KGU53 S=3.8. In practice, dt is often ~30% lower, so taking S=2.65 is a good target for new grids.

dt_tracer:   The tracer CFL condition is controlled by the maximum advective velocity (~200m/s) which usually occurs near the model top.  For Eulerian advection HOMME uses a RK3 SSP and the CFL also depends on the mesh spacing.  For SL tracers, the limit is governed by the size of the halo exchange.  

  • Eulerian tracers, dt_tracer = dt_dyn.     (assuming the default RK3 SSP method)

  • SL tracers,          dt_tracer = 6*dt_dyn.  

dt_remap:  This timestep is usually limited by physical constraints in order to resolve large divergence/convergence created by physical parameterizations.  It is also the only source of vertical dissipation in the dycore, so at extreme resolutions or perhaps in the NH dycore it may have to be decreased to increase the amount of vertical dissipation. 

  • dt_remap =  2*dt_dyn    

dt_vis, dt_vis_q, dt_vis_tom:   These timesteps have a CFL condition that depends on the horizontal resolution and the viscosity coefficient.  In the case of the TOM diffusion, the viscosity coefficient depends on the number of levels, bringing in an additional dependency. 

  • With the recommended hyper viscosity settings, one should take: 

    • dt_vis = dt_vis_q = dt_dyn      

    • The code prints an estimate of the viscous CFL condition in the log file, in the form "dt < S * max_eigen_value", where S depends on the timestepping method.   Hyperviscosity is applied with forward Euler, suggesting S=2.  However, numerical experiments on cubed sphere grids suggest the HV operator goes unstable at S=.80 and so we recommend taking S<.70 when estimating the maximum stable timestep.   For RRM grids S might need to be smaller.   

  • The TOM sponge layer uses a laplacian operator with coefficient, nu_top.  We used to keep this coefficient fixed (nu_top=2.5e5) for all resolutions.  But above 1/4 degree, this creates a severe  CFL condition.   For v2 and beyond, we now recommend running 2.5e5 for resolutions up to 1 degree (NE30) and for higher resolution run with a CFL number of 1.9.   This number is based on the CFL condition printed by the code.  Experiments suggest the operator is stable up to S=2.89, but unstable at S=3.38.  

  • In V1, we had nu_div >> nu, requiring a much smaller dt_vis that must be tuned specially.  This is not needed in V2.



Recommended timestep tuning for new grids

Do not get mislead by spinup issues!   Determining stable timestep values is complicated by spinup.  If the initial condition file is very much out of balance (say it came from interpolation from observations using different topography) then you may require very small timesteps and larger hyperviscosity coefficients in order to get past an initial adjustment period.  Only perform the tuning for the final timesteps with a spun up initial condition file.

Old procedures from https://docs.google.com/document/d/1ymlTgKz2SIvveRS72roKvNHN6a79B4TLOGrypPjRvg0/edit?usp=sharing    

For the THETA dycore, we need to determine the stable dt_remap, dt_dyn, dt_vis and dt_vis_tom.    We then take dt_vis_q=dt_vis and with SL tracers, we will assume that dt_tracer = 6*dt_dyn. 

  1. Run 10 days with very small, inefficient timesteps.   Take dt_remap=dt_dyn and use small values for  dt_dyn, dt_vis and dt_vis_tom.  Create a new IC file (see INITHIST in EAM).   Use this new IC file for all future runs below

  2. dt_vis_tom:  Use the CFL condition (take S=2) printed in atm.log to determine the largest safe value of dt_vis_tom.   In practice the code appears to be unstable around S=3.3, and it is good not to run right at the stability limit.     

  3. dt_dyn:   Keeping all other timesteps below their known stable limits (which takes some care due to how they are all set) make a serious of runs increasing just dt_dyn until the run crashes.  May need to make relatively long runs (several months) to ensure stability.  

  4. dt_vis:  Keeping all other timesteps fixed  find the largest stable value.  For THETA, with the recommended viscosity coefficients, dt_vis = dt should be stable.  For some RRM grids, dt_vis might be need to be slightly smaller due to mesh distortion.  When the viscous CFL is violated (dt_vis too large), the run usually crashes within a couple of steps.  

  5. The procedure outlined above can find timesteps that are borderline unstable, but don’t blow up do to various dissipation mechanisms.  Hence it is a good idea to run 3 months, and look at the monthly mean OMEGA500 from the 3rd month. This field will be noisy, but there should not be any obvious grid artifacts.  Weak instabilities can be masked by the large transients in flow snapshots, so it best to look at time averages.  

  6. dt_remap:  Using 2*dt_dyn is a conservative choice that is well tested up to ne240.   Decreasing dt_remap results in more frequent vertical remaps resulting in increased vertical dissipation.   If dt_remap is too large, the code may crash with negative layer thickness errors.  The code has a "dp3d" limiter that can prevent some of these crashes.  If this limiter is triggered (warnings in e3sm.log file), that can mean either dt_remap is too large of one of the CFL conditions has been violated and the code is unstable.       

During this tuning process, it is useful to compare the smallest ‘dx’ from the atmosphere log file to the smallest ‘dx’ from the global uniform high resolution run.  Use the ‘dx’ based on the singular values of Dinv, not the ‘dx’ based on element area. If the ‘dx’ for newmesh.g is 20% smaller than the value from the global uniform grid, it suggests the advective timesteps might need to be 20% smaller.  The code prints out CFL estimates that are rough approximation that can be used to check if you are in the correct ballpark.

RRM Grids

With RRM grids, the timesteps will be controlled by the highest resolution region.  So with an RRM grid with refinement down to NE120, the timesteps should be close to what we run on a uniform cubed-sphere NE120 grid.   The timesteps may need to be slightly smaller because of the deformed elements in the transition region.   With a hiqh quality RRM mesh ( Max Dinv-based element distortion metric <= 4, see Generate the Grid Mesh (Exodus) File for a new Regionally-Refined Grid) we can usually run with the expected dt_dyn and dt_tracer values, and only the viscosity timesteps need to be slightly reduced.  

 

Tensor vs constant coefficient hyperviscosity

HOMME’s tensor hyperviscosity operator uses resolution aware tensor coefficients to scale automatically with resolution and to adapt to deformed elements ( https://gmd.copernicus.org/articles/7/2803/2014/ ) It is required for RRM grids, and was adopted for all grids with E3SM v3. The strength is controlled by namelist variable “nu”, given in units of 1/s. The viscosity is scaled with resolution via hypervis_scalings=3, meaning the coefficient is reduced as dx^3. (for RRM simulations using the preqx dycore, hypervis_scalings=3.2 was used)

The CAM-SE and E3SM v2 hyperviscosity used for cubed-sphere grids was specified by “nu” in physical units, m^4/s. For a uniform grid with resolution “dx”, the tensor hyperviscosity will be identical to the constant coefficient hyperviscosity, with coefficients related by:

nu_tensor = nu_const *( 2*rearth /((np-1)*dx))^{hypervis_scaling} * rearth^{-4.0}.

i.e. tensor nu=3.4e-8 when used at 1 degree resolution (dx=110,000m, np=4, rearth=6.376e6, hypervis_scaling=3) is equivalent to  1e15 m^4/s.  Spreadsheet for looking at scaling with resolution of constant and tensor coefficient HV: https://docs.google.com/spreadsheets/d/1LHTl2_A065pfdWC69OHmXvNL_v1484cXg7ZowhyEbPU/edit?usp=sharing

For regular viscosity:

nu_tom_tensor = nu_tom_const *( 2*rearth /((np-1)*dx))^{laplace_scaling} * rearth^{-2.0}.

i.e. tensor nu=3.2e-7 when used at 3.25km (dx=3250m, np=4, rearth=6.376e6, laplace_scaling=1) is equivalent to 1e4 m^2/s. Tensor coefficient for regular viscosity is not currently in the model but may be added for use in the sponge layer.

Topo smoothing: The topo smoothing code uses hyperviscosity tensor (with hypervis_scalings=2) but with the regular laplace operator, not hyperviscosity. The coefficient is thus incorrectly scaled by rearth^4 instead of rearth^2. To convert that coefficient (4e-16) to a physical diffusion coefficient one must use rearth^{-4} in the above formula.





Recommended settings (THETA)   

All resolutions:  (default for theta-l dycore in master as of https://github.com/E3SM-Project/E3SM/pull/3368 )

se_ftype=2
se_limiter_option=9 
cubed_sphere_map=2 (change to 0 in user_nl_cam when using old compsets with ESMF maps)

transport_alg=12
semi_lagrange_cdr_alg=3 (2020/4: default changed to 3 reproduce some aspects of the v1 climate in v2)
semi_lagrange_nearest_point_lev = 256
semi_lagrange_hv_q = 1

theta_advect_form =1
vert_remap_q_alg=10  

nu_div=nu_p=nu_q=nu_s=-1      (use nu for all values)

nu, nu_top:  Resolution dependent

se_nsplit=-1             ( timesteps set via se_tstep, dt_remap_factor, dt_tracer_factor)
qsplit=-1
rsplit=-1

se_tstep (Resolution dependent)
dt_tracer_factor  = 6
dt_remap_factor = 2


hypervis_subcycle_q = dt_tracer_factor
hypervis_subcycle=1
hypervis_subcycle_tom=1

hypervis_scaling=3.0       ( default for grids as of 2023/4 )
nu=3.4e-8



WARNING on dtime: in the table below we give "dtime", the physics timestep. This is also a namelist option in EAM, but it will be ignored.  The only way to set dtime is to set ATM_NCPL = ( 24*60*60  / dtime) in env_run.XML



Resolution

Timesteps

Namelist settings

Notes

Tested?

Resolution

Timesteps

Namelist settings

Notes

Tested?

7.5 degree
(NE4)

dtime=7200

dt_tracer=3600
dt_remap=1200
dt_dyn=dt_vis=dt_vis_q=600
dt_vis_tom=600

nu_top = 2.5e5

se_tstep=600

Ultra low res for regression testing only.   



1 degree (NE30)

dtime=1800 

dt_tracer=1800
dt_remap=600
dt_dyn=dt_vis=dt_vis_q=300
dt_vis_tom=300






nu_top = 2.5e5    

se_tstep=300




E3SM v2 uses constant coefficient HV, enabled with:
hypervis_scaling=0
nu=1e15



With larger dt_remap( dt_remap=1800), we see occasional (every 2-3 years) dp3d limiter activation, meaning that the model levels are approaching zero.  This appears to be due to strong divergence above tropical cyclones created by one of the parameterizations.

HS+topo(72L):  H and NH (H can run at t 360s but not 400s with either tstep_type=4 or 5).  dt_remap=600 runs with no limiter warnings, while dt_remap=900 crashes with dp3d<0 at surface.  

F-EAMv1-AQP1:  H and NH 

FAV1C-L:  H and NH



NE45

dtime=1200. 

dt_tracer=1200
dt_remap=400
dt_dyn=dt_vis=dt_vis_q=200

dt_vis_tom=200

nu_top=2.5e5

se_tstep=200





1/4 degree (NE120)

dtime=900  

dt_remap=150
dt_tracer=450  (could be as large as 650, but it has to divide 900)
dt_dyn=dt_vis=dt_vis_q=75
dt_vis_tom=75

nu_top=1e5

se_tstep=75



CFL estimates suggest:

dt_vis_tom*nu_top <= 31*2.5e5

nu_top=2.5e5 would need
hypervis_subcycle_tom=3



HS+topo(72L):  H and NH (with dt_remap=75 and theta limiter to handle unphysical boundary layer)

F-EAMv1-AQP1:  H  and NH, both 72 and 128 levels (1+ years)

FC5AV1C-H01B:  NH 72L runs several years.   

12km

(NE256)

dtime=600

dt_tracer=200
dt_remap=200/3.
dt_dyn=dt_vis=dt_vis_q=200/6.
dt_vis_tom = 200/6.



NOTE: these defaults were updated 2021/9 based on SCREAM v0.1 3km runs.  But NE256 is known to run stably at slightly larger timsteps:

dt_tracer=300
dt_remap=75
dt_dyn=dt_vis=dt_vis_q=37.5
dt_vis_tom = 37.5



nu_top=4e4
se_tstep=33.33333333333



nu_tom=4e4 is running at the code's estimate of the CFL limit with S=1.9



F-EAMv1-AQP1:

  • H-128L running with default timesteps. (1 month) 

  • NH-128L:

    • tstep_type=9:  Runs with dt=37.5/75/300/600.  crashes with dt=40/40/240/480 after 6d, dp3d<0 around layer 30.   

    • with tstep_type=10: dt=40/40/240/480  good (for 1 month).  

FC5AV1C-H01B:  NH 128L run for several months dt=37.5/75/300/600.  Occasional problems near coastlines - considering ( Mar 26, 2020 ) reducing dtime, increasing HV, tunning CLUBB

dtime=600 used in:

https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021MS002805

6km (NE512)

dtime=200

dt_tracer=100
dt_remap=100/3.
dt_dyn=dt_vis=dt_vis_q=100/6.
dt_vis_tom=100/6.





nu_top=2e4
se_tstep=16.6666666666666







CFL estimates suggest:

dt_vis_tom*nu_top <= 1.7*2.5e5,

nu_top=2.5e5 needs hypervis_subcycle_tom=13 

F-EAMv1-AQP1:

  • H-128L:  dt=20/40/120/240 good (ran 15d, but then died with "cloud cover" errors

  • NH-128L: 

    • tstep_type=9:  dt=20/40/120/240: crash bad EOS 2.4days.  dt=18.75/37.5/150/300, ran 20days.

    • tstep_type=10: dt=20/40/120/240 crash 3.1days.  bad dp3d layer 75

FC5AV1C-H01B:  NH 128L run for 1 day with dt=18.75/37.5/150/300, then NaNs in microphysics (not yet debugged)

3km (NE1024)

dtime=100

dt_tracer=50
dt_remap=16.6666
dt_dyn=dt_vis=dt_vis_q=8.3333
dt_vis_tom=8.3333

nu_top=1e4
se_tstep=8.3333333333333





CFL estimates suggest:

dt_vis_tom*nu_top <= 0.43*2.5e5

with nu_top=2.5e5, 
hypervis_subcycle_tom=24



F-EAMv1-AQP1:

  • H-128L:  dt=10/20/60/120.  crashed with cloud_cover errors < 1d

  • NH-128L:  dt=9.375/18.75/75/150 ran 1d

FC5AV1C-H01B: 

SCREAM v0:  run for 40 days with constantHV, dt=9.375/18.75/75/75. 

SCREAM v0.1: switch to tensorHV (slightly less diffusion), needs dt_dyn<=9s, dt_tracer<60

RRM

dtime=???

dycore timesteps should be set based on the finest region in the RRM.  

nu_top=Uncertain - needs more research.  Should probably switch to tensor laplacian.  For NE30→NE120 grids, start with NE120 constant coefficient value, 1e5.  















PREQX Default Settings (for reference)

PREQX timesteps are controlled by namelist inputs:    nsplit, rsplit, qsplit, hypervis_subcycle, hypervis_subcycle_q

Resulting timesteps:  

  • dt_remap = dtime/nsplit

  • dt_tracer  = dt_remap/rsplit

  • dt_dyn      = dt_tracer/qsplit

  • dt_vis        = dt_dyn/hypervis_subcycle

  • dt_vis_tom  = dt_vis

  • dt_vis_q     = dt_tracer/hypervis_subcycle_q



se_ftype=2
se_limiter_option=8
cubed_sphere_map=0 (2 when using TR maps or RRM Grids)

transport_alg=0
vert_remap_q_alg=1        

nu_top = 2.5e5
nu_div = 2.5*nu
nu_p=nu_q=nu_s=nu



se_tstep=-1             ( timesteps set via se_nsplit, rsplit,  qsplit)
dt_remap_factor=-1
dt_tracer_factor=-1





Resolution

Timesteps

Namelist settings

Notes

Resolution

Timesteps

Namelist settings

Notes

1 degree (NE30)

dtime=1800  (ATM_NCPL=48)

dt_remap=900
dt_tracer=300
dt_dyn=300
dt_vis=100
dt_vis_q=300



nu=1e15    

se_nsplit=2 
rsplit=3 
qsplit=1  
hypervis_subcycle=3
hypervis_subcycle_tom=0  (not supported in PREQX)
hypervis_subcycle_q=6





hypervis and TOM sponge layer done together.   

With timesplit sponge layer, can we get dt_dyn_vis=300?









1/4 degree (NE120)

dtime=900  (ATM_NCPL=96)

dt_remap=150
dt_tracer=75
dt_dyn=75
dt_vis=18.75
dt_vis_q=75



nu=1.5e13

se_nsplit=6 
rsplit=2
qsplit=1 
hypervis_subcycle=4
hypervis_subcycle_tom=0
hypervis_subcycle_q=1



CFL estimates suggest:

hypervis_subcycle=3 should work.

if we timesplit out the sponge layer, we would probably need:

hypervis_subcycle=2, hypervis_subcycle_tom=3

which is unlikely to be more efficient.  



1/8 degree (NE240)

















Documentation on parameter tuning

Documentation on algorithms

Related pages

Related pages