The TWS_MONTH_BEGIN variable is problematic in some versions of E3SM v2.1 and in pre-release versions of E3SMv3. This page summarizes how the problem appears and how to fix it.

Because past versions of E3SM initialized variables to NaN, simulations that start from older restart files will still initialize some variables, such as TWS_MONTH_BEGIN, to NaN. This may cause runtime errors (examples of how these errors may appear in e3sm.log.* files are included below).
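To illustrate why a single NaN is enough to cause trouble, here is a simplified sketch of a weighted column-to-gridcell average in plain Python. It is not ELM code: the function `weighted_average`, the weights, and the assumption that the averaging skips entries equal to the special value 1.e+36 are illustrative only. In the compiled model with floating-point trapping enabled, the NaN may instead abort the run, as in the "floating invalid" backtrace below.

```python
import math

SPVAL = 1.e+36  # assumed special value meaning "no data"; skipped when averaging


def weighted_average(values, weights, spval=SPVAL):
    """Hypothetical stand-in for a subgrid average.

    Entries equal to spval are skipped. NaN entries are NOT skipped,
    because NaN != spval evaluates to True.
    """
    total = 0.0
    wsum = 0.0
    for v, w in zip(values, weights):
        if v != spval:
            total += v * w
            wsum += w
    return total / wsum if wsum > 0.0 else spval


weights = [0.25, 0.25, 0.5]
# One column initialized to NaN poisons the whole average
print(weighted_average([10.0, 30.0, math.nan], weights))  # nan
# The same column set to the fill value is skipped instead
print(weighted_average([10.0, 30.0, SPVAL], weights))     # 20.0
```

This is why the fix below replaces the NaNs with the fill value rather than some arbitrary number: under the stated assumption, the fill value is ignored by the averaging, while a NaN propagates into every result that touches it.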

Solution

  • Ensure that your branch includes the initial fix (here: commit with fix).

  • If your simulation uses an initial condition file for land (finidat), replace the NaNs in TWS_MONTH_BEGIN with the fill value of 1.e+36. Below are methods to perform the conversion:

    • Using NCO functions:

      # Declare NaN as the fill value, then change the fill value to 1.0e36 (converting the NaN entries)
      ncatted -a _FillValue,TWS_MONTH_BEGIN,o,f,NaN infile.nc
      ncatted -a _FillValue,TWS_MONTH_BEGIN,m,f,1.0e36 infile.nc
    • Using a Python script:

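      from netCDF4 import Dataset
      import numpy as np

      # Open the land initial condition file for in-place editing
      ofile = Dataset('infile.nc', 'r+')
      # Read raw values (no masking) so NaNs can be located directly
      ofile.set_auto_mask(False)
      var = ofile.variables['TWS_MONTH_BEGIN']
      data = var[:]
      # Replace NaNs with the fill value, then write the array back to the file
      data[np.isnan(data)] = 1.e+36
      var[:] = data
      ofile.close()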
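Before and after converting a file, it can help to confirm how many NaNs the variable actually contains. The helper below is a hypothetical sketch (the name `replace_nans` is ours, not part of E3SM or netCDF4). It works on any NumPy array, such as one read from `TWS_MONTH_BEGIN` with masking disabled, replaces NaNs with 1.e+36 in place, and returns how many values it changed.

```python
import numpy as np

FILL_VALUE = 1.e+36  # fill value used for TWS_MONTH_BEGIN in the steps above


def replace_nans(data, fill_value=FILL_VALUE):
    """Replace NaNs in a NumPy float array in place; return how many were replaced."""
    nan_mask = np.isnan(data)
    n_nans = int(np.count_nonzero(nan_mask))
    data[nan_mask] = fill_value
    return n_nans


example = np.array([[1.0, np.nan], [np.nan, 4.0]])
print(replace_nans(example))  # 2
print(replace_nans(example))  # 0: nothing left to replace
```

With netCDF4, one way to use it is to read `data = var[:]` after `set_auto_mask(False)`, call `replace_nans(data)`, and write back with `var[:] = data`. Reopening the converted file and getting 0 from a second call is a quick check that the conversion worked.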

E3SM v2.1

E3SMv3

Example backtrace due to floating point exception

     12: forrtl: error (65): floating invalid
     12: Image              PC                Routine            Line        Source
     12: libpthread-2.31.s  000014E799803910  Unknown               Unknown  Unknown
     12: e3sm.exe           0000000004F6368D  subgridavemod_mp_        1045  subgridAveMod.F90
     12: e3sm.exe           000000000663C0DC  waterbudgetmod_mp         719  WaterBudgetMod.F90
     12: e3sm.exe           0000000004A4E29E  elm_driver_mp_elm         576  elm_driver.F90
     12: e3sm.exe           00000000049F29F0  lnd_comp_mct_mp_l         506  lnd_comp_mct.F90
     12: e3sm.exe           0000000000496175  component_mod_mp_         751  component_mod.F90
     12: e3sm.exe           00000000004583F7  cime_comp_mod_mp_        2876  cime_comp_mod.F90
     12: e3sm.exe           000000000047EB62  MAIN__                    153  cime_driver.F90
     12: e3sm.exe           000000000042342D  Unknown               Unknown  Unknown
     12: libc-2.31.so       000014E79923E24D  __libc_start_main     Unknown  Unknown

Example runtime failure while writing a history file

896: PIO: FATAL ERROR: Aborting... An error occured, Writing variables (number of variables = 180) to file (./E3SM.2023-SCIDAC.ne30pg2_EC30to60E2r2.AMIP.EF_0.13.CF_22.HD_0.56.elm.h0.1984-01.nc, ncid=150) using PIO_IOTYPE_PNETCDF iotype failed. Non blocking write for variable (TWS_MONTH_BEGIN, varid=206) failed (Number of subarray requests/regions=1, Size of data local to this process = 982). NetCDF: Numeric conversion not representable (err=-60). Aborting since the error handler was set to PIO_INTERNAL_ERROR... (/global/u1/w/whannah/E3SM/E3SM_SRC2/externals/scorpio/src/clib/pio_darray_int.c: 395)

History

  • Tests failed the restart comparison because the TWS_MONTH_BEGIN variable was missing from the restart file (Issue #4649).

  • Longer tests that restart at the beginning of a month failed the restart comparison because col_ws%endwb was not on the restart file (Issue #5079).

  • The initial condition file for a production test had to be converted to replace its NaNs (PR #5811).
