NBFB test case study

During the V2 development process, we transitioned from CLUBB V1 to CLUBB V2. In CLUBB V2, a process controlled by c_k10 was modified to allow finer control via two parameters, c_k10 and c_k10h. The old behavior is recovered when c_k10h = c_k10 = 0.35. During V2 development, since c_k10h was a new parameter, it was not set in the namelist and inherited a default value of c_k10h = 1.0. This resulted in an unexpected degradation of the climate, which went undetected for several months. For more detailed background, see https://acme-climate.atlassian.net/wiki/spaces/CNCL/pages/3130818586

Can E3SM’s NBFB tests detect this difference? And how sensitive are they to changes in c_k10h?

Additional test (harder to perform, since CLUBB V1 exists only in an older code base): will the NBFB tests consider simulations with CLUBB V1 (c_k10=0.35) statistically similar to CLUBB V2 (c_k10h=c_k10=0.35)?

Tests were run on Compy. All tests are first run with “-g” to generate baselines with E3SM master as of 2021/11/8, then rerun with “-c” (compare to baseline) with various values of c_k10h.

RESULTS: All tests pass with roundoff-level changes to c_k10h, and fail (detect statistically different results) with small finite changes in c_k10h. With the default thresholds, PGN is the most sensitive (detecting a statistical difference with a 1e-8 change), followed by TSC at 1e-2 and MVK at 2e-2. These results are correlated with the timescale of each test: PGN looks at physics columns after 1 timestep, TSC at time step convergence over 300 timesteps, and MVK at 1-year climatologies.

2022/6 update: TSC PASS/FAIL criterion still broken. ( https://github.com/E3SM-Project/E3SM/issues/4759 )

 

MVK

30-member ensemble of ~1-year simulations. Takes about 1.3 hours on 18 nodes.

Hack to reuse same “-c” case to run multiple experiments:

  • rm -f run/*.nc (otherwise we get PIO run time errors)

  • add “clubb_c_k10h=0.40” to user_nl_eam_???? files

  • ./case.submit
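The MVK pass/fail decision reported in TestStatus.log counts how many of the 121 analyzed variables reject the two-sample Kolmogorov–Smirnov null hypothesis; the test fails when the rejection count exceeds the threshold (13 here). A minimal sketch of that counting rule, assuming the per-variable p-values have already been computed and using an assumed significance level of 0.05 (the actual implementation is evv4esm's mvk.py extension):

```python
def mvk_decision(p_values, alpha=0.05, threshold=13):
    """Count per-variable K-S rejections (p < alpha); the overall test
    FAILs when more than `threshold` variables reject H0.
    alpha=0.05 is an assumed significance level for illustration."""
    rejected = sum(1 for p in p_values if p < alpha)
    return ("FAIL" if rejected > threshold else "PASS"), rejected

# e.g. the "reject 7/121 -> PASS" and "reject 21/121 -> FAIL" results below:
print(mvk_decision([0.01] * 7 + [0.5] * 114))   # ('PASS', 7)
print(mvk_decision([0.01] * 21 + [0.5] * 100))  # ('FAIL', 21)
```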

MVK_P24x1.ne4_oQU240.F2010-CICE (Compy, 2021/11)

| c_k10h | Test Result | TestStatus.log Metrics (threshold=13) |
|---|---|---|
| 0.35 (default) | PASS | |
| 0.36 | PASS | reject 7/121 |
| 0.38 | FAIL | reject 21/121 |
| 0.40 | FAIL | reject 50/121 |

MVK_P36x1.ne4_oQU240.F2010 (Anvil, 2022/6)

(note: switch to F2010 case with MPAS sea ice)

| c_k10h | Test Result | TestStatus.log Metrics (threshold=13) |
|---|---|---|
| 0.35 (default) | PASS | |
| 0.36 | PASS | reject 9/121 |
| 0.38 | FAIL | reject 34/121 |

 

PGN

20-member ensemble of ~1-timestep simulations. Takes about 1 min on 16 nodes.

Hack to reuse same “-c” case to run multiple experiments:

  • rm -f run/*.nc (otherwise we get PIO run time errors)

  • add “clubb_c_k10h=0.40” to user_nl_eam_???? files

  • ./case.submit
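The (t, p) metrics PGN writes to TestStatus.log come from a two-sample t-test on the ensemble perturbation-growth values. A stdlib sketch of a Welch t statistic is below as an illustration; the exact statistic and the p-value step (which needs the t-distribution CDF, e.g. scipy.stats.ttest_ind) live in evv4esm's pgn.py and may differ in detail:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances assumed).
    Large |t| (hence small p) means the two ensembles differ."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se
```

Identical samples give t = 0, consistent with the (0.000, 1.000) baseline-vs-itself row below, and |t| grows as the ensembles separate.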

PGN_P32x1.ne4_oQU240.F2010 (Compy, 2021/11)

| c_k10h | Test Result | TestStatus.log Metrics: t-test (t, p) |
|---|---|---|
| 0.35 (default) | PASS | (0.000, 1.000) |
| 0.350000001d0 | PASS | (-1.424, 0.169) |
| 0.35000001 | FAIL | (-2.542, 0.019) |
| 0.36 | FAIL | (-12.564, 0.000) |

PGN_P32x1.ne4_oQU240.F2010 (Anvil, 2022/6)

| c_k10h | Test Result | TestStatus.log Metrics: t-test (t, p) |
|---|---|---|
| 0.350000001d0 | PASS | (-1.424, 0.169) |
| 0.350000005d0 | PASS | (-1.549, 0.136) |
| 0.35000001 | FAIL | (-2.542, 0.019) |
| 0.35000002 | FAIL | (-2.619, 0.016) |



 

TSC

12-member ensemble of 5-day simulations. Takes about 10 min on 11 nodes.

The hack used in the MVK and PGN tests to reuse a “-c” case for multiple experiments does not work for TSC: during the RUN phase, the user_nl_eam_???? ensemble-member namelists are created anew by cime/scripts/lib/CIME/SystemTests/tsc.py. This script can be edited to append the contents of user_nl_eam to each user_nl_eam_???? file ( gist for tsc.py patch file ), after which one can set parameters in user_nl_eam.

link to PASS/FAIL post processing script: https://github.com/LIVVkit/evv4esm/blob/master/evv4esm/extensions/tsc.py
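@Hui Wan's criterion used in the tables below can be stated compactly: the test fails only when every p_min point whose model time falls in the [5 min, 10 min] window is itself a failure. A sketch, assuming p_min arrives as (time-in-seconds, p-value) pairs and assuming a p < 0.005 rejection threshold (both are illustrative assumptions, not the exact tsc.py formulation):

```python
def hui_wan_criterion(p_min, alpha=0.005, window=(300.0, 600.0)):
    """PASS unless *all* p_min points with time inside the
    [5 min, 10 min] window (in seconds) fall below alpha."""
    in_window = [p for t, p in p_min if window[0] <= t <= window[1]]
    if in_window and all(p < alpha for p in in_window):
        return "FAIL"
    return "PASS"
```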

TSC_P36x1.ne4_ne4.F2010-CICE (Compy, 2021/11)

| c_k10h | Test Result (possible bug in scripts? fails for all values except when results are bfb) | @Hui Wan's criterion: PASS unless all p_min points in the [5 min, 10 min] range FAIL | TestStatus.log Metrics: region-by-region results (Global, Land, Ocean) |
|---|---|---|---|
| 0.35 (default) | PASS | PASS | PASS, PASS, PASS |
| 0.350001 | FAIL | PASS | PASS, PASS, PASS |
| 0.35001 | FAIL | PASS | FAIL, PASS, PASS (pmin plot) |
| 0.3501 | FAIL | PASS | PASS, PASS, PASS (pmin plot) |
| 0.351 | FAIL | PASS | PASS, FAIL, PASS (pmin plot) |
| 0.36 | FAIL | FAIL | FAIL, FAIL, FAIL (pmin.36.png) |
| 0.37 | FAIL | FAIL | FAIL, FAIL, FAIL (pmin.37.png) |

TSC_P36x1.ne4_ne4.F2010-CICE (Anvil, 2022/6)

(note: test aborts in MPAS sea ice if we use F2010)

| c_k10h | Test Result (possible bug in scripts?) | @Hui Wan's criterion: PASS unless all p_min points in the [5 min, 10 min] range FAIL |
|---|---|---|
| 0.35 (default) | PASS | PASS |
| 0.351 | PASS | PASS |
| 0.36 | PASS | FAIL |
| 0.38 | PASS | FAIL |