NBFB test case study
During V2 development, E3SM transitioned from CLUBB V1 to CLUBB V2. In CLUBB V2, a process previously controlled by c_k10 alone was modified to allow finer control through two parameters, c_k10 and c_k10h; the old behavior is recovered when c_k10h = c_k10 = 0.35. Because c_k10h was a new parameter, it was not set in the namelist during V2 development and inherited a default value of c_k10h = 1.0. This resulted in an unexpected degradation of the simulated climate that went undetected for several months. For more detailed background, see https://acme-climate.atlassian.net/wiki/spaces/CNCL/pages/3130818586
Can E3SM’s NBFB (non-bit-for-bit) tests detect this difference, and how sensitive are they to changes in c_k10h?
Additional test (harder to perform, since CLUBB V1 exists only in an older code base): would the NBFB tests consider simulations with CLUBB V1 (c_k10 = 0.35) statistically similar to CLUBB V2 (c_k10h = c_k10 = 0.35)?
Tests were initially run on Compy (2022/6 follow-ups were run on Anvil). All tests are first run with “-g” to generate baselines from E3SM master as of 2021/11/8; they are then rerun with “-c” (compare to baseline) with various values of c_k10h.
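For reference, the two phases look roughly like the following sketch (assumed to be run from cime/scripts in an E3SM checkout; the baseline name is a placeholder, and only the MVK test is shown):

```python
# Hypothetical driver for the two-phase procedure (assumed to be run from
# cime/scripts in an E3SM checkout; the baseline name is a placeholder and
# only the MVK test is shown).
import subprocess

TEST = "MVK_P24x1.ne4_oQU240.F2010-CICE"
BASELINE = "nbfb_ck10h_study"  # placeholder baseline name

# Phase 1: generate baselines from E3SM master ("-g").
subprocess.run(["./create_test", TEST, "-g", "-b", BASELINE], check=True)

# Phase 2: after perturbing clubb_c_k10h, compare against those baselines ("-c").
subprocess.run(["./create_test", TEST, "-c", "-b", BASELINE], check=True)
```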
RESULTS: All tests pass for roundoff-level changes to c_k10h and fail (i.e., detect statistically different results) for small but finite changes. With the default thresholds, PGN is the most sensitive, detecting a statistical difference for a change of 1e-8, followed by TSC at 1e-2 and MVK at 2e-2. These sensitivities correlate with the timescale each test examines: PGN looks at physics columns after 1 timestep, TSC at time step convergence over 300 timesteps, and MVK at 1-year climatologies.
2022/6 update: the TSC PASS/FAIL criterion is still broken ( https://github.com/E3SM-Project/E3SM/issues/4759 )
MVK
30 member ensemble of ~ 1 year simulations. Takes about 1.3 hours on 18 nodes.
Hack to reuse the same “-c” case to run multiple experiments (a scripted version is sketched below):
rm -f run/*.nc (otherwise we get PIO runtime errors)
add “clubb_c_k10h = 0.40” to the user_nl_eam_???? files
./case.submit
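The three manual steps above can also be scripted; a minimal sketch, assuming it is run from the case directory of the “-c” test:

```python
# Scripted version of the three manual steps above (a sketch, assumed to be
# run from the case directory of the "-c" test).
import glob
import os
import subprocess

# 1. Remove old history output, otherwise PIO raises runtime errors on resubmit.
for nc in glob.glob("run/*.nc"):
    os.remove(nc)

# 2. Append the perturbed CLUBB parameter to every ensemble member's namelist.
for nl in sorted(glob.glob("user_nl_eam_????")):
    with open(nl, "a") as f:
        f.write("\nclubb_c_k10h = 0.40\n")

# 3. Resubmit the case.
subprocess.run(["./case.submit"], check=True)
```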
MVK_P24x1.ne4_oQU240.F2010-CICE (Compy, 2021/11)
| c_k10h | Test result | TestStatus.log metrics (threshold = 13) |
|---|---|---|
| 0.35 (default) | PASS | |
| 0.36 | PASS | reject 7/121 |
| 0.38 | FAIL | reject 21/121 |
| 0.40 | FAIL | reject 50/121 |
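MVK is the multivariate Kolmogorov-Smirnov test: the “reject N/121” entries count how many of the 121 examined variables fail a per-variable two-sample K-S comparison between the baseline and test ensembles, and the test FAILs once that count exceeds the threshold (13). A minimal sketch of that logic, with the significance level and data layout assumed for illustration (not the exact evv4esm implementation):

```python
# Sketch of the MVK-style PASS/FAIL logic (not the exact evv4esm code).
from scipy import stats

def mvk_like_check(baseline, test, alpha=0.05, threshold=13):
    """baseline/test: dicts mapping variable name -> array of per-member annual means."""
    rejects = 0
    for var in baseline:
        _, p = stats.ks_2samp(baseline[var], test[var])  # two-sample K-S test per variable
        if p < alpha:
            rejects += 1
    status = "FAIL" if rejects > threshold else "PASS"
    return status, rejects, len(baseline)
```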
MVK_P36x1.ne4_oQU240.F2010 (Anvil, 2022/6)
(note: switched to the F2010 compset, which uses MPAS sea ice)
| c_k10h | Test result | TestStatus.log metrics (threshold = 13) |
|---|---|---|
| 0.35 (default) | PASS | |
| 0.36 | PASS | reject 9/121 |
| 0.38 | FAIL | reject 34/121 |
PGN
20 member ensemble of ~ 1 timestep simulations. Takes about 1 min on 16 nodes.
Hack to reuse the same “-c” case to run multiple experiments (same as for MVK; see the sketch above):
rm -f run/*.nc (otherwise we get PIO runtime errors)
add “clubb_c_k10h = 0.40” to the user_nl_eam_???? files
./case.submit
PGN_P32x1.ne4_oQU240.F2010 (Compy, 2021/11)
| c_k10h | Test result | TestStatus.log metrics: t-test (t, p) |
|---|---|---|
| 0.35 (default) | PASS | (0.000, 1.000) |
| 0.350000001d0 | PASS | (-1.424, 0.169) |
| 0.35000001 | FAIL | (-2.542, 0.019) |
| 0.36 | FAIL | (-12.564, 0.000) |
PGN_P32x1.ne4_oQU240.F2010 (Anvil, 2022/6)
| c_k10h | Test result | TestStatus.log metrics: t-test (t, p) |
|---|---|---|
| 0.350000001d0 | PASS | (-1.424, 0.169) |
| 0.350000005d0 | PASS | (-1.549, 0.136) |
| 0.35000001 | FAIL | (-2.542, 0.019) |
| 0.35000002 | FAIL | (-2.619, 0.016) |
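The (t, p) pairs above come from the t-test that PGN reports in TestStatus.log; a low p-value means the perturbation-growth behavior of the test ensemble is judged statistically different from the baseline. The exact evv4esm procedure is not reproduced here; the sketch below illustrates the idea with a two-sample t-test on per-member RMS perturbation-growth values (the function name, inputs, and significance level are assumptions):

```python
# Illustrative sketch of a PGN-style check (assumed, not the exact evv4esm procedure).
import numpy as np
from scipy import stats

def pgn_like_check(baseline_rms, test_rms, alpha=0.05):
    """baseline_rms/test_rms: one RMS perturbation-growth value per ensemble member."""
    t, p = stats.ttest_ind(np.asarray(baseline_rms), np.asarray(test_rms))
    status = "FAIL" if p < alpha else "PASS"
    return status, (t, p)
```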
TSC
12 member ensemble of 5 day simulations. Takes about 10min on 11 nodes.
The hack used in the MVK and PGN tests to reuse a “-c” case for multiple experiments does not work for TSC: during the RUN phase, the per-member user_nl_eam_???? namelists are created anew by cime/scripts/lib/CIME/SystemTests/tsc.py. That script can be edited to append the contents of user_nl_eam to each user_nl_eam_???? file ( gist for tsc.py patch file; see the sketch below ), after which parameters can be set in user_nl_eam.
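The gist is not reproduced here, but the idea behind the patch can be sketched as follows (illustrative names only, not the actual tsc.py code):

```python
# Sketch of the idea behind the tsc.py patch (illustrative names, not the actual patch):
# after the RUN phase regenerates each member's user_nl_eam_NNNN, append the contents
# of the case-level user_nl_eam so that parameters set there reach every member.
import glob
import os
import shutil

def append_case_namelist(case_root):
    """Append user_nl_eam to every generated user_nl_eam_???? file."""
    case_nl = os.path.join(case_root, "user_nl_eam")
    for member_nl in sorted(glob.glob(os.path.join(case_root, "user_nl_eam_????"))):
        with open(member_nl, "a") as dst, open(case_nl) as src:
            shutil.copyfileobj(src, dst)
```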
link to PASS/FAIL post processing script: https://github.com/LIVVkit/evv4esm/blob/master/evv4esm/extensions/tsc.py
TSC_P36x1.ne4_ne4.F2010-CICE (Compy, 2021/11)
| c_k10h | Test result (possible bug in scripts? fails for all values except when results are bfb) | Hui Wan's criterion (sketched at the end of this section): PASS unless all points of the p_min plot in the [5 min, 10 min] range are FAIL | TestStatus.log metrics: region-by-region results (Global, Land, Ocean) |
|---|---|---|---|
| 0.35 (default) | PASS | PASS | PASS, PASS, PASS |
| 0.350001 | FAIL | PASS | PASS, PASS, PASS |
| 0.35001 | FAIL | PASS | FAIL, PASS, PASS (pmin plot) |
| 0.3501 | FAIL | PASS | PASS, PASS, PASS (pmin plot) |
| 0.351 | FAIL | PASS | PASS, FAIL, PASS (pmin plot) |
| 0.36 | FAIL | FAIL | FAIL, FAIL, FAIL (pmin.36.png) |
| 0.37 | FAIL | FAIL | FAIL, FAIL, FAIL |
TSC_P36x1.ne4_ne4.F2010-CICE (Anvil, 2022/6)
(note: the test aborts in MPAS sea ice if F2010 is used, so F2010-CICE is used here)
| c_k10h | Test result (possible bug in scripts?) | Hui Wan's criterion: PASS unless all points of the p_min plot in the [5 min, 10 min] range are FAIL |
|---|---|---|
| 0.35 (default) | PASS | PASS |
| 0.351 | PASS | PASS |
| 0.36 | PASS | FAIL |
| 0.38 | PASS | FAIL |
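For reference, Hui Wan's criterion from the TSC tables above can be sketched as follows (the significance level and input layout are assumptions): the run FAILs only when every p_min value at output times between 5 and 10 minutes falls below the threshold.

```python
# Sketch of the alternative criterion (the significance level and inputs are assumptions).
def tsc_alternative_criterion(times_min, p_min, alpha=0.005):
    """times_min: output times in minutes; p_min: minimum p-value at each output time."""
    window = [p for t, p in zip(times_min, p_min) if 5.0 <= t <= 10.0]
    # FAIL only if every point in the 5-10 minute window rejects the null hypothesis.
    return "FAIL" if window and all(p < alpha for p in window) else "PASS"
```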