During V2 development, we transitioned from CLUBB V1 to CLUBB V2. In CLUBB V2, a process previously controlled by a single parameter, c_k10, was modified to allow finer control via c_k10 and c_k10h. The old behavior is recovered when c_k10h = c_k10 = 0.35. Because c_k10h was a new parameter, it was not set in the namelist during V2 development and inherited a default value of c_k10h = 1.0. This caused an unexpected degradation of the simulated climate that went undetected for several months. For more detailed background, see V2 Case Studies.
Can E3SM’s NBFB tests detect this difference? And how sensitive are they to changes in c_k10h?
Additional test (harder to perform, since CLUBB V1 only exists in an older code base): Will the NBFB tests consider simulations with CLUBB V1 (c_k10=0.35) statistically similar to CLUBB V2 (c_k10h=c_k10=0.35)?
Tests were run on Compy. All tests are first run with “-g” to generate baselines from E3SM master as of 2021/11/8, then rerun with “-c” (compare to baseline) using various values of c_k10h.
RESULTS: All tests pass with roundoff-level changes to c_k10h and fail (detect statistically different results) with small changes in c_k10h. With the default thresholds, PGN is the most sensitive (detecting a statistical difference from a 1e-8 change), followed by TSC (1e-3) and MVK (2e-2). This ordering correlates with the timescale each test examines: PGN looks at physics columns after 1 timestep, TSC at time step convergence over 300 timesteps, and MVK at 1-year climatologies.
MVK_P24x1.ne4_oQU240.F2010-CICE
30 member ensemble of ~ 1 year simulations. Takes about 1.3 hours on 18 nodes.
c_k10h | Test Result | TestStatus.log metrics (rejection threshold = 13) |
---|---|---|
0.35 (default) | PASS | |
0.36 | PASS | reject 7/121 |
0.38 | FAIL | reject 21/121 |
0.40 | FAIL | reject 50/121 |
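The MVK extension in evv4esm compares the baseline and test ensembles with per-variable Kolmogorov-Smirnov tests and fails when the rejection count exceeds the threshold (13 of 121 variables here). A minimal stand-alone sketch of that decision logic — the K-S critical-value form, ensemble sizes, and synthetic data below are illustrative, not evv4esm's exact implementation:

```python
import math
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = d = 0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:   # advance past ties in both samples
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

def ks_reject(a, b, coeff=1.358):
    """Reject at ~alpha=0.05 using the large-sample critical value."""
    n, m = len(a), len(b)
    crit = coeff * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > crit

# Hypothetical stand-in for 121 per-variable summaries from 30-member ensembles
random.seed(0)
threshold = 13
rejections = 0
for var in range(121):
    base = [random.gauss(0.0, 1.0) for _ in range(30)]
    test = [random.gauss(0.05, 1.0) for _ in range(30)]  # small shift, e.g. from c_k10h
    if ks_reject(base, test):
        rejections += 1
print("reject %d/121 -> %s" % (rejections, "FAIL" if rejections > threshold else "PASS"))
```

With only a tiny mean shift and 30 members per ensemble, most variables are not individually rejected, which is why MVK needs a relatively large parameter change before the count crosses the threshold.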
Hack to reuse the same “-c” case to run multiple experiments:
rm -f run/*.nc (otherwise we get PIO runtime errors)
add “clubb_c_k10h=0.40” to the user_nl_eam_???? files
./case.submit
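The steps above can be collected into a small script. CASEDIR and the 0.40 value are placeholders (the compare-case directory and whichever value is being tested); the guards make the script safe to dry-run outside a real case directory:

```shell
#!/bin/sh
# Re-drive an existing "-c" case with a new clubb_c_k10h value.
CASEDIR=${CASEDIR:-.}     # placeholder: path to the compare case
cd "$CASEDIR" || exit 1

rm -f run/*.nc            # stale output otherwise triggers PIO runtime errors

# Append the new value to every ensemble member's namelist.
for nl in user_nl_eam_????; do
    [ -e "$nl" ] || continue
    echo "clubb_c_k10h = 0.40" >> "$nl"
done

if [ -x ./case.submit ]; then
    ./case.submit
fi
```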
PGN_P32x1.ne4_oQU240.F2010
20 member ensemble of ~ 1 timestep simulations. Takes about 1 min on 16 nodes.
c_k10h | Test Result | TestStatus.log metrics: T test (t, p) |
---|---|---|
0.35 (default) | PASS | (0.000, 1.000) |
0.350000001d0 | PASS | (-1.424, 0.169) |
0.35000001 | FAIL | (-2.542, 0.019) |
0.36 | FAIL | (-12.564, 0.000) |
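The (t, p) metrics above come from a t-test on per-member scores. A minimal sketch of how such a statistic flags a shift — this uses Welch's two-sample form with a normal approximation for the two-sided p-value, and the per-member scores are hypothetical, not PGN's actual composite metric:

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic; normal approximation for the
    two-sided p-value (adequate for illustration at these sample sizes)."""
    na, nb = len(a), len(b)
    se = math.sqrt(variance(a) / na + variance(b) / nb)
    t = (mean(a) - mean(b)) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical per-member RMSE-like scores: baseline vs. perturbed ensembles
base = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 1.02, 0.98]
pert = [1.10, 1.12, 1.08, 1.11, 1.09, 1.13, 1.07, 1.10, 1.12, 1.08]
t, p = welch_t(base, pert)
print("t = %.3f, p = %.3f -> %s" % (t, p, "FAIL" if p < 0.05 else "PASS"))
```

A consistent shift across members drives |t| up and p toward zero, matching the pattern in the table where larger c_k10h changes produce larger |t| and smaller p.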
Hack to reuse the same “-c” case to run multiple experiments:
rm -f run/*.nc (otherwise we get PIO runtime errors)
add “clubb_c_k10h=0.40” to the user_nl_eam_???? files
./case.submit
TSC_P36x1.ne4_ne4.F2010-CICE
12 member ensemble of 5 day simulations. Takes about 10min on 11 nodes.
c_k10h | Test Result (possible bug in scripts? fails for all values except when results are bfb) | Alternative test result: PASS = all values in P_min plot above PASS threshold | TestStatus.log metrics: region-by-region results (Global, Land, Ocean) |
---|---|---|---|
0.35 (default) | PASS | PASS | PASS, PASS, PASS |
0.350001 | FAIL | PASS | PASS, PASS, PASS |
0.35001 | FAIL | PASS | FAIL, PASS, PASS pmin plot |
0.3501 | FAIL | PASS | PASS, PASS, PASS pmin plot |
0.351 | FAIL | FAIL | PASS, FAIL, PASS pmin plot |
0.36 | FAIL | FAIL | FAIL, FAIL, FAIL |
The hack used in the MVK and PGN tests to reuse a “-c” case for multiple experiments does not work for TSC: during the RUN phase, the user_nl_eam_???? ensemble-member namelists are recreated by cime/scripts/lib/CIME/SystemTests/tsc.py. That script can be edited to append the contents of user_nl_eam to each user_nl_eam_???? file, after which parameters can be set in user_nl_eam.
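The append logic one might add to tsc.py could look like the sketch below. The function name, return value, and demo paths are assumptions for illustration; the exact hook point inside tsc.py's RUN-phase namelist generation is not shown here:

```python
import glob
import os
import tempfile

def append_common_namelist(case_dir, common="user_nl_eam"):
    """Append the contents of a shared user_nl_eam to each freshly
    regenerated user_nl_eam_???? member file, so hand-set parameters
    (e.g. clubb_c_k10h) survive the namelist regeneration."""
    common_path = os.path.join(case_dir, common)
    if not os.path.exists(common_path):
        return []
    with open(common_path) as f:
        extra = f.read()
    members = sorted(glob.glob(os.path.join(case_dir, common + "_????")))
    for member_nl in members:
        with open(member_nl, "a") as f:
            f.write("\n" + extra)
    return members

# Demo in a throwaway directory standing in for the case root
d = tempfile.mkdtemp()
with open(os.path.join(d, "user_nl_eam"), "w") as f:
    f.write("clubb_c_k10h = 0.40\n")
with open(os.path.join(d, "user_nl_eam_0001"), "w") as f:
    f.write("! member 1\n")
touched = append_common_namelist(d)
print(touched)
```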
link to PASS/FAIL post processing script:
https://github.com/LIVVkit/evv4esm/blob/master/evv4esm/extensions/tsc.py