NBFB test case study
During V2 development, E3SM transitioned from CLUBB V1 to CLUBB V2. In CLUBB V2, a process previously controlled by c_k10 alone was modified to allow finer control through two parameters, c_k10 and c_k10h; the old behavior is recovered when c_k10h = c_k10 = 0.35. Because c_k10h was a new parameter, it was not set in the namelist during V2 development and inherited a default value of c_k10h = 1.0. This resulted in an unexpected degradation of the simulated climate that went undetected for several months. For more detailed background, see https://acme-climate.atlassian.net/wiki/spaces/CNCL/pages/3130818586
Can E3SM’s NBFB (non-bit-for-bit) tests detect this difference, and how sensitive are they to changes in c_k10h?
Additional test (harder to perform, since CLUBB V1 exists only in an older code base): would the NBFB tests consider simulations with CLUBB V1 (c_k10 = 0.35) statistically similar to CLUBB V2 (c_k10h = c_k10 = 0.35)?
Tests were initially run on Compy (2022/6 follow-ups were run on Anvil). All tests are first run with “-g” to generate baselines from E3SM master as of 2021/11/8; they are then rerun with “-c” (compare to baseline) with various values of c_k10h.
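For reference, the two phases look roughly like the following sketch (assumed to be run from cime/scripts in an E3SM checkout; the baseline name is a placeholder, and only the MVK test is shown):

```python
# Hypothetical driver for the two-phase procedure (assumed to be run from
# cime/scripts in an E3SM checkout; the baseline name is a placeholder and
# only the MVK test is shown).
import subprocess

TEST = "MVK_P24x1.ne4_oQU240.F2010-CICE"
BASELINE = "nbfb_ck10h_study"  # placeholder baseline name

# Phase 1: generate baselines from E3SM master ("-g").
subprocess.run(["./create_test", TEST, "-g", "-b", BASELINE], check=True)

# Phase 2: after perturbing clubb_c_k10h, compare against those baselines ("-c").
subprocess.run(["./create_test", TEST, "-c", "-b", BASELINE], check=True)
```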
RESULTS: All tests pass for roundoff-level changes to c_k10h and fail (i.e., detect statistically different results) for small but finite changes. With the default thresholds, PGN is the most sensitive, detecting a statistical difference for a change of 1e-8, followed by TSC at 1e-2 and MVK at 2e-2. These sensitivities correlate with the timescale each test examines: PGN looks at physics columns after 1 timestep, TSC at time step convergence over 300 timesteps, and MVK at 1-year climatologies.
2022/6 update: the TSC PASS/FAIL criterion is still broken ( https://github.com/E3SM-Project/E3SM/issues/4759 )
MVK
30 member ensemble of ~ 1 year simulations. Takes about 1.3 hours on 18 nodes.
Hack to reuse the same “-c” case to run multiple experiments (a scripted version is sketched below):
rm -f run/*.nc (otherwise we get PIO runtime errors)
add “clubb_c_k10h = 0.40” to the user_nl_eam_???? files
./case.submit
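The three manual steps above can also be scripted; a minimal sketch, assuming it is run from the case directory of the “-c” test:

```python
# Scripted version of the three manual steps above (a sketch, assumed to be
# run from the case directory of the "-c" test).
import glob
import os
import subprocess

# 1. Remove old history output, otherwise PIO raises runtime errors on resubmit.
for nc in glob.glob("run/*.nc"):
    os.remove(nc)

# 2. Append the perturbed CLUBB parameter to every ensemble member's namelist.
for nl in sorted(glob.glob("user_nl_eam_????")):
    with open(nl, "a") as f:
        f.write("\nclubb_c_k10h = 0.40\n")

# 3. Resubmit the case.
subprocess.run(["./case.submit"], check=True)
```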
MVK_P24x1.ne4_oQU240.F2010-CICE (Compy, 2021/11)
| c_k10h | Test result | TestStatus.log metrics (threshold = 13) |
|---|---|---|
| 0.35 (default) | PASS | |
| 0.36 | PASS | reject 7/121 |
| 0.38 | FAIL | reject 21/121 |
| 0.40 | FAIL | reject 50/121 |
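MVK is the multivariate Kolmogorov-Smirnov test: the “reject N/121” entries count how many of the 121 examined variables fail a per-variable two-sample K-S comparison between the baseline and test ensembles, and the test FAILs once that count exceeds the threshold (13). A minimal sketch of that logic, with the significance level and data layout assumed for illustration (not the exact evv4esm implementation):

```python
# Sketch of the MVK-style PASS/FAIL logic (not the exact evv4esm code).
from scipy import stats

def mvk_like_check(baseline, test, alpha=0.05, threshold=13):
    """baseline/test: dicts mapping variable name -> array of per-member annual means."""
    rejects = 0
    for var in baseline:
        _, p = stats.ks_2samp(baseline[var], test[var])  # two-sample K-S test per variable
        if p < alpha:
            rejects += 1
    status = "FAIL" if rejects > threshold else "PASS"
    return status, rejects, len(baseline)
```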
MVK_P36x1.ne4_oQU240.F2010 (Anvil, 2022/6)
(note: switched to the F2010 compset, which uses MPAS sea ice)
| c_k10h | Test result | TestStatus.log metrics (threshold = 13) |
|---|---|---|
| 0.35 (default) | PASS | |
| 0.36 | PASS | reject 9/121 |
| 0.38 | FAIL | reject 34/121 |
PGN
20 member ensemble of ~ 1 timestep simulations. Takes about 1 min on 16 nodes.
Hack to reuse the same “-c” case to run multiple experiments (same as for MVK; see the sketch above):
rm -f run/*.nc (otherwise we get PIO runtime errors)
add “clubb_c_k10h = 0.40” to the user_nl_eam_???? files
./case.submit
PGN_P32x1.ne4_oQU240.F2010 (Compy, 2021/11)
| c_k10h | Test result | TestStatus.log metrics: t-test (t, p) |
|---|---|---|
| 0.35 (default) | PASS | (0.000, 1.000) |
| 0.350000001d0 | PASS | (-1.424, 0.169) |
| 0.35000001 | FAIL | (-2.542, 0.019) |
| 0.36 | FAIL | (-12.564, 0.000) |
PGN_P32x1.ne4_oQU240.F2010 (Anvil, 2022/6)
| c_k10h | Test result | TestStatus.log metrics: t-test (t, p) |
|---|---|---|
| 0.350000001d0 | PASS | (-1.424, 0.169) |
| 0.350000005d0 | PASS | (-1.549, 0.136) |
| 0.35000001 | FAIL | (-2.542, 0.019) |
| 0.35000002 | FAIL | (-2.619, 0.016) |
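The (t, p) pairs above come from the t-test that PGN reports in TestStatus.log; a low p-value means the perturbation-growth behavior of the test ensemble is judged statistically different from the baseline. The exact evv4esm procedure is not reproduced here; the sketch below illustrates the idea with a two-sample t-test on per-member RMS perturbation-growth values (the function name, inputs, and significance level are assumptions):

```python
# Illustrative sketch of a PGN-style check (assumed, not the exact evv4esm procedure).
import numpy as np
from scipy import stats

def pgn_like_check(baseline_rms, test_rms, alpha=0.05):
    """baseline_rms/test_rms: one RMS perturbation-growth value per ensemble member."""
    t, p = stats.ttest_ind(np.asarray(baseline_rms), np.asarray(test_rms))
    status = "FAIL" if p < alpha else "PASS"
    return status, (t, p)
```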
TSC
12 member ensemble of 5 day simulations. Takes about 10min on 11 nodes.
The hack used in the MVK and PGN tests to reuse a “-c” case for multiple experiments does not work for TSC: during the RUN phase, the per-member user_nl_eam_???? namelists are created anew by cime/scripts/lib/CIME/SystemTests/tsc.py. That script can be edited to append the contents of user_nl_eam to each user_nl_eam_???? file ( gist for tsc.py patch file; see the sketch below ), after which parameters can be set in user_nl_eam.
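The gist is not reproduced here, but the idea behind the patch can be sketched as follows (illustrative names only, not the actual tsc.py code):

```python
# Sketch of the idea behind the tsc.py patch (illustrative names, not the actual patch):
# after the RUN phase regenerates each member's user_nl_eam_NNNN, append the contents
# of the case-level user_nl_eam so that parameters set there reach every member.
import glob
import os
import shutil

def append_case_namelist(case_root):
    """Append user_nl_eam to every generated user_nl_eam_???? file."""
    case_nl = os.path.join(case_root, "user_nl_eam")
    for member_nl in sorted(glob.glob(os.path.join(case_root, "user_nl_eam_????"))):
        with open(member_nl, "a") as dst, open(case_nl) as src:
            shutil.copyfileobj(src, dst)
```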
link to PASS/FAIL post processing script: https://github.com/LIVVkit/evv4esm/blob/master/evv4esm/extensions/tsc.py
TSC_P36x1.ne4_ne4.F2010-CICE (Compy, 2021/11)
| c_k10h | Test result (possible bug in scripts? fails for all values except when results are bfb) | Hui Wan's criterion (sketched at the end of this section): PASS unless all points of the p_min plot in the [5 min, 10 min] range are FAIL | TestStatus.log metrics: region-by-region results (Global, Land, Ocean) |
|---|---|---|---|
| 0.35 (default) | PASS | PASS | PASS, PASS, PASS |
| 0.350001 | FAIL | PASS | PASS, PASS, PASS |
| 0.35001 | FAIL | PASS | FAIL, PASS, PASS (pmin plot) |
| 0.3501 | FAIL | PASS | PASS, PASS, PASS (pmin plot) |
| 0.351 | FAIL | PASS | PASS, FAIL, PASS (pmin plot) |
| 0.36 | FAIL | FAIL | FAIL, FAIL, FAIL (pmin.36.png) |
| 0.37 | FAIL | FAIL | FAIL, FAIL, FAIL |
TSC_P36x1.ne4_ne4.F2010-CICE (Anvil, 2022/6)
(note: the test aborts in MPAS sea ice if F2010 is used, so F2010-CICE is used here)
| c_k10h | Test result (possible bug in scripts?) | Hui Wan's criterion: PASS unless all points of the p_min plot in the [5 min, 10 min] range are FAIL |
|---|---|---|
| 0.35 (default) | PASS | PASS |
| 0.351 | PASS | PASS |
| 0.36 | PASS | FAIL |
| 0.38 | PASS | FAIL |
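For reference, Hui Wan's criterion from the TSC tables above can be sketched as follows (the significance level and input layout are assumptions): the run FAILs only when every p_min value at output times between 5 and 10 minutes falls below the threshold.

```python
# Sketch of the alternative criterion (the significance level and inputs are assumptions).
def tsc_alternative_criterion(times_min, p_min, alpha=0.005):
    """times_min: output times in minutes; p_min: minimum p-value at each output time."""
    window = [p for t, p in zip(times_min, p_min) if 5.0 <= t <= 10.0]
    # FAIL only if every point in the 5-10 minute window rejects the null hypothesis.
    return "FAIL" if window and all(p < alpha for p in window) else "PASS"
```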