Moving Beyond Bit-for-bit

Poster Title	Moving Beyond Bit-for-bit: Reproducibility Testing in E3SM
Authors	Joseph H. Kennedy, Peter Caldwell, Kate Evans, Salil Mahajan, Phil Rasch (pnl.gov), Andy Salinger, Balwinder Singh, Hui Wan
First Author	Joseph H. Kennedy
Session Type	E3SM/Integrated Sessions
Session ID	I6
Submission Type	Presentation
Group
Experiment
Poster Link

Abstract

Continuous testing of model changes is critical to developing a credible Earth system model. Code modification can impact a model’s results in three ways:

1. Changes that continue to produce bit-for-bit identical solutions
2. Changes that cause numerical differences, yet produce a statistically identical climate
3. Changes that cause numerical differences and produce a different climate

Only in the third case must changes undergo an extensive review before being accepted into the model. However, E3SM does not yet have a robust way to distinguish between type 2 and type 3 changes. This results in a large development burden as all non-bit-for-bit changes must be treated as climate changing and undergo a time-intensive review process, often relying on subjective expert opinion.

Through the CMDV-SM project, a series of statistical climate reproducibility tests has been developed and is being evaluated for use in regular integration testing. These tests take a variety of approaches to determining whether numerical differences are also climate changing. The multivariate tests evaluate the climate statistics of a ~1-year test ensemble against those of a baseline ensemble, using several modern nonparametric (distribution-free) two-sample statistical tests for multivariate data to determine the equality of distributions. Alternatively, the perturbation growth test is a deterministic test method that determines whether the divergence of atmospheric state between a baseline and a test ensemble within a single time step is larger than the growth of rounding-level initial perturbations. A large deviation after any parameterization update indicates both test failure and the specific parameterization causing the failure. Likewise, the time-step convergence test is another deterministic test method; it determines whether the test ensemble converges to the baseline’s reference ensemble after ~10 minutes of simulation and hence can be considered equivalent to the baseline within numerical uncertainty.
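
The abstract does not name the specific two-sample tests used. As an illustration only, the following sketch shows one common nonparametric choice, an energy-distance statistic with a permutation test, applied to two ensembles. The function names, the array layout (one row per ensemble member, one column per climate variable), and the permutation count are assumptions made for this example and are not taken from EVV or E3SM.

```python
import numpy as np


def energy_distance(x, y):
    """Energy distance between two multivariate samples.

    x, y: arrays of shape (members, variables). Hypothetical layout:
    each row holds the climate statistics of one ensemble member.
    """
    def mean_pairwise(a, b):
        # Mean Euclidean distance over all pairs (one row from a, one from b).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


def permutation_p_value(baseline, test, n_perm=200, seed=0):
    """Permutation p-value for the null hypothesis of equal distributions."""
    rng = np.random.default_rng(seed)
    observed = energy_distance(baseline, test)
    pooled = np.vstack([baseline, test])
    n_base = len(baseline)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign members to the two groups and recompute the statistic.
        idx = rng.permutation(len(pooled))
        stat = energy_distance(pooled[idx[:n_base]], pooled[idx[n_base:]])
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

A small p-value would suggest that the test ensemble's climate differs from the baseline's (a type 3 change under the classification above), while a large p-value is consistent with a statistically identical climate. The significance threshold is a policy choice that this sketch leaves open.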

These tests are being integrated into CIME, E3SM’s case control system, to give developers a consistent workflow. They use EVV (Extended Verification and Validation for Earth System Models), a Python-based analysis package, to perform the statistical analyses. In addition to providing a pass-fail test result, EVV produces a portable website that details the test analysis, the obtained results, and any relevant figures (e.g., P-P plots). This allows clear and contextualized testing results to be shared quickly among developers when evaluating changes, and it will accelerate model development by allowing type 2 changes to be integrated rapidly.
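
For readers unfamiliar with P-P plots, the sketch below shows how the plotted points can be computed in general: the empirical CDFs of two samples are evaluated on the pooled values and plotted against each other, so points near the diagonal indicate similar distributions. This is a generic illustration; the function name `pp_points` is hypothetical and does not reflect how EVV generates its figures.

```python
import numpy as np


def pp_points(baseline, test):
    """Empirical CDF values of two 1-D samples on their pooled sorted values.

    Plotting the first returned array against the second gives a P-P plot.
    """
    baseline = np.sort(np.asarray(baseline, dtype=float))
    test = np.sort(np.asarray(test, dtype=float))
    grid = np.sort(np.concatenate([baseline, test]))
    # Fraction of each sample that is less than or equal to each grid value.
    cdf_base = np.searchsorted(baseline, grid, side="right") / baseline.size
    cdf_test = np.searchsorted(test, grid, side="right") / test.size
    return cdf_base, cdf_test
```

The largest vertical gap between the two returned arrays, `np.max(np.abs(cdf_base - cdf_test))`, equals the two-sample Kolmogorov-Smirnov statistic, which is one way to summarize such a plot as a single number.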