OP-E8.4 Moving Beyond Bit-for-bit
Abstract
Continuous testing of model changes is critical to developing a credible Earth system model. Code modifications can affect a model's results in three ways:
- Changes that continue to produce bit-for-bit identical solutions
- Changes that cause numerical differences, yet produce a statistically identical climate
- Changes that cause numerical differences and produce a different climate
Only in the third case must changes undergo an extensive review before being accepted into the model. However, E3SM does not yet have a robust way to distinguish between type 2 and type 3 changes. This creates a large development burden: all non-bit-for-bit changes must be treated as climate-changing and undergo a time-intensive review process that often relies on subjective expert opinion.
Through the CMDV-SM project, a series of statistical climate reproducibility tests have been developed and are being evaluated for use in regular integration testing. These tests use a variety of approaches to determine whether numerical differences are also climate changing. The multivariate tests evaluate the climate statistics of a ~1-year test ensemble against those of a baseline ensemble, using several modern nonparametric (distribution-free) two-sample statistical tests for multivariate data to assess the equality of the underlying distributions. Alternatively, the perturbation growth test is a deterministic method that determines whether the divergence of the atmospheric state between a baseline and a test ensemble within a single time step is larger than the growth of rounding-level initial perturbations. A large deviation after any parameterization update indicates test failure and identifies the specific parameterization responsible. Likewise, the time-step convergence test is another deterministic method that determines whether the test ensemble converges to the baseline ensemble after ~10 minutes of simulation and can therefore be considered equivalent to the baseline within numerical uncertainty.
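As an illustration of the multivariate approach, the sketch below applies a permutation-based, energy-distance two-sample test to two ensembles of annual-mean diagnostics. The variable names, ensemble sizes, and choice of the energy statistic are assumptions made for illustration; the statistics and implementation used by the actual EVV tests may differ.

```python
# Minimal sketch of a nonparametric two-sample test comparing a baseline
# ensemble with a test ensemble of annual-mean climate statistics.
# Illustrative only; not the actual EVV implementation.
import numpy as np
from scipy.spatial.distance import cdist


def energy_statistic(x, y):
    """Energy-distance statistic between samples of shape (n, d) and (m, d)."""
    d_xy = cdist(x, y).mean()
    d_xx = cdist(x, x).mean()
    d_yy = cdist(y, y).mean()
    return 2.0 * d_xy - d_xx - d_yy


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Permutation p-value for H0: both ensembles share the same distribution."""
    rng = np.random.default_rng(seed)
    observed = energy_statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        x_p, y_p = pooled[perm[:n]], pooled[perm[n:]]
        if energy_statistic(x_p, y_p) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical usage: each row is one ensemble member, each column one
# global annual-mean diagnostic (e.g., surface temperature, precipitation).
baseline = np.random.default_rng(1).normal(size=(30, 5))
test = np.random.default_rng(2).normal(size=(30, 5))
p = permutation_pvalue(baseline, test)
print(f"p-value = {p:.3f}  ->  {'PASS' if p > 0.05 else 'FAIL'}")
```

A small p-value would suggest that the test ensemble's climate statistics are drawn from a different distribution than the baseline's, i.e., a potential type 3 change.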
These tests are being integrated into E3SM's case control system, CIME, to provide a consistent workflow to developers, and they use EVV (Extended Verification and Validation for Earth System Models), a Python-based analysis package, to perform the statistical analyses. In addition to providing a pass-fail test result, EVV produces a portable website that details the test analysis, the obtained results, and any relevant figures (e.g., P-P plots). This lets developers quickly share clear, contextualized test results when evaluating changes and will accelerate model development by enabling type 2 changes to be integrated rapidly.
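For context, a P-P plot compares the empirical cumulative distributions of the same diagnostic in the baseline and test ensembles; points falling along the 1:1 line indicate statistically indistinguishable distributions. The sketch below shows how such a figure could be produced; the data, variable name, and plotting details are placeholders and do not reflect EVV's actual report generation.

```python
# Minimal sketch of a P-P plot for one diagnostic across two ensembles.
# Data and labels are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
baseline_tas = rng.normal(loc=287.5, scale=0.1, size=30)  # hypothetical annual-mean temperature (K)
test_tas = rng.normal(loc=287.5, scale=0.1, size=30)

# Evaluate both empirical CDFs on a common grid of values.
grid = np.sort(np.concatenate([baseline_tas, test_tas]))
ecdf_baseline = np.searchsorted(np.sort(baseline_tas), grid, side="right") / len(baseline_tas)
ecdf_test = np.searchsorted(np.sort(test_tas), grid, side="right") / len(test_tas)

fig, ax = plt.subplots(figsize=(4, 4))
ax.plot(ecdf_baseline, ecdf_test, marker=".", linestyle="none")
ax.plot([0, 1], [0, 1], color="gray", linewidth=1)  # 1:1 line: identical distributions
ax.set_xlabel("Baseline ensemble CDF")
ax.set_ylabel("Test ensemble CDF")
ax.set_title("P-P plot: annual-mean surface temperature")
fig.savefig("pp_plot_tas.png", dpi=150)
```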