
Poster Title

Moving Beyond Bit-for-bit: Reproducibility Testing in E3SM

Authors: Joseph H. Kennedy, Peter Caldwell, Kate Evans, Salil Mahajan, Phil Rasch (pnl.gov), Andy Salinger, Balwinder Singh, Hui Wan
First Author:
Session Type: E3SM/Integrated Sessions
Session ID: I6, E8
Submission Type: Presentation
Group:
Experiment:
Poster Link:




Abstract

Continuous testing of model changes is critical to developing a credible Earth system model. A code modification can affect a model’s results in one of three ways:

  1. The change continues to produce bit-for-bit identical solutions
  2. The change causes numerical differences, yet produces a statistically identical climate
  3. The change causes numerical differences and produces a different climate

Only in the third case must changes undergo an extensive review before being accepted into the model. However, E3SM does not yet have a robust way to distinguish between type 2 and type 3 changes. This creates a large development burden: every non-bit-for-bit change must be treated as climate changing and put through a time-intensive review process that often relies on subjective expert opinion.
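
The three-way triage described above can be sketched in a few lines of Python. This is an illustrative sketch only, not E3SM or CIME code: the function name `classify_change` and the `same_climate` callback are hypothetical, and the callback stands in for whichever statistical reproducibility test is chosen.

```python
import struct
from typing import Callable, Sequence


def _bits(values: Sequence[float]) -> bytes:
    # Pack as IEEE 754 doubles so the comparison is truly bitwise
    # (for example, 0.0 and -0.0 compare equal as floats but differ in bits).
    return struct.pack(f">{len(values)}d", *values)


def classify_change(
    baseline: Sequence[float],
    test: Sequence[float],
    same_climate: Callable[[Sequence[float], Sequence[float]], bool],
) -> int:
    """Return 1, 2, or 3 following the change types listed in the abstract.

    `same_climate` is a hypothetical placeholder for a statistical
    climate reproducibility test; it is only called when the outputs
    are not bit-for-bit identical.
    """
    if _bits(baseline) == _bits(test):
        return 1  # bit-for-bit identical
    if same_climate(baseline, test):
        return 2  # numerically different, statistically identical climate
    return 3  # numerically different, different climate
```

Only a result of 3 would then be routed to the extensive review.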

Through the CMDV-SM project, a series of statistical climate reproducibility tests have been developed and are being evaluated for use in regular integration testing. These tests use a variety of approaches to determine whether numerical differences are also climate changing. The multivariate tests evaluate the climate statistics of a ~1-year test ensemble against those of a baseline ensemble, using several modern nonparametric (distribution-free) two-sample statistical tests for multivariate data to test the equality of the two distributions. Alternatively, the perturbation growth test is a deterministic method that determines whether the divergence of the atmospheric state between a baseline and a test ensemble within a single time step is larger than the growth of rounding-level initial perturbations. A large deviation after any parameterization update indicates both a test failure and the specific parameterization causing the failure. Likewise, the time-step convergence test is another deterministic method that determines whether the test ensemble converges to the baseline’s reference ensemble after ~10 minutes of simulation and hence can be considered equivalent to the baseline within numerical uncertainty.
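
To make the idea of a nonparametric two-sample test on multivariate ensembles concrete, here is a minimal sketch using a permutation test on the energy distance statistic. The abstract does not name the specific tests used, so the choice of statistic, the function names, and the parameters `n_perm` and `seed` are all assumptions made for illustration; each ensemble member is represented as a tuple of climate statistics.

```python
import math
import random


def _mean_pairwise_distance(xs, ys):
    # Average Euclidean distance over all pairs (x, y).
    total = sum(math.dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def energy_statistic(xs, ys):
    # Energy distance: 2 E|X-Y| - E|X-X'| - E|Y-Y'|.
    # It is zero when the two samples coincide and grows as they separate.
    return (
        2.0 * _mean_pairwise_distance(xs, ys)
        - _mean_pairwise_distance(xs, xs)
        - _mean_pairwise_distance(ys, ys)
    )


def permutation_p_value(baseline, test, n_perm=200, seed=0):
    """Estimate a p-value for the null hypothesis of equal distributions.

    Ensemble members are shuffled between the two groups; the p-value is
    the fraction of shuffles whose statistic is at least the observed one.
    """
    rng = random.Random(seed)
    observed = energy_statistic(baseline, test)
    pooled = list(baseline) + list(test)
    n = len(baseline)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_statistic(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

A small p-value suggests the test ensemble's climate differs from the baseline's; the significance threshold is a policy choice that this sketch does not fix.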

These tests are being integrated into CIME, E3SM’s case control system, to provide a consistent workflow for developers. They use EVV (Extended Verification and Validation for Earth System Models), a Python-based analysis package, to perform the statistical analyses. In addition to providing a pass-fail test result, EVV produces a portable website that details the test analysis, the obtained results, and any relevant figures (e.g., P-P plots). This allows clear and contextualized testing results to be shared quickly among developers when evaluating changes, and it will accelerate model development by allowing type 2 changes to be integrated rapidly.