2020-07-09 All-Hands Presentation Meeting Notes

Presenter: Salil Mahajan

Title:

Machine Learning Approaches to Ensure Statistical Reproducibility of Earth System Model Simulations


Abstract:

Effective utilization of novel hybrid architectures of pre-exascale and exascale machines requires transformations to global climate modeling systems that may not reproduce the original model solution bit-for-bit. Round-off level differences grow rapidly in these non-linear and chaotic systems, which makes it difficult to isolate bugs/errors from the innocuous growth expected from round-off level differences. Here, we apply some classical and modern multivariate two-sample equality-of-distribution tests, borrowed from the machine learning community, to evaluate statistical reproducibility of individual model components of US DOE’s Energy Exascale Earth System Model (E3SM), namely the atmosphere (as part of the CMDV-SM project) and the ocean (as part of the NGD Software and Algorithms project) models. Baseline simulation ensembles are compared to modified ensembles, generated after a non-bit-for-bit change is introduced in a model component, to evaluate the null hypothesis that the two ensembles are statistically indistinguishable. To quantify the false negative rates (Type II error rates) of these tests, we conduct a formal power analysis using targeted suites of short simulation ensembles. Each such suite contains several perturbed ensembles, each with a progressively different climate than the baseline ensemble, obtained by perturbing the magnitude of a single model tuning parameter in a controlled manner. The null hypothesis is evaluated for each of the perturbed ensembles using these tests. This broad power analysis provides a framework to quantify the degree of difference that can be detected confidently by each of the tests for a given ensemble size (sample size). Knowledge of these false negative rates allows model developers using the tests to make an informed decision when accepting/rejecting an unintentional non-bit-for-bit change to the model solution.
As a use case, these tests were recently used to establish that E3SMv1 production simulations conducted on NERSC’s Edison machine, then soon to be retired, were reproducible on NERSC’s Cori. Cori has since been used to conduct other E3SMv1 production runs including future projections.
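
The workflow described in the abstract (a two-sample test of the null hypothesis, followed by a power analysis over progressively larger perturbations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the specific tests, so an energy-distance permutation test stands in as one example of a multivariate two-sample equality-of-distribution test; synthetic Gaussian samples stand in for model ensembles; a mean shift stands in for the tuning-parameter perturbation; and all function names and parameter values are hypothetical.

```python
import numpy as np


def energy_statistic(x, y):
    """Energy distance between samples x (n, d) and y (m, d)."""
    def mean_dist(a, b):
        # Mean Euclidean distance over all pairs of rows from a and b.
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def permutation_test(x, y, n_perm=200, rng=None):
    """P-value for H0: x and y are drawn from the same distribution."""
    rng = np.random.default_rng(rng)
    pooled = np.vstack([x, y])
    n = len(x)
    observed = energy_statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign ensemble members to the two groups.
        idx = rng.permutation(len(pooled))
        stat = energy_statistic(pooled[idx[:n]], pooled[idx[n:]])
        exceed += stat >= observed
    return (1 + exceed) / (1 + n_perm)


def rejection_rate(shift, n_members=20, n_vars=5, n_trials=40,
                   alpha=0.05, seed=0):
    """Fraction of trials in which H0 is rejected for a given mean shift.

    Synthetic stand-in (assumption): each ensemble member is a vector of
    n_vars Gaussian values; `shift` mimics a tuning-parameter perturbation.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_trials):
        baseline = rng.normal(0.0, 1.0, size=(n_members, n_vars))
        perturbed = rng.normal(shift, 1.0, size=(n_members, n_vars))
        rejections += permutation_test(baseline, perturbed, rng=rng) < alpha
    return rejections / n_trials


if __name__ == "__main__":
    # Power curve: rejection rate as the perturbation grows.
    for shift in (0.0, 0.25, 0.5, 1.0):
        power = rejection_rate(shift)
        print(f"shift={shift:.2f}  rejection rate={power:.2f}")
```

In this sketch, the rejection rate at zero shift estimates the false positive rate, and one minus the rejection rate at a nonzero shift estimates the false negative (Type II) rate for that perturbation size and ensemble size.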

Date

2020-07-09

Time

  • PT: 8:30 am
  • ET: 11:30 am

Call Info

  • web session: https://global.gotomeeting.com/join/570361173
  • call number: (571) 317-3122, Access Code: 570-361-173. If busy, use alternate number: (773) 945-1029

    Joining from a video-conferencing room or system? Dial: 67.217.95.2##570361173, Cisco devices: 570361173@67.217.95.2

Attendees


Presentation

  • Time: 30 min
  • Title: Machine Learning Approaches to Ensure Statistical Reproducibility of Earth System Model Simulations
  • Presenter: Salil Mahajan
  • Presentation:
  • Recording:
  • Notes: