2020-07-09 All-Hands Presentation Meeting Notes
Presenter: Salil Mahajan
Title:
Machine Learning Approaches to Ensure Statistical Reproducibility of Earth System Model Simulations
Abstract:
Effective utilization of the novel hybrid architectures of pre-exascale and exascale machines requires transformations to global climate modeling systems that may not reproduce the original model solution bit-for-bit. Round-off level differences grow rapidly in these non-linear, chaotic systems, which makes it difficult to distinguish bugs/errors from the innocuous growth expected from round-off level differences. Here, we apply several classical and modern multivariate two-sample equality-of-distribution tests, borrowed from the machine learning community, to evaluate the statistical reproducibility of individual model components of US DOE’s Energy Exascale Earth System Model (E3SM), namely the atmosphere (as part of the CMDV-SM project) and ocean (as part of the NGD Software and Algorithms project) models. Baseline simulation ensembles are compared to modified ensembles, generated after a non-bit-for-bit change is introduced in a model component, to evaluate the null hypothesis that the two ensembles are statistically indistinguishable. To quantify the false negative rates (Type II error rates) of these tests, we conduct a formal power analysis using targeted suites of short simulation ensembles. Each suite contains several perturbed ensembles, each with a climate progressively more different from that of the baseline ensemble, obtained by perturbing the magnitude of a single model tuning parameter in a controlled manner. The null hypothesis is evaluated for each of the perturbed ensembles using these tests. This broad power analysis provides a framework for quantifying the degree of difference that each test can confidently detect for a given ensemble size (sample size). Knowing these false negative rates allows model developers using the tests to make an informed decision when accepting or rejecting an unintentional non-bit-for-bit change to the model solution.
As a use case, these tests were recently used to establish that E3SMv1 production simulations conducted on NERSC’s Edison machine, then soon to be retired, were reproducible on NERSC’s Cori. Cori has since been used to conduct other E3SMv1 production runs including future projections.
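The abstract does not name the specific tests, so what follows is only a minimal sketch under stated assumptions. It uses the energy-distance statistic with a permutation test as an assumed stand-in for one multivariate two-sample equality-of-distribution test. It imitates the power analysis by shifting the mean of a synthetic "perturbed" ensemble rather than by perturbing a real model tuning parameter. Each ensemble member is represented as a short vector of summary statistics. The helper names (`make_ensemble`, `energy_statistic`, `permutation_pvalue`, `estimate_power`) and all sizes are hypothetical and are not taken from the talk.

```python
import math
import random
from typing import List, Optional, Sequence

Member = Sequence[float]  # hypothetical: one ensemble member as a vector of summary statistics


def make_ensemble(rng: random.Random, size: int, dim: int,
                  shift: float = 0.0) -> List[List[float]]:
    """Hypothetical synthetic ensemble: `size` members, each a `dim`-vector
    drawn from N(shift, 1). `shift` stands in for a tuning-parameter change."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(size)]


def _dist(a: Member, b: Member) -> float:
    """Euclidean distance between two members."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def _mean_dist(x: Sequence[Member], y: Sequence[Member]) -> float:
    """Average distance over all pairs (a in x, b in y)."""
    return sum(_dist(a, b) for a in x for b in y) / (len(x) * len(y))


def energy_statistic(x: Sequence[Member], y: Sequence[Member]) -> float:
    """Two-sample energy distance: 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    return 2.0 * _mean_dist(x, y) - _mean_dist(x, x) - _mean_dist(y, y)


def permutation_pvalue(x: Sequence[Member], y: Sequence[Member],
                       n_perm: int = 99,
                       rng: Optional[random.Random] = None) -> float:
    """Permutation p-value for H0: x and y come from the same distribution.
    Members are randomly relabelled between the two ensembles n_perm times."""
    rng = rng if rng is not None else random.Random(0)
    observed = energy_statistic(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if energy_statistic(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, so p is never zero.
    return (exceed + 1) / (n_perm + 1)


def estimate_power(shift: float, size: int = 10, dim: int = 3,
                   n_trials: int = 10, n_perm: int = 99,
                   alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of trials in which H0 is rejected at level alpha when the
    perturbed ensemble's mean is offset by `shift` from the baseline's."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        baseline = make_ensemble(rng, size, dim)
        perturbed = make_ensemble(rng, size, dim, shift)
        if permutation_pvalue(baseline, perturbed, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_trials
```

For a shift of zero, the rejection rate estimates the Type I (false positive) rate, which should sit near `alpha`. For a nonzero shift, one minus the rejection rate estimates the Type II (false negative) rate at that ensemble size. Sweeping the shift, e.g. `for s in (0.0, 0.5, 1.0, 2.0): print(s, estimate_power(s))`, gives a crude analogue of the progressively perturbed ensemble suites described above.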
Date
- 2020-07-09
Time
- PT: 8:30 am
- ET: 11:30 am
Call Info
- web session: https://global.gotomeeting.com/join/570361173
- call number: (571) 317-3122, access code: 570-361-173. If busy, use alternate number: (773) 945-1029
Joining from a video-conferencing room or system? Dial: 67.217.95.2##570361173, Cisco devices: 570361173@67.217.95.2
Attendees
- Salil Mahajan
- Sarat Sreepathi
- Mark Petersen
- Xylar Asay-Davis
- Ruby Leung
- Chris Golaz
- Stephen Price
- Forrest M. Hoffman
- Matt Hoffman
- Wuyin Lin
- Jon Wolfe
- Hailong Wang
- Min Xu
- Sally McFarlane
- Katherine Smith
- Renata McCoy
- LeAnn Conlon
- Sam Silva
- Peter Bogenschutz
- Mark Taylor
- Andrew Roberts
- Hyun-Gyu Kang
- Beth Drewniak
- Ryan Forsyth
- Peter Caldwell
- Jayesh Krishna
- Yilin Fang
- Mingxuan Wu
- Youngsung Kim
- Matt Turner
- Chris Terai
- Xue Zheng
- Michael Brunke
- Xiaoying Shi
- Yan Feng
- Karthik Balaguru
- Kristin Hoch
- Shaocheng Xie
- Kyle Pressel
- Andy Salinger
- Xujing Jia Davis
- Amrapalli Garanaik
- Daniel Kaufman
- David C. Bader
- Bryce Harrop
- Jill Chengzhu Zhang
- Zeli Tan
- Yuanhao Fang
- Hui Wan
Presentation