#A12 How can we make model tuning less laborious and more transparent?

Poster Title: How can we make model tuning less laborious and more transparent? Putting human expertise in the loop in visualization design and expert judgement.
Authors: Susannah Burrows, Aritra Dasgupta (PNNL), Sarah Reehl (PNNL), Lisa Bramer (PNNL), Kyungsik ("Keith") Han (PNNL), Po-Lun Ma, Phil Rasch (pnl.gov), Yun Qian
Group: Atmosphere
Experiment:
Poster Category: Future Direction
Submission Type: poster
Poster Link:


Abstract

Climate model calibration (“tuning”) is a complex and laborious task, requiring many thousands of hours of effort by experienced model developers for each new version of a component or coupled model. Two key obstacles to efficient model fidelity analysis are: (1) comparisons of model fidelity across multiple simulations are time-consuming, and (2) there is limited consensus on the relative importance of different model variables in evaluating overall fidelity. We address these challenges through a unique, close collaboration among climate modelers, visual analytics researchers, and statisticians, as part of an LDRD project at Pacific Northwest National Laboratory.

To address the first challenge, we have empirically tested four candidate visual designs for the task of comparing model fidelity across many (>10) models and many (>10) variables (Dasgupta et al., CHI 2017). This study demonstrated that a visualization called the “slope plot” was preferred by climate model developers and users, and was perceived as more efficient for these tasks, compared with alternative visual displays (heatmaps, bar charts, and Taylor plots). This preference was exhibited by both highly experienced and less experienced scientists, and objective task-completion accuracy with the “slope plot” was as good as or better than with the other plots.
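For readers unfamiliar with the design, the sketch below illustrates one plausible rendering of a slope plot for this task: each simulation is drawn as a polyline across a shared set of variables on a normalized-error axis, so deviations from the reference stand out across many models at once. The data, variable names, and styling are invented for illustration and are not taken from the study.

```python
# Minimal sketch of a slope-plot-style comparison of normalized model errors
# across variables. Synthetic data; variable and model names are placeholders.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
variables = ["SWCF", "LWCF", "PRECT", "T850", "U200"]   # example variables
models = [f"model_{i}" for i in range(12)]              # >10 simulations
# Normalized error relative to a reference (1.0 = reference skill).
errors = 1.0 + 0.3 * rng.standard_normal((len(models), len(variables)))

fig, ax = plt.subplots(figsize=(6, 4))
x = np.arange(len(variables))
for i, _ in enumerate(models):
    ax.plot(x, errors[i], marker="o", alpha=0.6)        # one polyline per simulation
ax.set_xticks(x)
ax.set_xticklabels(variables)
ax.axhline(1.0, color="k", linestyle="--", linewidth=1)  # reference line
ax.set_ylabel("normalized error (relative to reference)")
ax.set_title("Slope plot sketch: each line is one simulation")
plt.tight_layout()
plt.show()
```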

As a first step towards addressing the second challenge, we conducted a broad survey of the climate modeling community (with ca. 100 participants from four continents) to elicit expert judgments about the importance of different variables in evaluating model fidelity. We present community-mean importance ratings for concise sets of variables, quantify how those ratings vary with the driving science question, quantify the degree of consensus on the importance of different model variables, and show that the distribution of responses does not differ significantly between less-experienced (median 10 years of experience) and more-experienced (median 20.5 years of experience) climate modelers. While we found a general lack of consensus on the importance ratings of particular variables, our results also suggest that, beyond a certain level of expertise, additional experience in model evaluation does not necessarily change how individual experts assess model fidelity.
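As an illustration of the kind of analysis this involves (not the survey's actual data or methodology), the sketch below compares importance-rating distributions between two experience groups with a Mann-Whitney U test and summarizes consensus as the interquartile range of the pooled ratings. All numbers, group sizes, and the choice of test are assumptions made for the example.

```python
# Sketch: compare importance ratings between less- and more-experienced groups
# for a single variable, and report a simple consensus measure.
# Synthetic data and an assumed nonparametric test (Mann-Whitney U).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Ratings on a 1-5 importance scale (5 = most important).
less_experienced = rng.integers(1, 6, size=60)   # e.g. ~10 years of experience
more_experienced = rng.integers(1, 6, size=40)   # e.g. ~20 years of experience

u_stat, p_value = stats.mannwhitneyu(less_experienced, more_experienced,
                                     alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")

# A simple consensus measure: spread (IQR) of ratings pooled across both groups.
pooled = np.concatenate([less_experienced, more_experienced])
q25, q75 = np.percentile(pooled, [25, 75])
print(f"Pooled rating IQR = {q75 - q25:.1f} (smaller = more consensus)")
```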

In follow-on work, we are developing a visual analytic tool that will allow scientists to interactively adjust the weights assigned to different model metrics and see how those weights affect model rankings, and to explore how model parameters affect model fidelity in a perturbed-physics ensemble.
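To make the weighting idea concrete, here is a minimal sketch of re-ranking simulations under user-adjustable metric weights. The metric values, names, and the weighted-average scoring rule are illustrative assumptions, not the tool's actual implementation.

```python
# Sketch: rank simulations by a weighted average of normalized error metrics,
# then change the weights and observe how the ranking shifts.
import pandas as pd

# Rows = simulations, columns = normalized error metrics (lower is better).
# Values and names are made-up placeholders.
metrics = pd.DataFrame(
    {"SWCF": [0.8, 1.1, 0.9], "PRECT": [1.2, 0.9, 1.0], "T850": [1.0, 1.0, 0.7]},
    index=["sim_A", "sim_B", "sim_C"],
)

def rank_models(metrics: pd.DataFrame, weights: dict) -> pd.Series:
    """Return simulations ordered by weighted-average normalized error."""
    w = pd.Series(weights).reindex(metrics.columns)
    score = (metrics * w).sum(axis=1) / w.sum()
    return score.sort_values()  # smallest weighted error first

print(rank_models(metrics, {"SWCF": 1.0, "PRECT": 1.0, "T850": 1.0}))   # equal weights
print(rank_models(metrics, {"SWCF": 3.0, "PRECT": 0.5, "T850": 0.5}))   # emphasize SWCF
```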

 

Dasgupta, Aritra, Susannah Burrows, Kyungsik Han, and Philip J. Rasch. "Empirical Analysis of the Subjective Impressions and Objective Measures of Domain Scientists' Visual Analytic Judgments." In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 1193-1204. ACM, 2017.