Daniel Ricciuto, Dan Lu, Khachik Sargsyan, Vishagan Ratnaswamy, Cosmin Safta

Abstract

A variety of machine learning methods can be applied to create surrogate models. A traditional feed-forward neural network, or multilayer perceptron (MLP), can be used to build approximations to quantities of interest (QoIs) for complex physical models, for example, carbon fluxes in the E3SM land model. A single model output variable (e.g., the gross primary productivity, GPP) is spatially gridded and therefore contains a large number of QoIs for a surrogate model to reproduce. Here we demonstrate that this high-dimensional GPP output can be accurately represented with a small number of singular values when singular value decomposition (SVD) is applied. An accurate surrogate model can then be trained using an MLP with a relatively small ensemble. Temporal variations in model outputs present additional challenges for creating accurate surrogate models. Thus, a recurrent neural network (RNN) is also well suited to the land model. A vanilla RNN comes with its own set of issues, such as exploding and vanishing gradients; however, those issues can be mitigated with gradient clipping or, commonly, with gates. One common gated method is long short-term memory (LSTM). While a gated RNN can handle temporal data, it typically does so in a sequential fashion, i.e., it ignores the connected (hierarchical) nature of the QoIs. To make a more physics-based model, we employ a hierarchical NN, specifically a Tree-LSTM, that incorporates the hierarchical nature of the land model. We compare how well the Tree-LSTM predicts the QoIs of the land model in one representative grid cell, namely carbon cycle variables, against the LSTM-RNN and the MLP. We find that the Tree-LSTM outperforms the MLP and the LSTM-RNN, confirming the intuition that a physics-based neural network architecture improves predictive accuracy compared to vanilla methods.
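
The SVD-based dimension reduction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for a gridded output such as GPP, and the function names (`truncated_svd_basis`, `project`, `reconstruct`), the 99% energy threshold, and the ensemble and grid sizes are all assumptions chosen for illustration. The idea is that an ensemble of gridded outputs is reduced to a handful of coefficients per ensemble member, and those coefficients (rather than every grid cell) would then serve as the targets a surrogate is trained to reproduce.

```python
import numpy as np

def truncated_svd_basis(Y, energy=0.99):
    """Return the ensemble mean, the leading right singular vectors, and
    the number of modes k needed to capture `energy` of the variance."""
    mean = Y.mean(axis=0)
    # Rows of Y are ensemble members, columns are grid cells (QoIs).
    _, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(cumulative, energy) + 1)
    return mean, Vt[:k], k

def project(Y, mean, basis):
    """Map full-dimensional outputs to k reduced coefficients."""
    return (Y - mean) @ basis.T

def reconstruct(coeffs, mean, basis):
    """Map reduced coefficients back to the full output grid."""
    return coeffs @ basis + mean

# Synthetic ensemble (assumed sizes): 200 members, 5000 grid cells,
# generated from 5 underlying spatial patterns plus small noise.
rng = np.random.default_rng(0)
n_samples, n_cells, true_rank = 200, 5000, 5
params = rng.uniform(size=(n_samples, true_rank))
patterns = rng.normal(size=(true_rank, n_cells))
Y = params @ patterns + 1e-3 * rng.normal(size=(n_samples, n_cells))

mean, basis, k = truncated_svd_basis(Y, energy=0.99)
coeffs = project(Y, mean, basis)       # shape (n_samples, k)
Y_hat = reconstruct(coeffs, mean, basis)
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
print(f"modes kept: {k}, relative reconstruction error: {rel_err:.2e}")
```

In this sketch a surrogate would only need to learn the map from input parameters to the `k` entries of `coeffs`; predictions on the full grid are then recovered with `reconstruct`. How many modes are needed for real land model output depends on the data and is not something this synthetic example can indicate.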