Data Publication Process

All data from production simulation campaigns will be released to the public by being published to distributed ESGF archives.

Once a simulation campaign is complete and the overview publication describing those simulations has been accepted, the Group Lead of the respective simulation campaign should send a request to the Exec and Infrastructure leadership to publish the data, including the details of the data. The Infrastructure Group is responsible for publishing the data; the scientists running the simulations are responsible for the long-term archive of the data. The E3SM project requires everyone to use zstash for long-term archiving. Once the archive step is complete, the Infrastructure Group will publish the data to ESGF.

Data Publication

The public can get E3SM data from up to three sources: 1) ESGF under the E3SM project, 2) ESGF under the CMIP project, and 3) HPSS. All E3SM data is available from the first source, ESGF under the E3SM project. A subset of the data is also available from sources 2) and 3).

1) Default Data Publication to E3SM Project on ESGF

The default publication involves publishing model output in the native file format on the native grid from all E3SM components. See Default Set of Model Output for ESGF publication for details on the default data streams that will be published to the dedicated E3SM project space on ESGF (https://esgf-node.llnl.gov/projects/e3sm/).

The Group Leads or the Simulation Campaign Lead can request that additional data be published, for example data regridded to a standard lon-lat Cartesian grid, or derived data such as climatologies. Such requests will generally be treated as lower priority for publication.

The status of, and documentation for, data that is published or in publication can be found on the Status of Publication to ESGF under E3SM Project page on Confluence.


A Point on Policy: We will strive to vet the structural integrity of the datasets we are presented for publication. We have tools to test for time indexing issues, time gaps, and other problems occasionally discovered in a dataset, and we take action to correct such issues, involving the data authors where necessary. Where we lack the tools to conduct such checks, we will make a best effort to publish the data "as is", where specifically requested. In particular, when a dataset comprises files spread across multiple archives or archive paths (most often due to major restarts, such as runs interrupted by a change in available hardware), and identically named files that cover the same time period differ in content, we will by default accept the files with the more recent creation date as the intended dataset content.
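The default rule for overlapping files can be illustrated with a minimal Python sketch. This is not the tooling the Infrastructure Group actually uses; the ArchivedFile record, its fields, and both function names are hypothetical and exist only to show the "more recent creation date wins" rule, plus a way to list the conflicting names so they can be raised with the data authors.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ArchivedFile:
    """One file found in an archive (hypothetical record, for illustration only)."""
    name: str            # file name shared by the overlapping copies
    archive_path: str    # archive or archive path the copy came from
    created: datetime    # creation date recorded for the copy
    checksum: str        # content hash, used to tell differing copies apart


def resolve_overlaps(files):
    """Keep one copy per file name, preferring the most recent creation date."""
    chosen = {}
    for f in files:
        current = chosen.get(f.name)
        if current is None or f.created > current.created:
            chosen[f.name] = f
    # Return the selected copies in a stable, name-sorted order.
    return sorted(chosen.values(), key=lambda f: f.name)


def conflicting_names(files):
    """List file names that occur with more than one distinct content hash."""
    checksums = {}
    for f in files:
        checksums.setdefault(f.name, set()).add(f.checksum)
    return sorted(name for name, sums in checksums.items() if len(sums) > 1)
```

Under these assumptions, two copies of the same file name from different archive paths resolve to the copy with the later creation date, and conflicting_names reports that file name so the choice can be confirmed with the data authors if needed.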

2) CMORized Data Publication to CMIP Project on ESGF

A subset of the data, from simulation experiments that belong to a MIP (for example CMIP6, HighResMIP, etc.), can also be published as part of the intercomparison effort, in a MIP project space on ESGF (https://esgf-node.llnl.gov/search/cmip6/?institution_id=E3SM-Project). In this case the data has to adhere to the strict formatting rules set by the MIP project. We use CMOR to rewrite the data to the required standard; we call the result CMORized output. The CMIP CMORized data is regridded and split into one variable per file, according to CMIP requirements, see

3) Long-Term Archive on HPSS

Full data from simulations is archived for long-term storage using zstash and either stored on an HPSS system (if available) or zstashed and moved to E3SM's petabyte storage system at LLNL for the long-term archive. Some of this data is being made world-readable, and users who have accounts on those systems can access the data there.

Released E3SM Data

See Publicly Available Data on ESGF or go to the E3SM public website https://e3sm.org/data/get-e3sm-data/.

