It is resolved that E3SM will maintain an LLNL-local copy of all production E3SM simulation run archives in zstash format, from which (near) immediate extraction of all production datasets will proceed to a pre-publication “warehouse” supporting faceted access to the data. The warehouse will have a means to indicate the “state” of the data, one of {Assessment, Mediation (being cleaned, regridded, etc.), Publishable}. Future publication activities will proceed from the warehouse, obviating the need to retain the corresponding archives “on disk” if space becomes an issue. Specifically,

  • The LLNL E3SM Archives will retain the state of simulation output as released by the corresponding project groups (transferred “as is” from NERSC HPSS) in “zstash archive” format. Certain “mapping functions” will be applied to identify the publishable content within the (often variably-structured) archives, in order to facilitate survey and automated extraction of individual native-output datasets to the “faceted” pre-publication warehouse paths. The “maps” produced can be retained with the archives in the event that future access to the archives is required; however, the goal is to avoid such need. Although initially kept on the local file system, archives that are “exhausted” (production materials warehoused or published) can be pushed out to long-term tape storage, or eliminated, as policy dictates.

  • The LLNL E3SM Warehouse: Data not already published[*] will be extracted from Archive to the “faceted” pre-publication Warehouse “v0” paths and prepared for publication by being processed “in-situ” (state “Assessment” or “Mediation”). These paths mirror the facets employed in the actual publication directories, and allow internal access and management “as if” published. The leaf directory is labeled “v0” to indicate that it is the raw archive extraction and has yet to undergo data validation checks and corrections. In this stage, occasional data irregularities (missing data, overlapping data from unusual restarts) will be addressed, and default post-processing (regridding for selected time-series, climatology generation) will proceed. These steps eventually result in a “v1” path, indicating a status of “Publishable”. Upon completion of Assessment/Post-Processing, the data will reside in the staging “warehouse” directory (state “Publishable”), with faceted subdirectory locations exactly matching the eventual publication directory hierarchies. Publishable data that is not yet requested or authorized for publication will reside indefinitely in the warehouse.

    • [*] Data already published, for which publication errors are discovered, will be treated as unpublished data (pulled from archive, cleaned, repaired), and the most effective path to re-publication will be engaged.

  • Publication: Warehouse datasets that are thereafter (or already) scheduled or authorized for publication need only be moved (relinked) to the corresponding publication directory path, have mapfiles generated, and be submitted to the formal ESGF publication process.

  • Automation: With the exception of minor human inspection during the archive mapping process, and occasional unusual data-correction steps that may be needed in preparing warehouse v0 datasets for v1 status, the entire operation from archive extraction to ESGF publication is automated.
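The warehouse states and version leaves described in the list above can be sketched in Python as follows. This is a minimal illustration, not the project's tooling: the names `State`, `version_leaf`, and `can_advance` are hypothetical, and the table of allowed transitions is an assumption read off the text (Assessment and Mediation data live under “v0”; Publishable data lives under “v1”).

```python
from enum import Enum


class State(Enum):
    """Warehouse dataset states named in the policy text."""
    ASSESSMENT = "Assessment"
    MEDIATION = "Mediation"
    PUBLISHABLE = "Publishable"


# ASSUMPTION: which state changes are permitted is not spelled out in the
# policy; this table is one plausible reading, for illustration only.
_ALLOWED = {
    State.ASSESSMENT: {State.MEDIATION, State.PUBLISHABLE},
    State.MEDIATION: {State.ASSESSMENT, State.PUBLISHABLE},
    State.PUBLISHABLE: set(),
}


def version_leaf(state: State) -> str:
    """Return the leaf directory name implied by a dataset's state."""
    return "v1" if state is State.PUBLISHABLE else "v0"


def can_advance(current: State, target: State) -> bool:
    """Report whether the assumed transition table permits current -> target."""
    return target in _ALLOWED[current]
```

Keeping the state-to-leaf rule in one function means that a tool inspecting a dataset directory only needs the state to know whether to look under “v0” or “v1”.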

By this process, almost all queries regarding E3SM data production runs can be answered by tools that operate (transparently if desired) on the data in either its warehouse or its publication location, largely obviating continued dependence upon the “archive mapping functions” except where very archive-specific questions need to be addressed. The data footprint does not increase, since datasets pulled from archive will reside in exactly one of the two locations (warehouse or publication).
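A tool that treats the two locations transparently might resolve a dataset along the following lines. This is a hedged sketch under stated assumptions: the function name `locate_dataset` is hypothetical, the warehouse and publication roots are passed in as parameters because their actual paths are not given here, and `facets` stands for the shared faceted subdirectory path.

```python
from pathlib import Path
from typing import Optional, Tuple


def locate_dataset(
    facets: str, warehouse_root: Path, publication_root: Path
) -> Optional[Tuple[str, Path]]:
    """Return (location_name, path) for a faceted dataset, or None if absent.

    Both roots are hypothetical parameters; the policy only states that the
    two trees share the same faceted layout.
    """
    candidates = (
        ("warehouse", warehouse_root / facets),
        ("publication", publication_root / facets),
    )
    found = [(name, path) for name, path in candidates if path.is_dir()]
    if len(found) > 1:
        # The policy says a dataset resides in exactly one of the two
        # locations, so finding it in both is reported as an error.
        raise RuntimeError(f"dataset {facets!r} found in both locations")
    return found[0] if found else None
```

Raising on a double hit, rather than silently preferring one location, makes a violation of the “exactly one location” rule visible early.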

LLNL E3SM Archives

  • Data Location: /p/user_pub/e3sm/archives/<model>/<campaign>/<archive_directory>/

  • Config Location: /p/user_pub/e3sm/archives/.cfg/ (contains Archive_Locator, Archive_Map, and Standard_Dataset_Extraction_Patterns)

  • Guides and Tools: . . .
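The Data Location and Config Location patterns above can be expressed as a small path helper. This is an illustrative sketch only: `archive_path`, `ARCHIVE_ROOT`, and `CONFIG_DIR` are hypothetical names, and the validation rule (no empty parts, no embedded slashes) is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath

# Root and config directory taken from the locations listed above.
ARCHIVE_ROOT = PurePosixPath("/p/user_pub/e3sm/archives")
CONFIG_DIR = ARCHIVE_ROOT / ".cfg"


def archive_path(model: str, campaign: str, archive_directory: str) -> PurePosixPath:
    """Build <root>/<model>/<campaign>/<archive_directory>."""
    for part in (model, campaign, archive_directory):
        # ASSUMPTION: each component is a single, non-empty directory name.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return ARCHIVE_ROOT / model / campaign / archive_directory
```

For example, `archive_path("MODEL", "CAMPAIGN", "ARCHIVE_DIR")` (placeholder values) yields a path under `/p/user_pub/e3sm/archives/`. `PurePosixPath` is used so the helper only manipulates names and never touches the file system.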

LLNL E3SM Staging “Warehouse”

...