
It is resolved that E3SM will maintain an LLNL-local copy of all production E3SM simulation run archives in zstash format. From these archives, all production datasets will be extracted (near) immediately into a pre-publication “warehouse” that supports faceted access to the data. The warehouse will have a means to indicate the “state” of each dataset, one of {Assessment, Mediation (being cleaned, regridded, etc.), Publishable}. Future publication activities will proceed from the warehouse, obviating the need to retain the corresponding archives “on disk” if space becomes an issue. Specifically,

  • The LLNL E3SM Archives will retain the state of simulation output as released by the corresponding project groups (transferred “as is” from NERSC HPSS). Certain “mapping functions” will be run to identify the publishable content within the (often variably structured) archives, in order to facilitate survey and extraction of individual native-output datasets to the “faceted” pre-publication warehouse paths. The “maps” produced can be retained with the archives in case future access to the archives is required; however, the goal is to avoid that need. Although the archives are initially kept on the local file system, once they are “exhausted” (their production materials warehoused or published) they can be pushed out to long-term tape storage, or eliminated, as policy dictates.

  • Data not already published[*] will be prepared for publication by being processed “in situ” (state “Assessment” or “Post-Processing”), during which occasional data irregularities (missing data, overlapping data from unusual restarts) will be addressed and default post-processing (regridding of selected time series, climatology generation) will proceed. Upon completion of Assessment/Post-Processing, the data will reside in the staging “warehouse” directory, in faceted subdirectories that exactly match the eventual publication directory hierarchies. If publication of the data has not yet been requested, it will remain in the pre-publication warehouse indefinitely.

    • [*] Already-published data in which errors are discovered will be treated as unpublished data (pulled from archive, cleaned, and repaired), and the most effective path to re-publication will be engaged.

  • When new data is (or later becomes) scheduled for publication, it need only be moved (relinked) to the corresponding publication directory path; a mapfile is then generated and the formal ESGF publication process engaged.
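The promotion step above can be sketched in Python. This is a minimal, hypothetical sketch, assuming the warehouse records each dataset's state as one of the three labels named above; the page does not say how state is stored, so `DatasetState` and `promote` are illustrative names only. Because the warehouse hierarchy exactly matches the publication hierarchy, the relative faceted path is reused unchanged. Mapfile generation and the ESGF publication itself are out of scope here.

```python
import shutil
from enum import Enum
from pathlib import Path

# Hypothetical state labels, taken from the page; the real warehouse
# mechanism for recording state is not specified.
class DatasetState(Enum):
    ASSESSMENT = "Assessment"
    MEDIATION = "Mediation"
    PUBLISHABLE = "Publishable"

# Root locations from this page; passed as parameters so they can be overridden.
PREPUB_ROOT = Path("/p/user_pub/e3sm/staging/prepub")
PUBLICATION_ROOT = Path("/p/user_pub/work/E3SM")

def promote(faceted_path, state, prepub_root=PREPUB_ROOT, pub_root=PUBLICATION_ROOT):
    """Move a warehoused dataset to the identical faceted path under the publication root."""
    if state is not DatasetState.PUBLISHABLE:
        raise ValueError(f"dataset is in state {state.value!r}, not 'Publishable'")
    src = Path(prepub_root) / faceted_path
    dst = Path(pub_root) / faceted_path
    if not src.is_dir():
        raise FileNotFoundError(src)
    if dst.exists():
        # Refuse to overwrite: a dataset must live in exactly one location.
        raise FileExistsError(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # On a single filesystem this is a rename, so no data is copied.
    shutil.move(str(src), str(dst))
    return dst
```

A move (rather than a copy) keeps the data footprint constant; a symlink-based relink would be an alternative design choice.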

By this process, almost all queries regarding E3SM production-run data can be answered by tools that operate (transparently, if desired) on either the pre-publication or the publication location. This largely removes the continued dependence on the “archive mapping functions”, except where very archive-specific questions must be addressed. The data footprint does not increase, since datasets pulled from archive reside in exactly one of the two locations (warehouse or publication).
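A tool that works transparently across both locations could be as simple as the following Python sketch. It assumes only the two root paths listed on this page; `locate` is a hypothetical helper, not an existing E3SM tool. It returns which location holds a faceted dataset, returns `None` if the dataset is in neither (so only the archive maps could answer), and treats a dataset found in both locations as a violation of the exactly-one-location rule.

```python
from pathlib import Path

# Root locations from this page; passed as parameters so they can be overridden.
PREPUB_ROOT = Path("/p/user_pub/e3sm/staging/prepub")
PUBLICATION_ROOT = Path("/p/user_pub/work/E3SM")

def locate(faceted_path, prepub_root=PREPUB_ROOT, pub_root=PUBLICATION_ROOT):
    """Return (label, path) for the one location holding the dataset, or None."""
    candidates = {
        "warehouse": Path(prepub_root) / faceted_path,
        "publication": Path(pub_root) / faceted_path,
    }
    found = [(label, p) for label, p in candidates.items() if p.is_dir()]
    if len(found) > 1:
        # Same faceted path under both roots breaks the single-location invariant.
        raise RuntimeError(f"dataset present in both locations: {faceted_path}")
    return found[0] if found else None
```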

LLNL E3SM Archives

  • Location: /p/user_pub/e3sm/archives/<model>/<campaign>/<archive_directory>/

  • Guides and Tools: . . .

LLNL E3SM Staging “Warehouse”

  • Location: /p/user_pub/e3sm/staging/prepub/<faceted_dataset_directory>/

  • Guides and Tools: . . .

LLNL E3SM Publication

  • Location: /p/user_pub/work/E3SM/<faceted_dataset_directory>/

  • Guides and Tools: . . .
