The E3SM Long-Term Archive at LLNL is intended to be a permanent repository that faithfully represents the output of all E3SM simulations, and thereafter to serve as the single source of data for all future LLNL E3SM publications. A “plurality” of opinion favors keeping the archive “as is”, for the forensic value it may hold in understanding the actual behavior of the simulation models and their operations. The archives are in zstash format. Datasets slated for publication are zstash-extracted to a (faceted) warehouse location for pre-publication assessment, potential repairs, validation, post-processing, and publication.

Unfortunately, many of these archives, although nominally in tar-zstash format, employ non-standard filenames and tar-paths, some specific to a campaign. Within a single archive, a “standard datatype pattern” (e.g. “*cam.h0.*nc”) may match multiple lists of files, none beginning with the recommended “atm/hist/”, so manual inspection is required to “infer” which list of files is the one intended for publication. In addition, due in part to major unscheduled restarts, many datasets are spread across multiple separate zstash archives. Any attempt to automate the extraction (or re-extraction) of an archived dataset is therefore stymied unless a “map” exists that leads to the multiple paths required to collect the files for publication.
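
The ambiguity described above can be illustrated with a minimal Python sketch. It assumes the tar-paths of an archive are already available as a list of strings; the paths and the helper name `group_matches_by_dir` are hypothetical and invented for illustration, not taken from any real archive or from zstash itself. The sketch groups every path whose filename matches the datatype pattern by its parent directory, so that each group is one candidate "list of files".

```python
from collections import defaultdict
from fnmatch import fnmatchcase
import posixpath


def group_matches_by_dir(tar_paths, pattern):
    """Group tar-paths whose filename matches `pattern` by parent directory.

    Hypothetical helper: each returned group is one candidate file list
    that a person would still have to inspect before publication.
    """
    groups = defaultdict(list)
    for path in tar_paths:
        # Match on the filename only; the directory part is what varies.
        if fnmatchcase(posixpath.basename(path), pattern):
            groups[posixpath.dirname(path)].append(path)
    return {directory: sorted(files) for directory, files in groups.items()}


# Hypothetical tar-paths, invented for illustration only.
example_paths = [
    "run/case.cam.h0.0001-01.nc",
    "run/case.cam.h0.0001-02.nc",
    "archive/atm/case.cam.h0.0001-01.nc",
    "rest/0002-01-01/case.cam.h0.0001-12.nc",
    "run/case.clm2.h0.0001-01.nc",
]

candidates = group_matches_by_dir(example_paths, "*cam.h0.*nc")
for directory, files in sorted(candidates.items()):
    print(f"{directory}: {len(files)} file(s)")
```

In this invented example one pattern yields three candidate directories, none under “atm/hist/”, and nothing in the paths themselves says which one holds the files intended for publication. That choice is the information a map entry would have to record.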

Here is the latest Archive_Map:



Prior publication operations would involve, for each dataset to be published