...

Archive Acquisition

By a process often shrouded in mystery, you may learn of the need to acquire a simulation archive. You may hear voices indicating a Confluence page, managed by a simulation project/group, detailing the runs conducted in a cloud of arcane verbiage, from which you must locate and decipher the following key treasure items:

Campaign	Presently one of { BGC-v1, CRYO-v1, DECK-v1, HR-v1 }, although new campaigns are expected to appear
Model	Presently one of { 1_0, 1_1, 1_1_ECA, 1_2, 1_2_1, 1_3 }, although new models are expected to appear
Experiment(s)	Consulting tea leaves, burnt entrails, or the principals involved in conducting the simulations is recommended here. Do not expect the experiment name to favor human comprehension over compactness. Besides alphanumerics, allow only the special characters { _ - } (underscore and hyphen).
Ensemble	Usually “ens1”, although “ens2” through “ens5” have appeared in the night sky. You may find the ensemble cleverly embedded in the experiment name (e.g. Historical-H3, Projection-P5). These are generally excised from the experiment name in favor of an explicit “ensemble” variable assignment.
Archive Path(s)	This should be the NERSC HPSS path(s) to the zstash-formatted archive(s) associated with the four elements above. Take note of the archive_path_leaf_directory_name(s).
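The character rule for experiment names and the "leaf directory name" of an archive path can both be checked mechanically. The following is a minimal sketch in POSIX shell; the function names and the sample values are hypothetical and exist only for illustration.

```shell
# Succeed only if the name is non-empty and uses nothing but
# alphanumerics, underscore, and hyphen.
is_valid_experiment_name() {
    case "$1" in
        ""|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the last component of an archive path (a trailing slash is ignored).
archive_leaf_name() {
    basename "$1"
}

# Hypothetical values, not real experiment names or archive paths.
is_valid_experiment_name "some_experiment-01" && echo "name accepted"
is_valid_experiment_name "bad name!" || echo "name rejected"
archive_leaf_name "/some/hpss/path/example_archive_dir/"
```

The `case` pattern is used instead of a regular expression so that the check runs in any POSIX shell.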

...

Dataset Type	Core File Pattern	Comment
atm nat mon	cam.h0
atm nat day	cam.h1
atm nat 6hr_snap	cam.h2
atm nat 6hr	cam.h3
atm nat 3hr_snap	cam.h2	BGC-only
atm nat 3hr	cam.h4
atm nat 3hr	cam.h3	BGC-only
atm nat day_cosp	cam.h5
lnd nat mon	clm2.h0
river nat mon	mosart.h0
ocn nat mon	mpaso.hist.am.timeSeriesStatsMonthly.	these are not “time-series” in the same sense as the post-process “regridded time series”
ocn nat globalStats	mpaso.hist.am.globalStats.
ocn nat 5day	mpaso.hist.am.highFrequencyOutput.
sea-ice nat mon	mpascice.hist.am.timeSeriesStatsMonthly.	these are not “time-series” in the same sense as the post-process “regridded time series”
sea-ice nat day	mpascice.hist.am.timeSeriesStatsDaily.	these are not “time-series” in the same sense as the post-process “regridded time series”
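To see which files in a listing belong to one dataset type, the core file pattern from the table can be applied as a fixed-string filter. This is only a sketch: the function name is hypothetical, the sample file names are invented, and it assumes the core pattern is always followed by a dot in the file name.

```shell
# Read a file list on stdin and print only the entries that contain
# the given core file pattern, e.g. "cam.h0".
# Assumption: the pattern is followed by "." in real file names, so a
# trailing dot is enforced to keep "cam.h1" from matching a longer tag.
select_by_pattern() {
    pattern=${1%.}.
    grep -F -- "$pattern"
}

# Invented file names, for illustration only.
printf '%s\n' \
    'case.cam.h0.0001-01.nc' \
    'case.cam.h1.0001-01-01-00000.nc' \
    'case.clm2.h0.0001-01.nc' |
    select_by_pattern cam.h0
```

Using `grep -F` treats the dots in the pattern literally rather than as regular-expression wildcards.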

...

On the other hand, if the tar-paths are equal but variant filenames are found, there are both benign and difficult cases. A benign case exists where the files may have differing generation-dates, but are intended to be part of a single finalized run, as in

...

grep -v HEAD headset_list_first_last > archive_dataset_map_prelim

STEP 3: Then run

~/bartoletti1/outbox/archive_path_mapper_stage3.sh > update_Archive_Map

You can test the correctness of the “update_Archive_Map” by invoking

~/bartoletti1/outbox/extract_archive_files_to.sh file_with_update_archive_map_line > file_list

For each line of the “update_Archive_Map” a file-list can be produced and manually examined to ensure that only the intended files are being addressed.
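Since the extraction script is invoked on a file holding a single map line, one way to check every line is to first split the map into one-line files. The helper below is a hypothetical sketch, and the output directory name in the usage comment is likewise an assumption.

```shell
# Write each non-empty line of a map file to its own one-line file
# named <out_dir>/line_NNNN, and print the names of the files created.
split_map_lines() {
    map_file=$1
    out_dir=$2
    mkdir -p "$out_dir"
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue          # skip blank lines
        n=$((n + 1))
        name=$(printf '%s/line_%04d' "$out_dir" "$n")
        printf '%s\n' "$line" > "$name"
        printf '%s\n' "$name"
    done < "$map_file"
}

# Possible usage (left commented out because it needs the real script):
# for f in $(split_map_lines update_Archive_Map map_lines); do
#     ~/bartoletti1/outbox/extract_archive_files_to.sh "$f" > "$f.file_list"
# done
```

Each resulting `.file_list` can then be examined by hand, as described above.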

Once you are satisfied that update lines are correct, issue these commands to “install” the updated Archive_Map:

cat /p/user_pub/e3sm/archive/.cfg/Archive_Map update_Archive_Map | sort | uniq > temp_Archive_Map

mv temp_Archive_Map /p/user_pub/e3sm/archive/.cfg/Archive_Map
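Before overwriting the shared Archive_Map, it may be worth previewing which lines are actually new and keeping a backup copy. The following is a sketch of one way to do that, not part of the established procedure; the function names and the `.bak` and `.tmp` suffixes are assumptions.

```shell
# Print the lines of the update file that are not already in the current map.
new_map_lines() {
    current=$1
    update=$2
    sorted_current=$(mktemp)
    sorted_update=$(mktemp)
    sort -u "$current" > "$sorted_current"
    sort -u "$update" > "$sorted_update"
    # comm -13 keeps only the lines unique to the second (update) file.
    comm -13 "$sorted_current" "$sorted_update"
    rm -f "$sorted_current" "$sorted_update"
}

# Merge the update into the current map, keeping a backup of the old map.
# "sort -u a b" gives the same result as "cat a b | sort | uniq".
install_map() {
    current=$1
    update=$2
    cp -p "$current" "$current.bak" &&
    sort -u "$current" "$update" > "$current.tmp" &&
    mv "$current.tmp" "$current"
}

# Possible usage:
# new_map_lines /p/user_pub/e3sm/archive/.cfg/Archive_Map update_Archive_Map
# install_map   /p/user_pub/e3sm/archive/.cfg/Archive_Map update_Archive_Map
```

If the preview prints nothing, the update contains no lines that the current map lacks and the install step can be skipped.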

CAVEAT: The above Archive Path-Mapping operations do not ensure that a dataset is complete or clean. The dataset may still be missing files, or contain hidden restarts with extra and overlapping files. The path-mapping is intended only to ensure that all of the files belonging to an intended “finalized run” are identified, for further analysis of coverage and cleanliness. These latter activities are covered by the publication (or pre-publication) “Staging” activities.
