...
Archive Acquisition
By a process often shrouded in mystery, you may learn of the need to acquire a simulation archive. You may hear voices indicating a Confluence page, managed by a simulation project/group, detailing the runs conducted in a cloud of arcane verbiage, from which you must locate and decipher the following key treasure items:
Campaign | Presently one of { BGC-v1, CRYO-v1, DECK-v1, HR-v1 }, although new campaigns are expected to appear |
---|---|
Model | Presently one of { 1_0, 1_1, 1_1_ECA, 1_2, 1_2_1, 1_3 }, although new models are expected to appear |
Experiment(s) | Consulting tea leaves, burnt entrails, or the principals involved in conducting the simulations is recommended here. Expect the experiment name to favor compactness over human comprehension. Besides alphanumerics, allow only the special characters { _ - } (underscore and hyphen). |
Ensemble | Usually “ens1”, although “ens2” through “ens5” have appeared in the night sky. When ens[n] > ens1, you may find the ensemble cleverly embedded in the experiment name (e.g. Historical-H3, Projection-P5). These are generally excised from the experiment name in favor of an explicit “ensemble” variable assignment. |
Archive Path(s) | This should be the NERSC HPSS path(s) to the zstash-formatted archive(s) associated with the four elements above. Take note of the archive_path_leaf_directory_name(s). |
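The character and naming rules in the table above can be checked mechanically before anything is recorded. The following is a minimal sketch, not part of any existing tool: the function names `valid_experiment_name` and `valid_ensemble` are hypothetical, the sample names in the loop are made up, and the "ens" followed by digits form is an assumption drawn from the “ens1” through “ens5” values mentioned above.

```shell
# Hypothetical helpers (names are illustrative only).

# Succeeds only if the name is non-empty and consists solely of
# alphanumerics, underscore, and hyphen.
valid_experiment_name() {
    [[ "$1" =~ ^[A-Za-z0-9_-]+$ ]]
}

# Succeeds only for values of the form ens<digits>, e.g. ens1 ... ens5
# (assumed form, based on the values listed in the table).
valid_ensemble() {
    [[ "$1" =~ ^ens[0-9]+$ ]]
}

# Made-up sample names, for illustration only.
for name in "piControl_land-only" "bad name!"; do
    if valid_experiment_name "$name"; then
        echo "ok: $name"
    else
        echo "rejected: $name"
    fi
done
```

A name that fails the check probably needs to be normalized by hand before it is used as the experiment value.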
...
Dataset Type | Core File Pattern | Comment | |
atm nat mon | *cam.h0* | ||
atm nat day | *cam.h1* | ||
atm nat 6hr_snap | *cam.h2* | ||
atm nat 6hr | *cam.h3* | ||
atm nat 3hr_snap | *cam.h2* | BGC-only | |
atm nat 3hr | *cam.h4* | ||
atm nat 3hr | *cam.h3* | BGC-only | |
atm nat day_cosp | *cam.h5* | ||
lnd nat mon | *clm2.h0* | ||
river nat mon | *mosart.h0* | ||
ocn nat mon | *mpaso.hist.am.timeSeriesStatsMonthly.* | these are not “time-series” in the same sense as the post-process “regridded time series” | |
ocn nat globalStats | *mpaso.hist.am.globalStats.* | ||
ocn nat 5day | *mpaso.hist.am.highFrequencyOutput.* | ||
sea-ice nat mon | *mpascice.hist.am.timeSeriesStatsMonthly.* | these are not “time-series” in the same sense as the post-process “regridded time series” | |
sea-ice nat day | *mpascice.hist.am.timeSeriesStatsDaily.* | these are not “time-series” in the same sense as the post-process “regridded time series” |
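The table above can be expressed as a lookup from filename to dataset type. The sketch below is only an illustration, under two assumptions: the function name `dataset_type_for_file` is hypothetical, and so is its optional second argument `bgc`, which selects the BGC-only reading of the two patterns that the table lists twice (`*cam.h2*` and `*cam.h3*`).

```shell
# Hypothetical helper: print the dataset type for a filename, following
# the pattern table. Pass "bgc" as the second argument to choose the
# BGC-only interpretation of *cam.h2* and *cam.h3*.
dataset_type_for_file() {
    local file="$1" bgc="${2:-}"
    case "$file" in
        *cam.h0*) echo "atm nat mon" ;;
        *cam.h1*) echo "atm nat day" ;;
        *cam.h2*) if [ "$bgc" = "bgc" ]; then echo "atm nat 3hr_snap"
                  else echo "atm nat 6hr_snap"; fi ;;
        *cam.h3*) if [ "$bgc" = "bgc" ]; then echo "atm nat 3hr"
                  else echo "atm nat 6hr"; fi ;;
        *cam.h4*) echo "atm nat 3hr" ;;
        *cam.h5*) echo "atm nat day_cosp" ;;
        *clm2.h0*) echo "lnd nat mon" ;;
        *mosart.h0*) echo "river nat mon" ;;
        *mpaso.hist.am.timeSeriesStatsMonthly.*) echo "ocn nat mon" ;;
        *mpaso.hist.am.globalStats.*) echo "ocn nat globalStats" ;;
        *mpaso.hist.am.highFrequencyOutput.*) echo "ocn nat 5day" ;;
        *mpascice.hist.am.timeSeriesStatsMonthly.*) echo "sea-ice nat mon" ;;
        *mpascice.hist.am.timeSeriesStatsDaily.*) echo "sea-ice nat day" ;;
        *) return 1 ;;   # no core pattern matched
    esac
}
```

For example, `dataset_type_for_file "$f" || echo "unclassified: $f"` would flag files in an archive listing that match none of the core patterns.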
...
On the other hand, if the tar-paths are equal but variant filenames are found, there are both benign and difficult cases. A benign case exists where the files have differing generation-dates but are intended to be part of a single finalized run, as in
...
grep -v HEAD headset_list_first_last > archive_dataset_map_prelim
STEP 3: Then run
~/bartoletti1/outbox/archive_path_mapper_stage3.sh > update_Archive_Map
You can test the correctness of the “update_Archive_Map” by invoking
~/bartoletti1/outbox/extract_archive_files_to.sh file_with_update_archive_map_line > file_list
For each line of the “update_Archive_Map” a file-list can be produced and manually examined to ensure that only the intended files are being addressed.
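The per-line check can be looped. This is a rough sketch under stated assumptions: it assumes the extraction script accepts a file holding a single map line, as in the invocation shown above; the function name `list_files_per_map_line`, the `EXTRACTOR` variable, and the `map_line_N` / `file_list_N` output names are all hypothetical.

```shell
# EXTRACTOR defaults to the script named above; override it for dry runs.
EXTRACTOR="${EXTRACTOR:-$HOME/bartoletti1/outbox/extract_archive_files_to.sh}"

# Hypothetical helper: write each non-empty line of the map file to its
# own single-line file, run the extractor on it, and save the resulting
# file list alongside. Prints the number of lines processed.
list_files_per_map_line() {
    local map_file="$1" out_dir="$2" n=0 line
    mkdir -p "$out_dir" || return 1
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        n=$((n + 1))
        printf '%s\n' "$line" > "$out_dir/map_line_$n"
        "$EXTRACTOR" "$out_dir/map_line_$n" > "$out_dir/file_list_$n" || return 1
    done < "$map_file"
    echo "$n"
}
```

Calling `list_files_per_map_line update_Archive_Map review_dir` would leave one `file_list_N` per map line in `review_dir` for manual examination.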
Once you are satisfied that update lines are correct, issue these commands to “install” the updated Archive_Map:
cat /p/user_pub/e3sm/archive/.cfg/Archive_Map update_Archive_Map | sort | uniq > temp_Archive_Map
mv temp_Archive_Map /p/user_pub/e3sm/archive/.cfg/Archive_Map
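A slightly more defensive variant of the install step keeps a backup of the existing map before replacing it. This is a sketch only: the function name `install_archive_map` and the `.bak` backup suffix are hypothetical, and `sort -u` is used here as an equivalent of the `sort | uniq` pipeline above.

```shell
# Hypothetical helper: merge the update lines into the map, keeping a
# backup copy of the previous map next to it.
install_archive_map() {
    local map="$1" update="$2"
    local tmp="${map}.tmp.$$"
    cp -p "$map" "${map}.bak" || return 1
    # Merge, sort, and drop duplicate lines in one step.
    sort -u "$map" "$update" > "$tmp" || { rm -f "$tmp"; return 1; }
    mv "$tmp" "$map"
}
```

Usage would be `install_archive_map /p/user_pub/e3sm/archive/.cfg/Archive_Map update_Archive_Map`; if the result looks wrong, the previous map is still available as `Archive_Map.bak`.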
CAVEAT: The above Archive Path-Mapping operations do not ensure that a dataset is complete or clean. The dataset may still be missing files, or contain hidden restarts with extra and overlapping files. The path-mapping is only intended to ensure that all of the files belonging to an intended “finalized run” are identified, for further analysis of coverage and cleanliness. These latter activities are covered by the publication (or pre-publication) “Staging” activities.