...
Element | Description |
---|---|
Campaign | Presently one of { BGC-v1, CRYO-v1, DECK-v1, HR-v1 }, although new campaigns are expected to appear |
Model | Presently one of { 1_0, 1_1, 1_1_ECA, 1_2 }, although new models are expected to appear |
Experiment(s) | Consulting tea leaves, burnt entrails, or the principals involved in conducting the simulations is recommended here. Experiment names favor compactness over human comprehension. |
Ensemble | Usually “ens1”, although “ens2” through “ens5” have appeared in the night sky. The ensemble should be embedded in the experiment archive name when ens[n] > ens1. |
Archive Path(s) | This should be the NERSC HPSS path(s) to the zstash-formatted archive(s) associated with the four elements above. Take note of the archive_path_leaf_directory_name(s). |
With the above five elements in hand, you can craft the receiving archive directory or directories, “/p/user_pub/e3sm/archive/model/campaign/archive_path_leaf_directory_name(s)”, where “model” and “campaign” stand for the values identified above.
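The composition of that directory name can be sketched as follows. This is a minimal illustration, not part of the existing tooling: the function name `receiving_archive_dir` and the HPSS path in the usage line are hypothetical, and it assumes the leaf directory name is simply the final component of the HPSS archive path.

```shell
# Hypothetical helper: compose the receiving archive directory.
# Assumption: the layout is <root>/<model>/<campaign>/<leaf>, where <leaf>
# is the last component of the HPSS archive path.
ARCHIVE_ROOT="/p/user_pub/e3sm/archive"

receiving_archive_dir() {
    # $1 = model, $2 = campaign, $3 = HPSS archive path
    leaf=$(basename "$3")
    printf '%s/%s/%s/%s\n' "$ARCHIVE_ROOT" "$1" "$2" "$leaf"
}

# Usage (the HPSS path is a placeholder, not a real archive):
receiving_archive_dir 1_1 BGC-v1 /hypothetical/hpss/path/example_leaf_dir
```

When an experiment has several archive paths, the function would be called once per path, giving one receiving directory per leaf name.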
...
Here, it seems clear that we want to add “atm/hist” to the file search pattern. But in many cases there are multiple plausible internal tar-paths to matching files. In order to help automate the necessary disambiguation …
STEP 2: We run
~/bartoletti1/outbox/archive_path_mapper_stage2.sh
...
For the filenames that remain, output is produced (named “headset_list_first_last”) that lists the first and last filename found in the residual file-lists. For the set of seven BGC-v1,ens1,hist_BCRD files listed above, that output was
...
This is what we want to see. The remaining “first and last” file in each set has a consistent and reasonable tar path, and the filenames match except for the “sim-date” field. For instance, consider the last 3 lines of the output file above. Edit the categorical line
1:BGC-v1:1_1:hist-BCRD:ens1:sea-ice_nat_mon
by appending everything after “HEADF” of the “first file found” line (including the separating colon), but wild-carding the sim-date field “2007-01-01” to obtain the line
1:BGC-v1:1_1:hist-BCRD:ens1:sea-ice_nat_mon:ice/hist/mpascice.hist.am.timeSeriesStatsMonthly.*.nc
(In truth, the pattern “ice/hist/*.nc” would suffice, as it seems only the correct files appear in the listing. Including “mpascice.hist.am.timeSeriesStatsMonthly” adds assurance.)
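The hand edit described above can be sketched in shell. This is only an illustration under stated assumptions: the function name `append_pattern` is hypothetical, the first-file line is assumed to have the form “HEADF:tar-path/filename”, and the sim-date is assumed to be the last dot-separated field before “.nc”, written as YYYY-MM or YYYY-MM-DD.

```shell
# Hypothetical helper: append a file-match pattern to a categorical line.
# $1 = categorical line, $2 = the corresponding "HEADF:" (first file) line.
# Assumption: the sim-date is the last field before ".nc" (YYYY-MM[-DD]).
append_pattern() {
    tail=${2#HEADF}    # everything after "HEADF", including the colon
    pattern=$(printf '%s\n' "$tail" |
        sed -E 's/\.[0-9]{4}-[0-9]{2}(-[0-9]{2})?\.nc$/.*.nc/')
    printf '%s%s\n' "$1" "$pattern"
}

append_pattern \
    '1:BGC-v1:1_1:hist-BCRD:ens1:sea-ice_nat_mon' \
    'HEADF:ice/hist/mpascice.hist.am.timeSeriesStatsMonthly.2007-01-01.nc'
```

The result should still be reviewed by eye, since a filename whose date field is laid out differently would pass through without being wild-carded.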
When things are not so smooth:
If instead we discovered that “first” and “last” did not match, we have two possible courses of action. If it is determined that one or the other involves the wrong tar-path, then the list of “tar-path elements to avoid” in the archive_path_mapper_stage2.sh script must be updated to eliminate the incorrect path, and we rerun that script to obtain a new output file. On some occasions this may be needed more than once, in order to determine the intended tar-path to the finalized run. On the other hand, if the tar-paths are equal, but it seems that variant filenames are found, there are both benign and difficult cases. A benign case exists where the files may have differing generation-dates, but are intended to be part of a single finalized run, as in
HEADF:atm/hist/20181217.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2007-01.nc
HEADL:atm/hist/20190530.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2014-12.nc
If it is determined that the sequence of files is properly ordered and contiguous, one can elide the variant generation dates with the first '*' in the pattern
1:BGC-v1:1_1:hist-BCRD:ens1:atm_nat_mon:atm/hist/*BCRD*.cam.h0.*.nc
Alternately, one can explicitly call for both sets by editing the file to contain two categorical lines for the same dataset “atm_nat_mon”:
1:BGC-v1:1_1:hist-BCRD:ens1:atm_nat_mon:atm/hist/20181217.BCRD*.cam.h0.*.nc
1:BGC-v1:1_1:hist-BCRD:ens1:atm_nat_mon:atm/hist/20190530.BCRD*.cam.h0.*.nc
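Before committing to a pattern, it can be tried against the first and last names. The sketch below uses the shell's own glob matching through `case`; the function name `matches_pattern` is hypothetical, and it is an assumption that the downstream extraction applies the same glob rules.

```shell
# Hypothetical check: does a tar-path match a file-match pattern?
# $1 = path from a HEADF/HEADL line, $2 = candidate pattern.
# Assumption: shell glob rules approximate those used during extraction.
matches_pattern() {
    case "$1" in
        $2) return 0 ;;   # unquoted so that $2 is treated as a glob
        *)  return 1 ;;
    esac
}

pattern='atm/hist/*BCRD*.cam.h0.*.nc'
for f in \
    'atm/hist/20181217.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2007-01.nc' \
    'atm/hist/20190530.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2014-12.nc'
do
    if matches_pattern "$f" "$pattern"; then
        echo "match:    $f"
    else
        echo "no match: $f"
    fi
done
```

The single-pattern form matches both generation dates, whereas each of the two explicit patterns matches only its own set, which is why the second approach needs two categorical lines.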
A more difficult case occurs when the first and last names are not intended to be part of the same run, as in
HEADF:ice/hist/practice-mpascice.hist.am.timeSeriesStatsMonthly.2007-01-01.nc
HEADL:ice/hist/theta-mpascice.hist.am.timeSeriesStatsMonthly.2014-12-01.nc
In that case, you must look at the full filename listing from “PathsFound” in stage 1 to determine whether the “practice” set or the “theta” set gives the full and proper collection of output files. Suppose that this is determined to be “theta”. You then have a choice (with different implications for robustness). Either ensure that the chosen pattern contains “theta”, as in “ice/hist/theta-*.nc”, or add “ice/hist/practice” to the list of “tar-path elements to avoid” in the archive_path_mapper_stage2.sh script.
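The triage of a first/last pair described in this section can be sketched as a small helper. It is illustrative only: the name `compare_first_last` is hypothetical, the lines are assumed to have the forms “HEADF:path” and “HEADL:path”, and the sim-date is assumed to be the last field before “.nc”. It cannot tell a benign generation-date difference from a difficult one; it only reports which comparison failed.

```shell
# Hypothetical triage of one HEADF/HEADL pair.
# $1 = "HEADF:<tar-path>/<file>", $2 = "HEADL:<tar-path>/<file>".
# Assumption: the sim-date is the last field before ".nc" (YYYY-MM[-DD]).
compare_first_last() {
    first=${1#HEADF:}
    last=${2#HEADL:}
    if [ "$(dirname "$first")" != "$(dirname "$last")" ]; then
        echo "tar-paths differ"
        return
    fi
    # Mask the sim-date so that only the rest of the filename is compared.
    mask='s/\.[0-9]{4}-[0-9]{2}(-[0-9]{2})?\.nc$/.DATE.nc/'
    name_first=$(basename "$first" | sed -E "$mask")
    name_last=$(basename "$last" | sed -E "$mask")
    if [ "$name_first" = "$name_last" ]; then
        echo "consistent"
    else
        echo "filenames differ"
    fi
}

compare_first_last \
    'HEADF:ice/hist/practice-mpascice.hist.am.timeSeriesStatsMonthly.2007-01-01.nc' \
    'HEADL:ice/hist/theta-mpascice.hist.am.timeSeriesStatsMonthly.2014-12-01.nc'
```

A report of “filenames differ” still calls for the human judgment described above, since both the generation-date case and the “practice” versus “theta” case produce it.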
Once you have completed this operation for the entire file (that is, appended the working file-match pattern to each categorical line), issue the command
grep -v HEAD headset_list_first_last > archive_dataset_map_prelim
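A quick check that no categorical line was left without a pattern might look like the sketch below. The function name `lines_missing_pattern` is hypothetical, and the check assumes that a finished line has exactly the six colon-separated categorical fields followed by a seventh, non-empty pattern field.

```shell
# Hypothetical sanity check: list lines that still lack a file-match pattern.
# $1 = map file, e.g. archive_dataset_map_prelim.
# Assumption: a finished line has seven colon-separated fields, the last
# being the non-empty pattern.
lines_missing_pattern() {
    awk -F: 'NF < 7 || $7 == "" { print NR ": " $0 }' "$1"
}

# Usage: lines_missing_pattern archive_dataset_map_prelim
```

No output means every line carries a pattern; any line that is printed is prefixed with its line number in the file.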
STEP 3: Then run
~/bartoletti1/outbox/archive_path_mapper_stage3.sh > update_Archive_Map
You can test the correctness of the “update_Archive_Map”