...

Campaign

Presently one of { BGC-v1, CRYO-v1, DECK-v1, HR-v1 }, although new campaigns are expected to appear.

Model

Presently one of { 1_0, 1_1, 1_1_ECA, 1_2 }, although new models are expected to appear.

Experiment(s)

Consulting tea leaves, burnt entrails, or the principals involved in conducting the simulations is recommended here. Do not expect the experiment name to be human-readable; the names favor compactness over comprehension.

Ensemble

Usually “ens1”, although “ens2” through “ens5” have appeared in the night sky. The ensemble name should be embedded in the experiment archive name when ens[n] > ens1.

Archive Path(s)

This should be the NERSC HPSS path(s) to the zstash-formatted archive(s) associated with the four elements above. Take note of the archive_path_leaf_directory_name(s).

With the above 5 elements in hand, you can craft the receiving archive directory or directories, “/p/user_pub/e3sm/archive/model/campaign/archive_path_leaf_directory_name(s)”.
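As a minimal sketch of that directory construction, the helper below joins the model, the campaign, and the leaf directory name of an archive path. The function name receiving_dir and the example HPSS path are hypothetical, introduced only for illustration; only the /p/user_pub/e3sm/archive root and the model/campaign/leaf ordering come from the text above.

```shell
# Root of the receiving archive tree, as given in the text above.
archive_root="/p/user_pub/e3sm/archive"

# receiving_dir MODEL CAMPAIGN HPSS_ARCHIVE_PATH   (hypothetical helper)
# Prints the receiving directory for one archive path.
receiving_dir() {
    # The leaf directory name is the last component of the HPSS path.
    leaf=$(basename "$3")
    printf '%s/%s/%s/%s\n' "$archive_root" "$1" "$2" "$leaf"
}

# Hypothetical HPSS path, for illustration only.
receiving_dir 1_1 BGC-v1 /home/example/hypothetical_archive_leaf
```

When an experiment has several archive paths, the helper would be called once per path.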

...

Here, it seems clear that we want to add “atm/hist” to the file search pattern. But in many cases there are multiple plausible internal tar-paths to matching files. In order to help automate the necessary disambiguation

STEP 2: We run

~/bartoletti1/outbox/archive_path_mapper_stage2.sh

...

For the filenames that remain, output is produced (named “headset_list_first_last”) that lists the first and last filename found in the residual file-lists. For the set of seven BGC-v1,ens1,hist_BCRD files listed above, that output was

...

This is what we want to see. The remaining “first and last” file in each set has a consistent and reasonable tar path, and the filenames match except for the “sim-date” field. For instance, consider the last 3 lines of the output file above. Edit the categorical line

1:BGC-v1:1_1:hist-BCRD:ens1:sea-ice_nat_mon

by appending everything after “HEADF” of the “first file found” line (including the separating colon), but wild-carding the sim-date field “2007-01-01” to obtain the line

1:BGC-v1:1_1:hist-BCRD:ens1:sea-ice_nat_mon:ice/hist/mpascice.hist.am.timeSeriesStatsMonthly.*.nc

(In truth, the pattern “ice/hist/*.nc” would suffice, as it seems only the correct files appear in the listing. Including “mpascice.hist.am.timeSeriesStatsMonthly” adds assurance.)
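The hand edit described above can be sketched as a small shell function. This is an illustration only, not part of the archive_path_mapper scripts: the name make_pattern_line is hypothetical, and it assumes the “first file found” line has the form “HEADF:” followed by the tar path, with the sim-date (YYYY-MM or YYYY-MM-DD) sitting immediately before the “.nc” suffix.

```shell
# make_pattern_line CATEGORICAL_LINE HEADF_LINE   (hypothetical helper)
# Appends everything after "HEADF" (colon included) to the categorical
# line, replacing a trailing sim-date with '*'.
make_pattern_line() {
    tail_part="${2#HEADF}"    # keeps the separating colon
    tail_part=$(printf '%s\n' "$tail_part" |
        sed -E 's/[0-9]{4}-[0-9]{2}(-[0-9]{2})?\.nc$/*.nc/')
    printf '%s%s\n' "$1" "$tail_part"
}

make_pattern_line \
    "1:BGC-v1:1_1:hist-BCRD:ens1:sea-ice_nat_mon" \
    "HEADF:ice/hist/mpascice.hist.am.timeSeriesStatsMonthly.2007-01-01.nc"
```

Only the date directly before “.nc” is wild-carded, so any other date-like field in the filename is left for manual review.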

When things are not so smooth:

If instead we discover that “first” and “last” do not match, we have two possible courses of action. If it is determined that one or the other involves the wrong tar-path, then the list of “tar-path elements to avoid” in the archive_path_mapper_stage2.sh script must be updated to eliminate the incorrect path, and we rerun that script to obtain a new output file. On some occasions this may be needed more than once, in order to determine the intended tar-path to the finalized run. On the other hand, if the tar-paths are equal but variant filenames are found, there are both benign and difficult cases. A benign case exists where the files have differing generation-dates but are intended to be part of a single finalized run, as in

HEADF:atm/hist/20181217.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2007-01.nc
HEADL:atm/hist/20190530.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2014-12.nc

If it is determined that the sequence of files is properly ordered and contiguous, one can elide the variant generation dates with the first '*' in the pattern

1:BGC-v1:1_1:hist-BCRD:ens1:atm_nat_mon:atm/hist/*BCRD*.cam.h0.*.nc

Alternately, one can explicitly call for both sets by editing the file to contain two categorical lines for the same dataset “atm_nat_mon”:

1:BGC-v1:1_1:hist-BCRD:ens1:atm_nat_mon:atm/hist/20181217.BCRD*.cam.h0.*.nc
1:BGC-v1:1_1:hist-BCRD:ens1:atm_nat_mon:atm/hist/20190530.BCRD*.cam.h0.*.nc
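Before committing to either form, a candidate pattern can be tried against the first and last filenames. The sketch below uses plain shell glob matching via case, in which '*' also matches '/'. That is an assumption for illustration: the tooling that consumes these patterns may interpret them differently in detail. The function name matches is hypothetical.

```shell
# matches FILENAME PATTERN   (hypothetical helper)
# Succeeds when FILENAME matches the shell glob PATTERN.
matches() {
    # $2 is left unquoted on purpose so it is treated as a pattern.
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

first="atm/hist/20181217.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2007-01.nc"
last="atm/hist/20190530.BCRD_CNPCTC20TR_OIBGC.ne30_oECv3.edison.cam.h0.2014-12.nc"

for pattern in 'atm/hist/*BCRD*.cam.h0.*.nc' 'atm/hist/20181217.BCRD*.cam.h0.*.nc'; do
    for name in "$first" "$last"; do
        if matches "$name" "$pattern"; then
            echo "match:    $pattern  $name"
        else
            echo "no match: $pattern  $name"
        fi
    done
done
```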

A more difficult case occurs when the first and last names are not intended to be part of the same run, as in

HEADF:ice/hist/practice-mpascice.hist.am.timeSeriesStatsMonthly.2007-01-01.nc
HEADL:ice/hist/theta-mpascice.hist.am.timeSeriesStatsMonthly.2014-12-01.nc

then you must look at the full filename listing from “PathsFound” in stage 1 to determine whether the “practice” set or the “theta” set gives the full and proper collection of output files. Suppose that this is determined to be “theta”. You then have a choice (with different implications for robustness). Either ensure that the chosen pattern contains “theta”, as in “ice/hist/theta-*.nc”, or add “ice/hist/practice” to the list of “tar-path elements to avoid” in the archive_path_mapper_stage2.sh script.
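Whether a given first/last pair is the smooth case or needs one of the treatments above can be checked mechanically. The sketch below is an illustration only, with hypothetical function names normalize and same_set. It assumes each line carries a “HEADF:” or “HEADL:” tag and that the sim-date (YYYY-MM or YYYY-MM-DD) sits immediately before “.nc”. It reports agreement only when the two names are identical once the sim-date is wild-carded.

```shell
# Strip the HEADF:/HEADL: tag and wildcard a trailing sim-date.
normalize() {
    printf '%s\n' "${1#HEAD[FL]:}" |
        sed -E 's/[0-9]{4}-[0-9]{2}(-[0-9]{2})?\.nc$/*.nc/'
}

# same_set HEADF_LINE HEADL_LINE   (hypothetical helper)
# Succeeds when the two names differ only in the sim-date field.
same_set() {
    [ "$(normalize "$1")" = "$(normalize "$2")" ]
}

if same_set \
    "HEADF:ice/hist/practice-mpascice.hist.am.timeSeriesStatsMonthly.2007-01-01.nc" \
    "HEADL:ice/hist/theta-mpascice.hist.am.timeSeriesStatsMonthly.2014-12-01.nc"
then
    echo "first and last agree apart from sim-date"
else
    echo "first and last differ: review the PathsFound listing"
fi
```

A mismatch only flags the pair for review. Deciding between a wrong tar-path, benign generation dates, and competing runs still requires the full listing.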

Once you have completed this operation for the entire file (appended to each categorical line the working file-match pattern), issue the command

grep -v HEAD headset_list_first_last > archive_dataset_map_prelim
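A quick sanity check on the resulting file is to list any categorical line that still lacks a pattern. The sketch below assumes, based on the examples above, that a finished line has seven colon-separated fields with the file-match pattern as the seventh. The function name list_unfinished is hypothetical.

```shell
# list_unfinished MAPFILE   (hypothetical helper)
# Prints every line that has fewer than seven colon-separated fields
# or an empty seventh field, i.e. no file-match pattern yet.
list_unfinished() {
    awk -F: 'NF < 7 || $7 == ""' "$1"
}

# Run the check only if the file from the grep step is present.
if [ -f archive_dataset_map_prelim ]; then
    list_unfinished archive_dataset_map_prelim
fi
```

No output means every line carries a pattern. It does not show that the patterns are correct.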

STEP 3: Then run

~/bartoletti1/outbox/archive_path_mapper_stage3.sh > update_Archive_Map

You can test the correctness of the “update_Archive_Map”