The purpose of this page is to document the E3SM project’s policy for managing E3SM simulations that need to be archived, including official simulations and important internal working simulations. The following steps must be carried out immediately after a simulation completes, to avoid data loss or corruption and time-consuming re-runs.

Note the change in E3SM policy: all three steps below are now required.

1. Short-term archiving

This step is required under the latest E3SM policy.

Use the CIME short-term archiving utility to reorganize model output so that tens of thousands of output files do not accumulate in the single run/ subdirectory. Typical usage:

cd case_scripts
./case.st_archive --force-move

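The --force-move option greatly speeds up short-term archiving by moving files rather than copying them. You can also invoke short-term archiving periodically while the simulation is still running. To do so safely, add the --last-date option (the last simulation date to archive) and --no-incomplete-logs (so that logs that are not yet complete are not archived):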

cd case_scripts
./case.st_archive --force-move --last-date 2005-01-01 --no-incomplete-logs
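The following minimal bash sketch wraps both forms of the command, for instance for use in a script that archives periodically while the run progresses. The function name run_st_archive and the DRY_RUN switch are hypothetical conveniences, not part of CIME; only the case.st_archive options shown above are passed.

```shell
# Hypothetical helper (not part of CIME): run short-term archiving for one case.
# Usage: run_st_archive <case_scripts_dir> [last_date]
#   - without last_date: archive everything (after the simulation completes)
#   - with last_date (e.g. 2005-01-01): periodic archiving while the run progresses
# DRY_RUN=1 is a convention of this sketch only: print the command, do not run it.
run_st_archive() {
  local case_dir="$1"
  local last_date="${2:-}"
  local -a cmd=(./case.st_archive --force-move)

  if [[ -n "$last_date" ]]; then
    # Archive only up to last_date and leave incomplete logs in place.
    cmd+=(--last-date "$last_date" --no-incomplete-logs)
  fi

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "cd ${case_dir} && ${cmd[*]}"
    return 0
  fi

  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$case_dir" && "${cmd[@]}")
}
```

For example, run_st_archive case_scripts 2005-01-01 runs the same command as the periodic example above, while DRY_RUN=1 run_st_archive case_scripts 2005-01-01 only prints it.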

2. Archive to NERSC HPSS using zstash

Using zstash is required when archiving E3SM data. On systems that do not provide HPSS, run zstash with “--hpss=none” to create the tar files locally; these are then copied to LLNL for permanent storage.

The original model output should be archived on NERSC HPSS using zstash. For example, to archive output from an E3SM simulation located under $CSCRATCH/E3SM_simulations/20170731.F20TR.ne30_ne30.edison:

$ cd $CSCRATCH/E3SM_simulations/20170731.F20TR.ne30_ne30.edison
$ zstash create --hpss=test/E3SM_simulations/20170731.F20TR.ne30_ne30.edison .

The command above should generate the tar file(s) together with a corresponding index database (index.db) that stores checksums and additional metadata for the tar file(s).
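A minimal bash sketch of the same step, assuming (as in the example above) that the last component of the HPSS path should match the simulation directory name. The function archive_simulation and the DRY_RUN switch are hypothetical; passing none as the prefix produces --hpss=none for systems without HPSS, where the tar files stay on local disk for later transfer to LLNL.

```shell
# Hypothetical helper: archive one simulation directory with zstash, reusing the
# directory name as the last component of the HPSS path.
# Usage: archive_simulation <simulation_dir> <hpss_prefix>
#   hpss_prefix: e.g. test/E3SM_simulations, or "none" on systems without HPSS
# DRY_RUN=1 is a convention of this sketch only: print the command, do not run it.
archive_simulation() {
  local sim_dir="${1%/}"      # drop a trailing slash for consistent output
  local hpss_prefix="$2"
  local hpss_path

  if [[ "$hpss_prefix" == "none" ]]; then
    hpss_path="none"          # no HPSS: tar files are kept locally
  else
    hpss_path="${hpss_prefix%/}/$(basename "$sim_dir")"
  fi

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "cd ${sim_dir} && zstash create --hpss=${hpss_path} ."
    return 0
  fi

  # zstash create archives the current directory ("."), so run it from sim_dir.
  (cd "$sim_dir" && zstash create --hpss="$hpss_path" .)
}
```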

After archiving with zstash, it is highly recommended that you verify the integrity of the tar files, both on the local machine and at NERSC. The safest way to do so is to go to a new, empty directory and run:

$ zstash check --hpss=test/E3SM_simulations/20170731.F20TR.ne30_ne30.edison

If zstash check reports an error, keep your original data; you may need to re-upload it with zstash create.
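The check can be scripted so that it always runs from a fresh, empty directory. This bash sketch is not part of zstash: the function verify_archive and the DRY_RUN switch are hypothetical, and it assumes that zstash check exits with a non-zero status when it finds a problem. Reading the output of the check remains the safer test.

```shell
# Hypothetical helper: run zstash check from a new, empty temporary directory.
# Assumption: zstash check returns a non-zero exit status on failure; this sketch
# relies only on that status, so read the check's output as well.
# Usage: verify_archive <hpss_path>
# DRY_RUN=1 is a convention of this sketch only: print the command, do not run it.
verify_archive() {
  local hpss_path="$1"
  local workdir
  workdir="$(mktemp -d)" || return 1   # new, empty working directory

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "cd ${workdir} && zstash check --hpss=${hpss_path}"
    rmdir "$workdir"
    return 0
  fi

  if (cd "$workdir" && zstash check --hpss="$hpss_path"); then
    echo "zstash check passed for ${hpss_path} (working directory: ${workdir})"
  else
    echo "zstash check FAILED for ${hpss_path}: keep the original data;" \
         "it may need to be re-uploaded with zstash create" >&2
    return 1
  fi
}
```

After a real check the temporary directory is not removed, so anything the check left there can be inspected; delete it afterwards.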

The zstash documentation and best practices for E3SM can be found at: https://e3sm-project.github.io/zstash/_build/html/master/index.html

3. Document HPSS locations

Documenting the HPSS locations on a central Confluence page is required, and it helps everyone in the project who needs to locate the files. Members of the infrastructure group closely monitor these pages. Once a new simulation is entered, its data will be copied to a centralized space at LLNL (E3SM Archive - Data Source and Transfer Status) for further post-processing (e.g., ESGF publication). For official simulations, a default set of simulation data will be published to ESGF. For special publication requirements (e.g., specific years or CMIP publication), please email Jill Chengzhu Zhang (zhang40@llnl.gov) and e3sm-data-support@llnl.gov.

For v2: a project-wide page has been created (/wiki/spaces/ED/pages/2766340117), and all production runs are required to have an additional copy available through the NERSC Science Gateway:

Path on NERSC HPSS: /home/projects/e3sm/www

URL on the Web: https://portal.nersc.gov/archive/home/projects/e3sm/www/
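When recording a Science Gateway copy on the Confluence page, the public URL can be derived from the HPSS path. The bash sketch below assumes, based only on the two locations listed above, that the URL is https://portal.nersc.gov/archive followed by the absolute HPSS path; the function name gateway_url is hypothetical.

```shell
# Hypothetical helper: map a path under the Science Gateway area on NERSC HPSS
# to its web URL. Assumption (inferred from the path/URL pair on this page):
# URL = https://portal.nersc.gov/archive + absolute HPSS path.
GATEWAY_HPSS_ROOT="/home/projects/e3sm/www"
GATEWAY_URL_BASE="https://portal.nersc.gov/archive"

gateway_url() {
  local hpss_path="$1"
  case "$hpss_path" in
    "$GATEWAY_HPSS_ROOT" | "$GATEWAY_HPSS_ROOT"/*)
      echo "${GATEWAY_URL_BASE}${hpss_path}"
      ;;
    *)
      # Only paths inside the Science Gateway area are published on the web.
      echo "error: ${hpss_path} is not under ${GATEWAY_HPSS_ROOT}" >&2
      return 1
      ;;
  esac
}
```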

For v1: each simulation and NGD group has its own Confluence page documenting the simulation output archive locations:

Water Cycle: DECK v1, High-res Coupled v1

BGC: /wiki/spaces/EBGC/pages/1056145509

Cryosphere: /wiki/spaces/ECG/pages/1736933506