The purpose of this page is to document the E3SM project’s policy for managing the E3SM simulations that need to be archived, including official simulations and important internal working simulations. The following steps must be completed immediately after a simulation finishes to avoid data loss or corruption and time-consuming re-runs.

Note the E3SM policy change: all 3 steps are now required.

1. Short term archiving

Please note that this is a required step per the latest E3SM policy.

Use the CIME short-term archiving utility to reorganize model output, so that the single run/ sub-directory does not end up holding tens of thousands of output files. Typical usage:

...

2. Archive to NERSC HPSS using zstash

Note that using zstash is required when archiving E3SM data. For systems that do not provide HPSS, use zstash with “--hpss=none” to create tar files that are then copied to LLNL for permanent storage.
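
As a minimal sketch of the two cases described above, the helper below assembles a `zstash create` command line for a system with HPSS or for one without it. The function name `build_zstash_create` and the example paths are hypothetical and used only for illustration. The `--hpss=none` value is the one named on this page, and the command is only built, not run.

```python
from typing import List, Optional


def build_zstash_create(run_dir: str, hpss_path: Optional[str] = None) -> List[str]:
    """Assemble (but do not run) a `zstash create` command.

    hpss_path: destination directory on HPSS, or None when the system
    has no HPSS, in which case the tar files and index.db stay on local disk.
    """
    # "--hpss=none" is the setting this page prescribes for systems without HPSS.
    hpss_arg = "--hpss=none" if hpss_path is None else f"--hpss={hpss_path}"
    return ["zstash", "create", hpss_arg, run_dir]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(build_zstash_create("/path/to/simulation", "/hpss/path/to/archive")))
    print(" ".join(build_zstash_create("/path/to/simulation")))
```

Returning the command as a list keeps it easy to inspect before passing it to a job script or to `subprocess.run`.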

The original model output should be archived on NERSC HPSS using zstash:

  • If a simulation was run on a system with HPSS: run zstash to archive the data to the local HPSS tape. If that HPSS is not at NERSC, use Globus to transfer the zstash *.tar and index.db files to NERSC HPSS:

    • If data were archived on Theta HPSS, transfer them to NERSC HPSS using the Globus endpoints alcf#dtn_hpss and NERSC HPSS. (Make sure to select “verify file integrity after transfer” as a Transfer & Sync option; it checks that source and destination file checksums match.)

  • If a simulation was run on a system without HPSS (i.e., anvil and compy): run zstash to generate index.db and tar files locally [Note: this zstash feature is available after version 0.4.1] and transfer these files to NERSC HPSS using Globus.

  • Make sure permissions are correct for files on HPSS (please make the directory group/project readable; once publication is authorized, this will be changed to world-readable).

  • If you keep seeing issues with a specific transfer between endpoints, you can email support@globus.org with the transfer ID. They can look at the transfer logs and facilitate the conversation between the endpoint admins.
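
Globus performs the checksum comparison itself when the integrity option is selected. For an additional manual check, the sketch below computes checksums for the `*.tar` and `index.db` files in a local zstash directory and compares two such manifests. The function names and the choice of SHA-256 are assumptions made for illustration; they are not part of zstash or Globus.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(archive_dir: str) -> Dict[str, str]:
    """Map file name -> checksum for the tar files and index.db in a directory."""
    root = Path(archive_dir)
    files = sorted(root.glob("*.tar"))
    index = root / "index.db"
    if index.is_file():
        files.append(index)
    return {f.name: sha256_of(f) for f in files}


def mismatches(source: Dict[str, str], destination: Dict[str, str]) -> List[str]:
    """List file names that are missing at the destination or differ in checksum."""
    return sorted(name for name, checksum in source.items()
                  if destination.get(name) != checksum)
```

This only works on copies that are reachable as ordinary files, so data already on tape would have to be retrieved to disk before it could be compared this way.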

Notes on using zstash:

To archive output from an E3SM simulation located under $CSCRATCH/E3SM_simulations/20170731.F20TR.ne30_ne30.edison:

...

The zstash documentation and best practices for E3SM can be found at: https://e3sm-project.github.io/zstash/_build/html/master/index.html

3. Document HPSS locations

Documenting the HPSS locations on a central confluence page is a required step, and it is helpful for everyone in the project who might need to locate the files. Members of the infrastructure group will be closely monitoring these pages. Once a new simulation is entered, the data will be copied to a centralized space at LLNL (E3SM Archive - Data Source and Transfer Status) for further post-processing (e.g., ESGF publication). A default set of simulation data will be published to ESGF for official simulations. Please send an email to Jill Chengzhu Zhang (zhang40@llnl.gov) and e3sm-data-support@llnl.gov for special publication requirements (e.g., specific years, CMIP publication, etc.).

For v2: a project-wide page has been created (/wiki/spaces/ED/pages/2766340117), and all production runs are required to have an additional copy available through the NERSC Science Gateway:

Path on NERSC HPSS: /home/projects/e3sm/www

URL on the Web: https://portal.nersc.gov/archive/home/projects/e3sm/www/
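
The path and URL above differ only by a fixed prefix. As a minimal sketch, the helper below applies that prefix to a directory under the web root. The function name `hpss_path_to_url` and the sub-directory in the usage line are hypothetical, and it is an assumption that sub-directories are exposed through the portal in the same way as the root.

```python
HPSS_WEB_ROOT = "/home/projects/e3sm/www"
PORTAL_PREFIX = "https://portal.nersc.gov/archive"


def hpss_path_to_url(hpss_path: str) -> str:
    """Translate an HPSS directory under the web root into its portal URL."""
    normalized = hpss_path.rstrip("/")
    # Reject paths outside the web root, since only that tree is served.
    if normalized != HPSS_WEB_ROOT and not normalized.startswith(HPSS_WEB_ROOT + "/"):
        raise ValueError(f"{hpss_path} is not under {HPSS_WEB_ROOT}")
    return f"{PORTAL_PREFIX}{normalized}/"


if __name__ == "__main__":
    # Hypothetical sub-directory, for illustration only.
    print(hpss_path_to_url("/home/projects/e3sm/www/example_simulation"))
```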

For v1: each simulation and NGD group has its own confluence pages documenting the simulation output archive locations:

Water Cycle: DECK v1, High-res Coupled v1

BGC: /wiki/spaces/EBGC/pages/1056145509

Cryosphere: /wiki/spaces/ECG/pages/1736933506

Note: For non-official simulations (one-off tuning runs, sensitivity studies for research papers, etc.), please keep a separate table with a note column indicating how long the archive should be kept, or “don’t delete until data are published to ESGF”. These files may be given lower priority for retention once our NERSC HPSS quota is reached.