...
<simulations_dir>: /lcrc/group/e3sm/<username>/E3SM_simulations/E3SMv2
So, it may be useful to set the following aliases:
```shell
# Model running
alias run_scripts="cd ${HOME}/E3SM/scripts/"
alias simulations="cd /lcrc/group/e3sm/<username>/E3SM_simulations/E3SMv2/"
```
On other machines, the paths are the same, except for the <simulations_dir>.
...
<simulations_dir>: /compyfs/<username>/E3SM_simulations/E3SMv2
On Cori (NERSC):
<simulations_dir>: ${CSCRATCH}/E3SM_simulations/E3SMv2
Configuring the Model Run – Run Script
...
Simulations that are deemed sufficiently valuable should be archived using zstash
for long-term preservation.
...
Compy, Anvil and Chrysalis do not have local HPSS. We rely on NERSC HPSS for long-term archiving. Archiving requires a few separate steps:
1. Run `zstash create` to archive to local disk.
2. Using Globus, transfer the `zstash` archive files (everything under the `zstash/` subdirectory) to NERSC HPSS. Select the option to preserve the original file modification dates.
3. Run `zstash check` to verify the integrity of the `zstash` archive (and the transfer).
4. Update the simulation Confluence page with the path to HPSS.
Helper scripts
Below are some helper scripts to facilitate steps (1) and (3) above.
`batch_zstash_create.bash` batch-archives a number of simulations on Compy. Run it inside a `screen` session to avoid interruption:
```shell
#!/bin/bash
# Run on compy

# Load E3SM Unified
source /share/apps/E3SM/conda_envs/load_latest_e3sm_unified.sh

# List of experiments to archive with zstash
EXPS=(\
20200827.alpha4_v1GM.piControl.ne30pg2_r05_EC30to60E2r2-1900_ICG.compy \
20200905.alpha4_dtOcn.piControl.ne30pg2_r05_EC30to60E2r2-1900_ICG.compy \
)

# Loop over simulations
for EXP in "${EXPS[@]}"
do
    echo === Archiving ${EXP} ===
    cd /compyfs/gola749/E3SM_simulations/${EXP}
    mkdir -p zstash
    stamp=$(date +%Y%m%d)
    time zstash create -v --hpss=none --maxsize 128 . 2>&1 | tee zstash/zstash_create_${stamp}.log
done
```
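The `2>&1 | tee` idiom above merges stderr into stdout and writes the combined stream both to the terminal and to a timestamped log. A minimal, self-contained illustration of just that idiom (demo filenames only, no zstash required):

```shell
# Demo of the logging idiom used by the helper scripts: merge stderr into
# stdout, then tee the combined stream into a log file.
cd "$(mktemp -d)"                       # scratch dir for the demo
{ echo "to stdout"; echo "to stderr" >&2; } 2>&1 | tee demo.log > /dev/null
grep -c "to" demo.log                   # both lines landed in the log
```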
`batch_zstash_check.bash` batch-checks a number of simulations on the NERSC dtn. Run it inside a `screen` session to avoid interruption:
...
If you are archiving a simulation run on Compy, LCRC (Chrysalis/Anvil), do all of the following steps. If you are archiving a simulation run on NERSC (Cori), skip to step 4.
1. Clean up directory
Log into the machine that you ran the simulation on.
Remove all `eam.i` files except the latest one. Dates are of the form `<YYYY-MM-DD>`.
```shell
$ cd <simulations_dir>/<case_name>/run
$ ls | wc -l # See how many items are in this directory
$ mv <case_name>.eam.i.<YYYY-MM-DD>-00000.nc tmp.nc
$ rm <case_name>.eam.i.*.nc
$ mv tmp.nc <case_name>.eam.i.2015-01-01-00000.nc
$ ls | wc -l # See how many items are in this directory
```
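Because the dates in the filenames are ISO-formatted, a plain lexicographic sort finds the newest restart file, so the `mv`/`rm`/`mv` dance can be automated. A sketch on dummy files (the case name `mycase` is hypothetical):

```shell
# Keep only the latest CASE.eam.i.<YYYY-MM-DD>-00000.nc; ISO dates sort
# correctly as plain strings, so `sort | tail -n 1` picks the newest.
cd "$(mktemp -d)"                       # demo dir standing in for run/
CASE=mycase                             # hypothetical case name
touch ${CASE}.eam.i.2014-01-01-00000.nc ${CASE}.eam.i.2015-01-01-00000.nc
latest=$(ls ${CASE}.eam.i.*.nc | sort | tail -n 1)
for f in ${CASE}.eam.i.*.nc; do
    [ "$f" = "$latest" ] || rm "$f"     # delete everything but the newest
done
ls                                      # only the 2015 file remains
```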
There may still be more files than is necessary to archive. You can probably remove `*.err`, `*.lock`, `*debug_block*`, and `*ocean_block_stats*` files.
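That optional cleanup can be done with a single glob-based `rm`. A sketch on dummy files (the filenames below are invented for the demo):

```shell
# Remove the optional clutter patterns in one pass; -f silences errors
# for patterns that happen to match nothing.
cd "$(mktemp -d)"                       # demo dir standing in for run/
touch model.err run.lock e3sm_debug_block_0 ocean_block_stats_0 keep.nc
rm -f *.err *.lock *debug_block* *ocean_block_stats*
ls                                      # only keep.nc remains
```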
2. zstash create
On the machine that you ran the simulation on:
If you don’t have one already, create a directory for utilities, e.g., `utils`. Then open a file in that directory called `batch_zstash_create.bash` and paste the following in it, making relevant edits:
```shell
#!/bin/bash
# Run on <machine name>

# Load E3SM Unified
<Command to load the E3SM Unified environment>

# List of experiments to archive with zstash
EXPS=(\
<case_name> \
)

# Loop over simulations
for EXP in "${EXPS[@]}"
do
    echo === Archiving ${EXP} ===
    cd <simulations_dir>/${EXP}
    mkdir -p zstash
    stamp=$(date +%Y%m%d)
    time zstash create -v --hpss=none --maxsize 128 . 2>&1 | tee zstash/zstash_create_${stamp}.log
done
```
Commands to load the E3SM Unified environment for each machine can be found at https://e3sm-project.github.io/zppy/_build/html/main/getting_started.html .
Then, do the following:
```shell
$ screen # Enter screen
$ screen -ls # Output should say "Attached"
$ ./batch_zstash_create.bash 2>&1 | tee batch_zstash_create.log
# Ctrl+A D to exit screen
# DO NOT use Ctrl+X / Ctrl+C (as for emacs). This will terminate the task running in screen!!!
$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node,
# then you will need to ssh to this one to get back to the screen session.
$ tail -f batch_zstash_create.log # Check log without going into screen
# Wait for this to finish
# (On Chrysalis, for 165 years of data, this takes ~14 hours)
$ screen -r # Return to screen
# Check that output ends with `real`, `user`, `sys` time information
$ exit # Terminate screen
$ screen -ls # The screen should no longer be listed
$ ls <simulations_dir>/<case_name>/zstash
# tar files, `index.db`, and a `zstash_create` log should be present
# If you'd like to know how much space the archive or entire simulation uses, run:
$ du -sh <simulations_dir>/<case_name>/zstash
$ du -sh <simulations_dir>/<case_name>
```
3. Transfer to NERSC
On a NERSC machine (Cori):
```shell
$ mkdir -p /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
$ ls /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
# Should be empty
```
Log into Globus, using your NERSC credentials: https://www.globus.org/
- (Left hand side): Transfer from `<the machine's DTN>`, path `<simulations_dir>/<case_name>/zstash`. Click enter on the path and "select all" on the left-hand side.
- (Right hand side): Transfer to the NERSC DTN, path `/global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>`. Notice we're using `cfs` rather than `scratch` on Cori. Click enter on the path.
- Click "Transfer & Sync Options" in the center. Choose:
  - "sync - only transfer new or changed files" (choose "modification time is newer" in the dropdown box)
  - "preserve source file modification times"
  - "verify file integrity after transfer"
- For "Label This Transfer": "zstash <case_name> <machine name> to NERSC"
- Click "Start" on the left hand side.
You should get an email from Globus when the transfer is completed. (On Chrysalis, for 165 years of data, this transfer takes ~13 hours).
4. Transfer to HPSS
On a NERSC machine (Cori):
```shell
# Log in to Cori
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
$ ls | wc -l
# Should match the number of files in the other machine's `<simulations_dir>/<case_name>/zstash`
$ ls *.tar | wc -l
# Should be two less than the previous result,
# since `index.db` and the `zstash_create` log are also present.
$ hsi
$ pwd
# Should be /home/<first letter>/<username>
$ ls E3SMv2
# Check what you already have in the directory
# You don't want to accidentally overwrite a directory already in HPSS.
$ exit
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2
$ screen
$ screen -ls # Output should say "Attached"
# https://www2.cisl.ucar.edu/resources/storage-and-file-systems/hpss/managing-files-hsi
# cput will not transfer a file if it already exists.
$ hsi "cd /home/<first letter>/<username>/E3SMv2/; cput -R <case_name>"
# Ctrl+A D to exit screen
# DO NOT use Ctrl+X / Ctrl+C (as for emacs). This will terminate the task running in screen!!!
$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node,
# then you will need to ssh to this one to get back to the screen session.
# Wait for the `hsi` command to finish
# (On Chrysalis, for 165 years of data, this takes ~2 hours)
$ screen -r # Return to screen
# Check output for any errors
$ exit # Terminate screen
$ screen -ls # The screen should no longer be listed
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# Should match the number of files in the other machine's `<simulations_dir>/<case_name>/zstash`
$ exit
```
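The "two less" relationship between `ls | wc -l` and `ls *.tar | wc -l` can be checked mechanically rather than by eye. A sketch on dummy files mimicking a zstash cache (all filenames below are invented for the demo):

```shell
# Verify that the non-tar overhead in an archive dir is exactly two files:
# index.db plus the zstash_create log.
cd "$(mktemp -d)"                       # demo dir standing in for the archive
touch 000000.tar 000001.tar index.db zstash_create_20200101.log
total=$(ls | wc -l)
tars=$(ls *.tar | wc -l)
echo $((total - tars))                  # → 2 (index.db + the create log)
```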
5. zstash check
On a NERSC machine (Cori):
```shell
$ cd /global/homes/<first letter>/<username>
$ emacs batch_zstash_check.bash
```
Paste the following in that file, making relevant edits:
```shell
#!/bin/bash
# Run on NERSC dtn

# Load environment that includes zstash
<Command to load the E3SM Unified environment>

# List of experiments to check with zstash
EXPS=(\
<case_name> \
)

# Loop over simulations
for EXP in "${EXPS[@]}"
do
    echo === Checking ${EXP} ===
    cd /global/cfs/cdirs/e3sm/<username>/E3SMv2
    mkdir -p ${EXP}/zstash
    cd ${EXP}
    stamp=$(date +%Y%m%d)
    time zstash check -v --hpss=/home/<first letter>/<username>/E3SMv2/${EXP} --workers 2 2>&1 | tee zstash/zstash_check_${stamp}.log
done
```
Commands to load the E3SM Unified environment for each machine can be found at https://e3sm-project.github.io/zppy/_build/html/main/getting_started.html .
Then, do the following:
```shell
$ ssh dtn01.nersc.gov
$ screen
$ screen -ls # Output should say "Attached"
$ cd /global/homes/<first letter>/<username>
$ ./batch_zstash_check.bash
# Ctrl+A D to exit screen
# DO NOT use Ctrl+X / Ctrl+C (as for emacs). This will terminate the task running in screen!!!
$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node,
# then you will need to ssh to this one to get back to the screen session.
# Wait for the script to finish
# (On Chrysalis, for 165 years of data, this takes ~5 hours)
$ screen -r # Return to screen
# Check that output ends with `INFO: No failures detected when checking the files.`
# as well as listing `real`, `user`, `sys` time information
$ exit # Terminate screen
$ screen -ls # The screen should no longer be listed
$ exit # Exit the data transfer node
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>/zstash
$ tail zstash_check_<stamp>.log
# Output should match the output from the screen (without the time information)
```
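Rather than eyeballing the tail of the log, you can grep for the success line. A sketch using a stand-in log file (the filename is invented for the demo; the real log lives under the `zstash/` subdirectory):

```shell
# Programmatically confirm a zstash check log reports success.
cd "$(mktemp -d)"                       # demo dir for the stand-in log
printf 'INFO: No failures detected when checking the files.\n' > zstash_check_demo.log
grep -q "No failures detected" zstash_check_demo.log && echo "archive OK"
```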
6. Document
On a NERSC machine (Cori):
```shell
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2
# Check that the simulation case is now listed in this directory
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# Should match the number of files in the other machine's `<simulations_dir>/<case_name>/zstash`
$ exit
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>/zstash
$ ls
# `index.db` and the `zstash_check` log should be the only items listed
# https://www2.cisl.ucar.edu/resources/storage-and-file-systems/hpss/managing-files-hsi
# cput will not transfer a file if it already exists.
$ hsi "cd /home/<first letter>/<username>/E3SMv2/<case_name>; cput -R <zstash_check log>"
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# tar files, `index.db`, the `zstash_create` log, and the `zstash_check` log should be present
$ exit
```
Update the simulation Confluence page with the path to HPSS (for Water Cycle’s v2 work, that’s V2 Simulation Planning). Specify:
- `/home/<first letter>/<username>/E3SMv2/<case_name>`
- `zstash_create_<stamp>.log`
- `zstash_check_<stamp>.log`
7. Delete files
On a NERSC machine (Cori):
```shell
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# tar files, `index.db`, the `zstash_create` log, and the `zstash_check` log should be present
# So, we can safely delete these items on cfs
$ exit
$ ls /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
# Should match the output from the `ls` above
$ cd /global/cfs/cdirs/e3sm/<username>
$ ls E3SMv2
# Only the <case_name> you just transferred to HPSS should be listed
$ rm -rf E3SMv2
```
On the machine that you ran the simulation on:
```shell
$ cd <simulations_dir>/<case_name>
$ ls zstash
# tar files, `index.db`, and the `zstash_create` log should be present
$ rm -rf zstash # Remove the zstash directory, keeping original files
$ cd <simulations_dir>
```
More info
Refer to zstash's best practices for E3SM for details.
...