...
If you run `ls`, you'll probably see files like `e3sm_diags_180x360_aave_model_vs_obs_0001-0020.status`. Each of these corresponds to one `e3sm_diags` job. Parts of the file name are explained below:
...
There may still be more files than necessary to archive. You can probably remove the `*.err`, `*.lock`, `*debug_block*`, and `*ocean_block_stats*` files.
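Before deleting anything, it helps to see exactly which files those patterns match. The following is a minimal sketch, not part of zstash or E3SM: the function name `prune_unneeded` and its `--delete` switch are hypothetical. It searches the given directory recursively, which is an assumption; point it at a narrower directory if some matching files should be kept.

```shell
# Hypothetical helper: list (default) or delete (--delete) the file types
# above that usually don't need to be archived.
# Usage: prune_unneeded <simulation_dir> [--delete]
prune_unneeded() {
  local dir="$1" mode="${2:-}"
  local -a match
  # Regular files matching any of the patterns listed above.
  match=( -type f \( -name '*.err' -o -name '*.lock'
          -o -name '*debug_block*' -o -name '*ocean_block_stats*' \) )
  if [[ "$mode" == "--delete" ]]; then
    # Print each file as it is removed.
    find "$dir" "${match[@]}" -print -delete
  else
    # Default is a dry run: only print what would be removed.
    find "$dir" "${match[@]}" -print
  fi
}
```

For example, run `prune_unneeded <simulations_dir>/<case_name>` first, review the list, then rerun it with `--delete`.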
2. zstash create & Transfer to NERSC HPSS
2.a. E3SM Unified v1.6.0 / zstash v1.2.0 or greater
If you are using E3SM Unified v1.6.0 or greater, `zstash create` can transfer the archive with Globus (https://www.globus.org/) as it is created. This was enabled by https://github.com/E3SM-Project/zstash/pull/154.
On the machine that you ran the simulation on:
...
Code Block |
---|
#!/bin/bash
# Run on <machine name>
# Load E3SM Unified
<Command to load the E3SM Unified environment>
# List of experiments to archive with zstash
EXPS=(\
<case_name> \
)
# Loop over simulations
for EXP in "${EXPS[@]}"
do
echo === Archiving ${EXP} ===
cd <simulations_dir>/${EXP}
mkdir -p zstash
stamp=`date +%Y%m%d`
time zstash create -v --hpss=globus://nersc/home/<first letter>/<username>/E3SMv2/${EXP} --maxsize 128 . 2>&1 | tee zstash/zstash_create_${stamp}.log
done |
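When archiving several cases, the `<first letter>` placeholder in the `globus://nersc/...` destination is easy to get wrong. A small sketch that builds that destination from a NERSC username and a case name; `nersc_globus_hpss` is a hypothetical helper, and the path layout is simply copied from the script.

```shell
# Hypothetical helper: print the globus://nersc/... destination used in the
# script for a given NERSC username and case name.
nersc_globus_hpss() {
  local user="$1" case_name="$2"
  # ${user:0:1} is the first character of the username, i.e. <first letter>.
  printf 'globus://nersc/home/%s/%s/E3SMv2/%s\n' "${user:0:1}" "$user" "$case_name"
}
```

Inside the loop, the option could then be written as `--hpss=$(nersc_globus_hpss <username> ${EXP})`. Pass your NERSC username explicitly, since it may differ from `$USER` on the machine that ran the simulation.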
...
Code Block |
---|
$ screen # Enter screen
$ screen -ls # Output should say "Attached"
$ ./batch_zstash_create.bash 2>&1 | tee batch_zstash_create.log
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!
$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node,
# then you will need to ssh to this one to get back to the screen session.
$ tail -f batch_zstash_create.log # Check log without going into screen
# Wait for this to finish
# (On Chrysalis, for 165 years of data, this takes ~14 hours)
$ screen -r # Return to screen
# Check that output ends with `real`, `user`, `sys` time information
$ exit # Terminate screen
$ screen -ls # The screen should no longer be listed
$ ls <simulations_dir>/<case_name>/zstash
# `index.db` and a `zstash_create` log should be present
# No tar files should be listed
# (by default, zstash deletes the local tar files once they have been transferred)
# If you'd like to know how much space the archive or entire simulation uses, run:
$ du -sh <simulations_dir>/<case_name>/zstash
$ du -sh <simulations_dir>/<case_name> |
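The `real`/`user`/`sys` check can be done without reattaching to screen. This sketch assumes the log's last three lines are exactly what bash's `time` keyword prints with its default format; that output goes to the script's stderr, which `2>&1 | tee` sends into `batch_zstash_create.log`. `ends_with_timing` is a hypothetical name.

```shell
# Hypothetical helper: succeed only if the last three lines of a log are the
# real/user/sys lines printed by bash's `time` keyword.
ends_with_timing() {
  tail -n 3 "$1" | awk '
    NR == 1 && $1 == "real" { n++ }
    NR == 2 && $1 == "user" { n++ }
    NR == 3 && $1 == "sys"  { n++ }
    END { exit (n == 3 ? 0 : 1) }'
}
```

For example, `ends_with_timing batch_zstash_create.log && echo finished`. This only shows that the last `zstash create` finished, not that it succeeded, so the log should still be read for errors.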
Then, on NERSC/Cori:
Code Block |
---|
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# Tar files and `index.db` should be listed.
# Note `| wc -l` doesn't work on hsi
$ exit |
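Because `| wc -l` doesn't work inside an interactive `hsi` session, one option is to run `hsi` non-interactively and count the entries in the shell. This sketch assumes that `hsi` writes its `ls` listing to stderr (hence `2>&1` in the usage line) and that names may be printed several per line; `summarize_listing` is a hypothetical helper.

```shell
# Hypothetical helper: read a file listing on stdin and report how many
# *.tar entries it contains and whether index.db is present.
summarize_listing() {
  local tars=0 has_index=no word
  local -a words
  # Names may appear several per line, so split each line into words.
  while read -r -a words; do
    for word in "${words[@]}"; do
      # Compare only the last path component, in case full paths are listed.
      case "${word##*/}" in
        *.tar) tars=$((tars + 1)) ;;
        index.db) has_index=yes ;;
      esac
    done
  done
  echo "tar files: ${tars}, index.db: ${has_index}"
}
```

For example: `hsi "ls /home/<first letter>/<username>/E3SMv2/<case_name>" 2>&1 | summarize_listing`.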
2.b. Earlier releases
2.b.i. zstash create
On the machine that you ran the simulation on:
If you don’t have one already, create a directory for utilities, e.g., `utils`. Then open a file in that directory called `batch_zstash_create.bash` and paste the following in it, making relevant edits:
Code Block |
---|
#!/bin/bash
# Run on <machine name>
# Load E3SM Unified
<Command to load the E3SM Unified environment>
# List of experiments to archive with zstash
EXPS=(\
<case_name> \
)
# Loop over simulations
for EXP in "${EXPS[@]}"
do
echo === Archiving ${EXP} ===
cd <simulations_dir>/${EXP}
mkdir -p zstash
stamp=`date +%Y%m%d`
time zstash create -v --hpss=none --maxsize 128 . 2>&1 | tee zstash/zstash_create_${stamp}.log
done |
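One caveat with `zstash create ... 2>&1 | tee ...` is that the pipeline's exit status is `tee`'s, so a failed `zstash create` still looks successful to the loop. A minimal sketch of a wrapper that keeps the logging but returns the wrapped command's own status via bash's `PIPESTATUS` array; `run_logged` is a hypothetical name, not part of zstash.

```shell
# Hypothetical helper: run a command, copy its combined stdout/stderr to a log
# file (like `2>&1 | tee` in the script above), and return the command's own
# exit status rather than tee's.
run_logged() {
  local log="$1"
  shift
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] is the exit status of "$@", the first command in the pipeline.
  return "${PIPESTATUS[0]}"
}
```

In the loop, the `zstash create` line could become `time run_logged zstash/zstash_create_${stamp}.log zstash create -v --hpss=none --maxsize 128 . || echo "zstash create failed for ${EXP}"`.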
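Commands to load the E3SM Unified environment for each machine can be found at https://e3sm-project.github.io/zppy/_build/html/main/getting_started.html . Before running the script for the first time, make it executable with `chmod u+x batch_zstash_create.bash`.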
Then, do the following:
Code Block |
---|
$ screen # Enter screen
$ screen -ls # Output should say "Attached"
$ ./batch_zstash_create.bash 2>&1 | tee batch_zstash_create.log
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!
$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node,
# then you will need to ssh to this one to get back to the screen session.
$ tail -f batch_zstash_create.log # Check log without going into screen
# Wait for this to finish
# (On Chrysalis, for 165 years of data, this takes ~14 hours)
$ screen -r # Return to screen
# Check that output ends with `real`, `user`, `sys` time information
$ exit # Terminate screen
$ screen -ls # The screen should no longer be listed
$ ls <simulations_dir>/<case_name>/zstash
# tar files, `index.db`, and a `zstash_create` log should be present
# If you'd like to know how much space the archive or entire simulation use, run:
$ du -sh <simulations_dir>/<case_name>/zstash
$ du -sh <simulations_dir>/<case_name> |
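To check the `ls` result above from a script, a sketch like the following tests for `index.db`, at least one tar file, and a `zstash_create_*.log`. The function name `check_zstash_dir` is hypothetical. It applies to this earlier-release workflow, where the tar files stay in the local `zstash` directory.

```shell
# Hypothetical helper: report anything missing from a local zstash directory
# and return non-zero if something is missing.
check_zstash_dir() {
  local dir="$1" status=0
  [[ -f "$dir/index.db" ]] || { echo "missing: index.db"; status=1; }
  # find prints the first match (if any) and stops; empty output means no match.
  [[ -n "$(find "$dir" -maxdepth 1 -type f -name '*.tar' -print -quit)" ]] \
    || { echo "missing: tar files"; status=1; }
  [[ -n "$(find "$dir" -maxdepth 1 -type f -name 'zstash_create_*.log' -print -quit)" ]] \
    || { echo "missing: zstash_create log"; status=1; }
  return "$status"
}
```

For example, `check_zstash_dir <simulations_dir>/<case_name>/zstash`.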
2.b.ii. Transfer to NERSC
On a NERSC machine (Cori):
Code Block |
---|
$ mkdir -p /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
$ ls /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
# Should be empty |
Log into Globus, using your NERSC credentials: https://www.globus.org/
(Left hand side): Transfer from
<the machine's DTN> <simulations_dir>/<case_name>/zstash
Click enter on path and "select all" on left-hand side
Transfer to
NERSC DTN /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
Notice we're using `cfs` rather than `scratch` on Cori.
Click enter on the path
Click “Transfer & Sync Options” in the center.
Choose:
“sync - only transfer new or changed files” (choose “modification time is newer” in the dropdown box)
“preserve source file modification times”
“verify file integrity after transfer”
For “Label This Transfer”: “zstash <case_name> <machine name> to NERSC”
Click "Start" on the left hand side.
You should get an email from Globus when the transfer is completed. (On Chrysalis, for 165 years of data, this transfer takes ~13 hours).
2.b.iii. Transfer to HPSS
On a NERSC machine (Cori):
Code Block |
---|
Log in to Cori
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
$ ls | wc -l
# Should match the number of files in the other machine's `<simulations_dir>/<case_name>/zstash`
$ ls *.tar | wc -l
# Should be two less than the previous result,
# since `index.db` and the `zstash_create` log are also present.
$ hsi
$ pwd
# Should be /home/<first letter>/<username>
$ ls E3SMv2
# Check what you already have in the directory
# You don't want to accidentally overwrite a directory already in HPSS.
$ exit
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2
$ screen
$ screen -ls # Output should say "Attached"
# https://www2.cisl.ucar.edu/resources/storage-and-file-systems/hpss/managing-files-hsi
# cput will not transfer a file if it already exists.
$ hsi "cd /home/<first letter>/<username>/E3SMv2/; cput -R <case_name>"
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!
$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node,
# then you will need to ssh to this one to get back to the screen session.
# Wait for the `hsi` command to finish
# (On Chrysalis, for 165 years of data, this takes ~2 hours)
$ screen -r # Return to screen
# Check output for any errors
$ exit # Terminate screen
$ screen -ls # The screen should no longer be listed
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# Should match the number of files in the other machine's `<simulations_dir>/<case_name>/zstash`
$ exit |
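Matching file counts is a fairly weak check. A slightly stronger sketch lists each file's name and size in bytes, so the listings from the two machines can be compared with `diff`. `manifest` is a hypothetical name, and `-printf` assumes GNU `find`, which is typical on Linux systems.

```shell
# Hypothetical helper: print "<name> <size in bytes>" for every regular file
# directly inside a directory, sorted by name in a locale-independent order.
manifest() {
  find "$1" -maxdepth 1 -type f -printf '%f %s\n' | LC_ALL=C sort
}
```

For example, on the original machine run `manifest <simulations_dir>/<case_name>/zstash > ~/source_manifest.txt`, copy that file to your NERSC home directory, then in `/global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>` run `manifest . | diff ~/source_manifest.txt -`. No output means names and sizes agree.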
...
3. zstash check
On a NERSC machine (Cori):
Code Block |
---|
$ cd /global/homes/<first letter>/<username>
$ emacs batch_zstash_check.bash |
Paste the following in that file, making relevant edits:
Code Block |
---|
#!/bin/bash
# Run on NERSC dtn
# Load environment that includes zstash
<Command to load the E3SM Unified environment>
# List of experiments to check with zstash
EXPS=(\
<case_name> \
)
# Loop over simulations
for EXP in "${EXPS[@]}"
do
echo === Checking ${EXP} ===
cd /global/cfs/cdirs/e3sm/<username>/E3SMv2
mkdir -p ${EXP}/zstash
cd ${EXP}
stamp=`date +%Y%m%d`
time zstash check -v --hpss=/home/<first letter>/<username>/E3SMv2/${EXP} --workers 2 2>&1 | tee zstash/zstash_check_${stamp}.log
done |
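Commands to load the E3SM Unified environment for each machine can be found at https://e3sm-project.github.io/zppy/_build/html/main/getting_started.html . Before running the script for the first time, make it executable with `chmod u+x batch_zstash_check.bash`.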
If you’re using E3SM Unified v1.6.0 / zstash v1.2.0 (or greater) and you want to check a long simulation, you can use the `--tars` option introduced in https://github.com/E3SM-Project/zstash/pull/170 to split the checking into more manageable pieces:
# Starting at 00005a until the end
zstash check --tars=00005a-
# Starting from the beginning to 00005a (included)
zstash check --tars=-00005a
# Specific range
zstash check --tars=00005a-00005c
# Selected tar files
zstash check --tars=00003e,00004e,000059
# Mix and match
zstash check --tars=000030-00003e,00004e,00005a-
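For a long simulation, the ranges can be generated rather than typed. This sketch assumes the tar files are named with consecutive six-digit hexadecimal numbers, as in the examples above, and that you know the first and last names (e.g., from `ls` in the `zstash` directory); `tar_ranges` is a hypothetical helper.

```shell
# Hypothetical helper: split tar indices first..last (six-digit hex names such
# as 00005a) into consecutive ranges of at most `size` tar files each,
# printing one "start-end" range per line.
tar_ranges() {
  local first=$((16#$1)) last=$((16#$2)) size="$3" start end
  (( size > 0 )) || return 1
  for ((start = first; start <= last; start += size)); do
    end=$((start + size - 1))
    (( end > last )) && end=$last
    printf '%06x-%06x\n' "$start" "$end"
  done
}
```

Each printed range, such as `000000-00000f`, could then be passed as `--tars=<range>` to a separate `zstash check` run.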
Then, do the following:
Code Block |
---|
$ ssh dtn01.nersc.gov
$ screen
$ screen -ls # Output should say "Attached"
$ cd /global/homes/<first letter>/<username>
$ ./batch_zstash_check.bash
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!
$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node,
# then you will need to ssh to this one to get back to the screen session.
# Wait for the script to finish
# (On Chrysalis, for 165 years of data, this takes ~5 hours)
$ screen -r # Return to screen
# Check that output ends with `INFO: No failures detected when checking the files.`
# as well as listing `real`, `user`, `sys` time information
$ exit # Terminate screen
$ screen -ls # The screen should no longer be listed
$ exit # exit data transfer node
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>/zstash
$ tail zstash_check_<stamp>.log # Output should match the output from the screen (without the time information) |
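When several cases were checked, scanning each log for the success line by hand gets tedious. A minimal sketch that reports PASS or FAIL per log, based only on whether the exact line `INFO: No failures detected when checking the files.` appears; `report_check_logs` is a hypothetical name.

```shell
# Hypothetical helper: print PASS/FAIL for each zstash_check log given as an
# argument; return 1 if any log lacks the success line.
report_check_logs() {
  local log status=0
  for log in "$@"; do
    if grep -qF 'INFO: No failures detected when checking the files.' "$log"; then
      echo "PASS ${log}"
    else
      # Also reached if the log file doesn't exist.
      echo "FAIL ${log}"
      status=1
    fi
  done
  return "$status"
}
```

For example: `report_check_logs /global/cfs/cdirs/e3sm/<username>/E3SMv2/*/zstash/zstash_check_*.log`.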
Note |
---|
Because of https://github.com/E3SM-Project/zstash/issues/167 , for now it is a good idea to run |
...
4. Document
On a NERSC machine (Cori):
...
Update the simulation's Confluence page with information regarding this simulation (for Water Cycle’s v2 work, that page is V2 Simulation Planning). In the zstash archive column, specify:
/home/<first letter>/<username>/E3SMv2/<case_name>
zstash_create_<stamp>.log
zstash_check_<stamp>.log
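The three entries above can be assembled with a sketch like this one. It assumes stamps are `YYYYMMDD`, so sorting the names puts the newest log last. It also assumes you pass the directory holding the `zstash_create` log (in the earlier-release workflow, the NERSC case directory the Globus transfer copied into) and the directory holding the `zstash_check` log (its `zstash` subdirectory). `archive_column` is a hypothetical name.

```shell
# Hypothetical helper: print the HPSS path plus the newest zstash_create and
# zstash_check log names, one per line.
# Usage: archive_column <username> <case_name> <create_log_dir> <check_log_dir>
archive_column() {
  local user="$1" case_name="$2" create_dir="$3" check_dir="$4" create check
  # find exits 0 even when nothing matches, so empty output just means "not found".
  create=$(find "$create_dir" -maxdepth 1 -name 'zstash_create_*.log' -printf '%f\n' | LC_ALL=C sort | tail -n 1)
  check=$(find "$check_dir" -maxdepth 1 -name 'zstash_check_*.log' -printf '%f\n' | LC_ALL=C sort | tail -n 1)
  echo "/home/${user:0:1}/${user}/E3SMv2/${case_name}"
  echo "${create:-<no zstash_create log found>}"
  echo "${check:-<no zstash_check log found>}"
}
```

For example: `archive_column <username> <case_name> /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name> /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>/zstash`.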
...
5. Delete files
On a NERSC machine (Cori):
...