...

If you run `ls`, you’ll probably see files like e3sm_diags_180x360_aave_model_vs_obs_0001-0020.status. This corresponds to one e3sm_diags job. Parts of the file name are explained below:

...

There may still be more files than necessary to archive. You can probably remove the *.err, *.lock, *debug_block*, and *ocean_block_stats* files.
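For example, a quick cleanup could look like the sketch below (the patterns come from the list above; review the matches before deleting anything, since exact file names vary by case):

Code Block
# List what would be removed first
$ ls *.err *.lock *debug_block* *ocean_block_stats*
# If the list looks right, remove those files
$ rm -f *.err *.lock *debug_block* *ocean_block_stats*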

2. zstash create & Transfer to NERSC HPSS

2.a. E3SM Unified v1.6.0 / zstash v1.2.0 or greater

If you are using E3SM Unified v1.6.0 or greater, then https://github.com/E3SM-Project/zstash/pull/154 enables Globus (https://www.globus.org/) transfer directly from zstash create.

On the machine that you ran the simulation on:

...

Code Block
#!/bin/bash

# Run on <machine name>

# Load E3SM Unified
<Command to load the E3SM Unified environment>

# List of experiments to archive with zstash
EXPS=(\
<case_name> \
)

# Loop over simulations
for EXP in "${EXPS[@]}"
do
    echo === Archiving ${EXP} ===
    cd <simulations_dir>/${EXP}
    mkdir -p zstash
    stamp=`date +%Y%m%d`
    time zstash create -v --hpss=globus://nersc/home/<first letter>/<username>/E3SMv2/${EXP} --maxsize 128 . 2>&1 | tee zstash/zstash_create_${stamp}.log
done

...

Code Block
$ screen # Enter screen
$ screen -ls # Output should say "Attached"
$ ./batch_zstash_create.bash 2>&1 | tee batch_zstash_create.log
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!

$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node, 
# then you will need to ssh to this one to get back to the screen session.
$ tail -f batch_zstash_create.log # Check log without going into screen
# Wait for this to finish
# (On Chrysalis, for 165 years of data, this takes ~14 hours)

$ screen -r # Return to screen
# Check that output ends with `real`, `user`, `sys` time information
$ exit # Terminate screen

$ screen -ls # The screen should no longer be listed
$ ls <simulations_dir>/<case_name>/zstash
# `index.db` and a `zstash_create` log should be present
# No tar files should be listed (they have already been transferred to HPSS via Globus)

# If you'd like to know how much space the archive or the entire simulation uses, run:
$ du -sh <simulations_dir>/<case_name>/zstash
$ du -sh <simulations_dir>/<case_name>

Then, on NERSC/Cori:

Code Block
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# Tar files and `index.db` should be listed.
# Note `| wc -l` doesn't work on hsi
$ exit

2.b. Earlier releases

2.b.i. zstash create

On the machine that you ran the simulation on:

If you don’t have one already, create a directory for utilities, e.g., utils. Then open a file in that directory called batch_zstash_create.bash and paste the following in it, making relevant edits:

Code Block
#!/bin/bash

# Run on <machine name>

# Load E3SM Unified
<Command to load the E3SM Unified environment>

# List of experiments to archive with zstash
EXPS=(\
<case_name> \
)

# Loop over simulations
for EXP in "${EXPS[@]}"
do
    echo === Archiving ${EXP} ===
    cd <simulations_dir>/${EXP}
    mkdir -p zstash
    stamp=`date +%Y%m%d`
    time zstash create -v --hpss=none  --maxsize 128 . 2>&1 | tee zstash/zstash_create_${stamp}.log
done

Commands to load the E3SM Unified environment for each machine can be found at https://e3sm-project.github.io/zppy/_build/html/main/getting_started.html .
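For example, on Chrysalis the load command is typically of the form below; the exact script path can change between Unified releases, so confirm it at the link above:

Code Block
# Example only (Chrysalis); verify the current path for your machine
source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh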

Then, do the following:

Code Block
$ screen # Enter screen
$ screen -ls # Output should say "Attached"
$ ./batch_zstash_create.bash 2>&1 | tee batch_zstash_create.log
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!

$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node, 
# then you will need to ssh to this one to get back to the screen session.
$ tail -f batch_zstash_create.log # Check log without going into screen
# Wait for this to finish
# (On Chrysalis, for 165 years of data, this takes ~14 hours)

$ screen -r # Return to screen
# Check that output ends with `real`, `user`, `sys` time information
$ exit # Terminate screen

$ screen -ls # The screen should no longer be listed
$ ls <simulations_dir>/<case_name>/zstash
# tar files, `index.db`, and a `zstash_create` log should be present

# If you'd like to know how much space the archive or the entire simulation uses, run:
$ du -sh <simulations_dir>/<case_name>/zstash
$ du -sh <simulations_dir>/<case_name>

2.b.ii. Transfer to NERSC

On a NERSC machine (Cori):

Code Block
$ mkdir -p /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
$ ls /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
# Should be empty

Log into Globus, using your NERSC credentials: https://www.globus.org/

  1. (Left hand side): Transfer from <the machine's DTN> <simulations_dir>/<case_name>/zstash

  2. Press enter on the path and click "select all" on the left-hand side

  3. Transfer to NERSC DTN /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>

    1. Notice we're using cfs rather than scratch on Cori

  4. Press enter on the path

  5. Click “Transfer & Sync Options” in the center.

  6. Choose:

    1. “sync - only transfer new or changed files” (choose “modification time is newer” in the dropdown box)

    2. “preserve source file modification times”

    3. “verify file integrity after transfer”

  7. For “Label This Transfer”: “zstash <case_name> <machine name> to NERSC”

  8. Click "Start" on the left hand side.

You should get an email from Globus when the transfer is completed. (On Chrysalis, for 165 years of data, this transfer takes ~13 hours).

2.b.iii. Transfer to HPSS

On a NERSC machine (Cori):

Code Block
Log in to Cori
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>
$ ls | wc -l
# Should match the number of files in the other machine's `<simulations_dir>/<case_name>/zstash`
$ ls *.tar | wc -l
# Should be two less than the previous result, 
# since `index.db` and the `zstash_create` log are also present.

$ hsi
$ pwd
# Should be /home/<first letter>/<username>
$ ls E3SMv2
# Check what you already have in the directory
# You don't want to accidentally overwrite a directory already in HPSS.
$ exit

$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2

$ screen
$ screen -ls # Output should say "Attached"
# https://www2.cisl.ucar.edu/resources/storage-and-file-systems/hpss/managing-files-hsi
# cput will not transfer file if it exists.
$ hsi "cd /home/<first letter>/<username>/E3SMv2/; cput -R <case_name>"
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!

$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node, 
# then you will need to ssh to this one to get back to the screen session.
# Wait for the `hsi` command to finish
# (On Chrysalis, for 165 years of data, this takes ~2 hours)

$ screen -r # Return to screen
# Check output for any errors
$ exit # Terminate screen

$ screen -ls # The screen should no longer be listed
$ hsi
$ ls /home/<first letter>/<username>/E3SMv2/<case_name>
# Should match the number of files in the other machine's `<simulations_dir>/<case_name>/zstash`
$ exit

3. zstash check

On a NERSC machine (Cori):

Code Block
$ cd /global/homes/<first letter>/<username>
$ emacs batch_zstash_check.bash

Paste the following in that file, making relevant edits:

Code Block
#!/bin/bash

# Run on NERSC dtn

# Load environment that includes zstash
<Command to load the E3SM Unified environment>

# List of experiments to archive with zstash
EXPS=(\
<case_name> \
)

# Loop over simulations
for EXP in "${EXPS[@]}"
do
    echo === Checking ${EXP} ===
    cd /global/cfs/cdirs/e3sm/<username>/E3SMv2
    mkdir -p ${EXP}/zstash
    cd ${EXP}
    stamp=`date +%Y%m%d`
    time zstash check -v --hpss=/home/<first letter>/<username>/E3SMv2/${EXP} --workers 2 2>&1 | tee zstash/zstash_check_${stamp}.log
done

Commands to load the E3SM Unified environment for each machine can be found at https://e3sm-project.github.io/zppy/_build/html/main/getting_started.html .

If you’re using E3SM Unified v1.6.0 / zstash v1.2.0 (or greater) and you want to check a long simulation, you can use the --tars option introduced in https://github.com/E3SM-Project/zstash/pull/170 to split the checking into more manageable pieces:

Code Block
# Starting at 00005a until the end
zstash check --tars=00005a-
# Starting from the beginning to 00005a (included)
zstash check --tars=-00005a
# Specific range
zstash check --tars=00005a-00005c
# Selected tar files
zstash check --tars=00003e,00004e,000059
# Mix and match
zstash check --tars=000030-00003e,00004e,00005a-

Then, do the following:

Code Block
$ ssh dtn01.nersc.gov

$ screen
$ screen -ls # Output should say "Attached"
$ cd /global/homes/<first letter>/<username>
$ ./batch_zstash_check.bash
# Control A D to exit screen
# DO NOT CONTROL X / CONTROL C (as for emacs). This will terminate the task running in screen!!!

$ screen -ls # Output should say "Detached"
$ hostname
# If you log in on another login node, 
# then you will need to ssh to this one to get back to the screen session.
# Wait for the script to finish
# (On Chrysalis, for 165 years of data, this takes ~5 hours)

$ screen -r # Return to screen
# Check that output ends with `INFO: No failures detected when checking the files.`
# as well as listing `real`, `user`, `sys` time information
$ exit # Terminate screen

$ screen -ls # The screen should no longer be listed
$ exit # exit data transfer node
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>/zstash
$ tail zstash_check_<stamp>.log
# Output should match the output from the screen (without the time information)
Note

Because of https://github.com/E3SM-Project/zstash/issues/167, for now it is a good idea to run `grep -i Exception zstash_check_<stamp>.log` to confirm success.
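For example:

Code Block
$ cd /global/cfs/cdirs/e3sm/<username>/E3SMv2/<case_name>/zstash
$ grep -i Exception zstash_check_<stamp>.log
# No output means no exceptions were logged, i.e., the check succeeded.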

...

4. Document

On a NERSC machine (Cori):

...

Update the simulation Confluence page with information regarding this simulation (for Water Cycle’s v2 work, that page is V2 Simulation Planning). In the zstash archive column, specify:

  • /home/<first letter>/<username>/E3SMv2/<case_name>

  • zstash_create_<stamp>.log

  • zstash_check_<stamp>.log

...

5. Delete files

On a NERSC machine (Cori):

...