This page is meant to be a “cheat sheet” of basic NERSC commands, for reference during the tutorial. I’m making notes here now, but will try to simplify. Let me know what else could be included.
To log in to Perlmutter: `ssh -l $USER saul-p1.nersc.gov` (there are multiple login nodes, and all of them can submit jobs to CPU/GPU compute nodes)
For the E3SM tutorial, all associated users will be added to a temporary project named `ntrain6` (this is not a username; it is your Unix group and the account where compute hours are charged, etc). To check whether you are in `ntrain6`, try the command `groups`.
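The membership check can also be scripted. The following is a minimal sketch, assuming the standard `id` utility is available; the helper name `in_group` is hypothetical and only for illustration.

```shell
# Return success if the current user belongs to the given Unix group.
# (in_group is a hypothetical helper name, not a NERSC command.)
in_group() {
    local group="$1"
    # id -nG prints the user's group names separated by spaces;
    # put one per line and look for an exact match.
    id -nG | tr ' ' '\n' | grep -qx -- "$group"
}

if in_group ntrain6; then
    echo "You are in ntrain6"
else
    echo "Not in ntrain6 (yet)"
fi
```

If this reports that you are not in `ntrain6`, it may simply mean you have not been added to the project yet.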
To ask for a compute node: `salloc -A ntrain6 --reservation=xx -C cpu -N 1 -c 32 -t 30:00 -q shared`
Note that you don’t need `--qos=interactive` for reservations, but you would want to use it when not using a reservation.
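As a rough sketch of the two cases, the helper below only prints the `salloc` command line, so it can be inspected before running anything. The function name `salloc_cmd` is hypothetical, and the non-reservation variant simply swaps the reservation flags for `--qos=interactive`, which is an assumption based on the note above rather than a tested recipe.

```shell
# Print (not run) the salloc command for one interactive CPU node.
# With a reservation name: use the reservation, no --qos=interactive needed.
# Without one: fall back to --qos=interactive (assumed from the note above).
salloc_cmd() {
    local resv="${1:-}"
    local common="salloc -A ntrain6 -C cpu -N 1 -c 32 -t 30:00"
    if [ -n "$resv" ]; then
        echo "$common --reservation=$resv -q shared"
    else
        echo "$common --qos=interactive"
    fi
}

salloc_cmd e3sm_dryrun   # inside a reservation
salloc_cmd               # no reservation
```

Copy the printed line into your terminal on a login node to actually request the node.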
Notes about reservations (nodes set aside for a given time for use by one or several users):
To use one, pass these flags to sbatch: `--reservation e3sm_dryrun -A ntrain6`, where `-A` is indeed changing the account to be used, so the job is charged against `ntrain6` instead.
Note that the reservation name will be different for each new reservation. For example, tomorrow it will be called `e3sm_testrun`, and on the days of the tutorial it will be named something else, probably a different name each day. The account to be used will always be `ntrain6` for us.
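Since the reservation name changes from day to day while the account stays the same, one option is to keep the name in a single variable. This is a minimal sketch; the variable names `RESV` and `ACCOUNT`, the helper `resv_flags`, and the script name `job.sh` are all hypothetical.

```shell
# Reservation name changes per day; override with e.g. RESV=e3sm_testrun.
RESV="${RESV:-e3sm_dryrun}"
ACCOUNT="ntrain6"   # always ntrain6 for the tutorial

# Print the sbatch flags needed to submit into the current reservation.
resv_flags() {
    echo "--reservation $RESV -A $ACCOUNT"
}

# Example (job.sh is a hypothetical batch script):
#   sbatch $(resv_flags) job.sh
resv_flags
```

Updating `RESV` once per day is then the only change needed when the reservation is renamed.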
After a job is submitted, Slurm allows a user to change a few things (only before the job starts to run!). Examples:
`scontrol update qos=debug jobid=xx` -- move job xx to the debug qos
`scontrol update qos=debug timelimit=30 jobid=xx` -- move job xx to the debug qos and change walltime to 30 min
`scontrol update qos=regular jobid=xx` -- move job xx to the regular qos
`scontrol update reservation=e3sm_dryrun account=ntrain6 jobid=xx` -- move job xx to the reservation noted
`scontrol update reservation=e3sm_dryrun account=ntrain6 timelimit=60 jobid=xx` -- move job xx to the reservation noted and set walltime to 60 min
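Because these updates are only meant for jobs that have not started, a small guard can check the job state first. This is a sketch under that assumption: `update_if_pending` is a hypothetical wrapper name, and it relies on `squeue -h -j <jobid> -o %T` printing the job state (for example `PENDING`).

```shell
# Apply "scontrol update" only while the job is still pending.
# Usage: update_if_pending JOBID key=value [key=value ...]
# (update_if_pending is a hypothetical wrapper, not a Slurm command.)
update_if_pending() {
    local jobid="$1"
    shift
    local state
    # -h: no header, -j: select this job, -o %T: print only the job state
    state="$(squeue -h -j "$jobid" -o %T)"
    if [ "$state" = "PENDING" ]; then
        scontrol update "$@" jobid="$jobid"
    else
        echo "job $jobid is '${state:-not found}', leaving it unchanged" >&2
        return 1
    fi
}

# Example, run on a login node:
#   update_if_pending xx qos=debug timelimit=30
```

The block only defines the function, so nothing is changed until you call it with a real job id.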
To see a list of your jobs: `squeue -u $USER`
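For a quicker overview when several jobs are queued, the same listing can be reduced to a count per state. A minimal sketch, assuming `squeue` is on the path; `jobs_by_state` is a hypothetical helper name.

```shell
# Count your jobs by state (PENDING, RUNNING, ...).
# (jobs_by_state is a hypothetical helper name.)
jobs_by_state() {
    # -h: no header, -u: only this user's jobs, -o %T: print just the state
    squeue -h -u "${USER:-$(id -un)}" -o %T | sort | uniq -c
}

# Example, run on a login node:
#   jobs_by_state
```

Each output line is a count followed by a state name.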
Unrelated to reservations, here is a possible solution to the issue of some users not having write access to the e3sm space:
`/global/cfs/cdirs/ntrain6/www`
Just tell them the contents will be removed after the training.
Wuyin notes that `/global/cfs/cdirs/e3sm/www/Tutorials/2024/users` may also work, since he has changed the permissions there (to be confirmed).
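Users can check for themselves which of the two locations they can write to. The following is a rough sketch using the shell's built-in `-d` and `-w` tests; `check_writable` is a hypothetical helper name, and the result reflects what the shell reports rather than any guarantee about the filesystem.

```shell
# Report whether a directory exists and is writable by the current user.
# (check_writable is a hypothetical helper name.)
check_writable() {
    local dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# The two paths from the notes; off NERSC these report NOT writable.
check_writable /global/cfs/cdirs/ntrain6/www || true
check_writable /global/cfs/cdirs/e3sm/www/Tutorials/2024/users || true
```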