...

Timeseries Reshaping mode, aka Splitting

ncclimo will reshape input files that are a series of snapshots of all model variables into outputs that are continuous timeseries of each individual variable taken from all input files. Timeseries to be reshaped (split) often come with hard-to-predict names, e.g., because the number of days or months in a file, or timesteps per day or month, may all vary. Thus ncclimo in splitter mode requires the user to supply the input filenames; ncclimo will not construct input filenames itself in splitter mode (unlike in monthly or annual climo generation mode). As of version 4.6.4, ncclimo will automatically switch to timeseries reshaping mode if it receives a list of files through a pipe to stdin, or, alternatively, placed as positional arguments (after the last command-line option), or if neither of these is done and no caseid is specified, in which case it assumes all *.nc files in drc_in constitute the input file list. These examples invoke reshaping mode in the four possible ways (choose your poison):
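For orientation, here is a minimal sketch of these invocation styles (the variable list, years, and filename patterns are hypothetical placeholders; the examples that follow are authoritative):

Code Block
# 1. Pipe the input filenames to stdin
ls ${drc_in}/${caseid}.cam.h0.*.nc | ncclimo -v T,Q -s 2001 -e 2005 -o ${drc_out}
# 2. Supply the input filenames as positional arguments (after the last option)
ncclimo -v T,Q -s 2001 -e 2005 -o ${drc_out} ${drc_in}/${caseid}.cam.h0.*.nc
# 3. Supply neither filenames nor a caseid: all *.nc files in drc_in are used
ncclimo -v T,Q -s 2001 -e 2005 -i ${drc_in} -o ${drc_out}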

...

-C, --clm_md: Climatology mode. Either mth (default, for monthly-mean input files), hfc for high-frequency-climo (i.e., diurnal cycles from sub-daily input files), hfs for high-frequency-splitter (i.e., concatenation of sub-daily input files), or ann (for annual-mean input files).

...

If/when MPAS generates the _FillValue attributes itself, this step can and should be skipped. All other ncclimo features, like regridding (below), are invoked identically for MPAS as for EAM/ELM data, although under the hood ncclimo does some special pre-processing (dimension permutation, metadata annotation) for MPAS. A five-year oEC60to30 MPAS-O climo with regridding to T62 takes < 10 minutes on rhea.
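For reference, the step in question (elided above) adds the missing _FillValue attributes with ncatted; a minimal sketch, assuming the conventional MPAS fill value, is:

Code Block
# Add a _FillValue attribute to every variable in each MPAS history file
for fl in *.nc ; do
  ncatted -O -t -a _FillValue,,o,d,-9.99999979021476795361e+33 ${fl}
done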

High Frequency climos (produce monthly, seasonal, and annual climatological means of diurnal cycles from diurnally resolved input data)

As of NCO 4.9.4 (September, 2020), ncclimo will produce climatologies that retain the diurnal cycle resolution provided by the input data. These “high frequency climos” are useful for characterizing the diurnal cycle of processes typically retained in EAM/ELM h1-h4 history output, high-frequency observational analyses (e.g., MERRA2, ERA5), and similar data. In all respects except two, high frequency climo features are invoked and controlled by the same options as traditional climo generation from monthly mean input. The most significant difference is that the user must supply the filenames of high-frequency input data via any of the four methods outlined above for splitting. High-frequency climo input dataset names are too complex for ncclimo to generate automatically (as it does for monthly-mean input), so one must supply the names via standard input, positional arguments, filename globbing, or directory location, exactly as for splitter mode described above. The second difference is that the user must supply the --clm_md=hfc option to tell ncclimo to operate in climo-generation rather than splitter mode:
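A minimal sketch of an hfc invocation, here feeding filenames via stdin (the variable list, years, and filename pattern are hypothetical; the examples below are authoritative):

Code Block
ls ${drc_in}/${caseid}.cam.h1.*.nc | ncclimo --clm_md=hfc -c ${caseid} -s 2001 -e 2005 -o ${drc_out}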

...


The above commands perform a climatology without regridding, then with regridding (all climos stored in ${drc_out}), then with regridding and storing regridded files separately. Paths specified by $drc_in, $drc_out, and $drc_rgr may be relative or absolute. An alternative to regridding during climatology generation is to regrid afterwards with ncremap, which has more specialized features built-in for regridding. To use ncremap to regrid a climatology in $drc_out and place the results in $drc_rgr, use something like

Code Block
ncremap -I drc_out -m map.nc -O drc_rgr

...

As of 20170526 and version 4.6.7, ncremap supports sub-gridscale (SGS) regridding. Though designed for ELM and MPAS-Seaice, this feature is configurable for other SGS datasets as well. In sub-grid mode, ncremap performs substantial pre- and post-processing so that regridding conserves fields that may represent only a fraction of the entire gridcell. The sub-gridscale fraction represented by each field is contained in a variable (set with the option "--sgs_frc") whose default is "landfrac". SGS mode eases regridding of datasets (e.g., from ELM, CLM, CICE, MPAS-Seaice) that output data normalized to a gridcell fraction rather than to its entire extent. SGS mode automatically derives new binary masks ("--sgs_msk", defaults to "landmask") and allows for additional normalization ("--sgs_nrm"). Specific flavors of SGS can be selected (with "-P elm", "-P clm", "-P cice", or "-P mpasseaice"). These ensure regridded datasets recreate the idiosyncratic units (e.g., %, km2) employed by raw ELM, CLM, CICE, and MPAS-Seaice model output.

Code Block
ncremap -P sgs -m map_ne30pg2_to_cmip6_180x360_nco.20200901.nc elm_in.nc output.nc
ncremap --prc_typ=sgs --map=map_ne30pg2_to_cmip6_180x360_nco.20200901.nc elm_in.nc output.nc
ncremap -P elm -d 1x1.nc elm_data.nc output.nc
ncremap --prc_typ=mpasseaice --map=map_oEC60to30v3_to_cmip6_180x360_aave.20181001.nc mpasseaice_in.nc output.nc
ncremap -P mpasseaice -d 1x1.nc mpasseaice_data.nc output.nc

Full documentation on SGS mode is here: http://nco.sf.net/nco.html#sgs. Note that ncclimo does not (yet anyway) call ncremap with the SGS option. One gets SGS features only by manually invoking ncremap as above. This is mainly because parallelization with SGS is much more efficient with standalone ncremap than through ncclimo. We solicit your feedback on SGS behavior and future features...

Coupled Runs:

ncclimo works on all E3SM models. It can simultaneously generate climatologies for a coupled run, where climatologies mean both native and regridded monthly, seasonal, and annual averages as per the AG specification. Here are template commands for a recent simulation:

Code Block
caseid=20160121.A_B2000ATMMOD.ne30_oEC.titan.a00
drc_in=/lustre/atlas1/cli112/proj-shared/golaz/ACME_simulations/20160121.A_B2000ATMMOD.ne30_oEC.titan.a00/run
map_atm=${DATA}/maps/map_ne30np4_to_fv129x256_aave.20150901.nc
map_lnd=$map_atm
map_ocn=${DATA}/maps/map_oEC60to30v3_to_cmip6_180x360_aave.20181001.nc
map_ice=$map_ocn
Code Block
ncclimo -p mpi -c ${caseid} -m cam        -s 2 -e 5 -i $drc_in -r $map_atm -o ${DATA}/acme/atm
ncclimo        -c ${caseid} -m clm2       -s 2 -e 5 -i $drc_in -r $map_lnd -o ${DATA}/acme/lnd
ncclimo -p mpi              -m mpaso      -s 2 -e 5 -i $drc_in -r $map_ocn -o ${DATA}/acme/ocn
ncclimo                     -m mpasseaice -s 2 -e 5 -i $drc_in -r $map_ice -o ${DATA}/acme/ice

...

The basic approach above (running the script from a standard terminal window) works well for small cases but can be unpleasantly slow on login nodes of LCFs and for longer or higher resolution (e.g., ne120) climatologies. As a baseline, generating a climatology of 5 years of ne30 (~1x1 degree) EAM output with ncclimo takes 1-2 minutes on rhea (at a time with little contention), and 6-8 minutes on a 2014 MacBook Pro. To make things a bit faster at LCFs, you can ask for your own dedicated node (note this approach doesn't make sense except on supercomputers, which have a job-control queue). On rhea do this via:

Code Block
qsub -I -A CLI115 -V -l nodes=1 -l walltime=00:10:00 -N ncclimo # rhea standard node (128 GB)
qsub -I -A CLI115 -V -l nodes=1 -l walltime=00:10:00 -lpartition=gpu -N ncclimo # rhea bigmem node (1 TB)

The equivalents on cooley and cori are:

Code Block
qsub -I -A OceanClimate_4 --nodecount=1 --time=00:10:00 --jobname=ncclimo # cooley node (384 GB)
salloc -A e3sm --nodes=1 --partition=debug --time=00:10:00 --job-name=ncclimo # cori node (128 GB)

Acquiring a dedicated node is useful for any calculation you want to do quickly, not just creating climo files, though it does burn through our computing allocation, so don't be wasteful. This command returns a prompt once nodes are assigned (the prompt is returned in your home directory, so you may then have to cd to the location you meant to run from). At that point you can simply use the 'basic' ncclimo invocation to run your code. It will be faster because you are not sharing the node it's running on with other people. Again, ne30L30 climos only require < 2 minutes, so the 10 minutes requested in the example is excessive and conservative. Tune it with experience. Here is the meaning of each flag used:

-I (that's a capital "i"): submit in interactive mode = return a prompt rather than running a program.
--time: how long to keep this dedicated node for. Unless you kill the shell created by the qsub command, the shell will exist for this amount of time, then die suddenly. In the above examples, 10 minutes is requested.
"-l nodes=1" (rhea) or "--nodecount 1" (cooley) or "--nodes=1" (cori/edison): the number of nodes to request. ncclimo will use multiple cores per node.
-V: export existing environmental variables into the new interactive shell. Peter doubts this is actually needed.

-q: the queue name (needed for locations like edison that have multiple queues with no default queue)

-A: the name of the account to charge for time used. This page may be useful for figuring that out if the above defaults don't work: Computational Resources

...

The above commands are meant for Rhea/Titan. The equivalent commands for Cooley/Mira and Cori/Edison are:

Code Block
# Cooley/Mira (Cobalt scheduler):
echo '#!/bin/bash' > ~/ncclimo.cobalt
echo "ncclimo -a scd -d 1 -p mpi -c famipc5_ne120_v0.3_00003 -s 1980 -e 1983 -i /projects/EarthModel/qtang/archive/famipc5_ne120_v0.3_00003/atm/hist -o ${DATA}/ne120/clm" >> ~/ncclimo.cobalt;chmod a+x ~/ncclimo.cobalt
qsub -A HiRes_EarthSys --nodecount=12 --time=00:30:00 --jobname ncclimo --error ~/ncclimo.err --output ~/ncclimo.out --notify zender@uci.edu ~/ncclimo.cobalt

...

Notice that both Cooley/Mira (Cobalt) and Cori/Edison (SLURM) require the introductory shebang-interpreter line (#!/bin/bash), which PBS does not need. Set only the batch queue parameters mentioned above. In MPI-mode, ncclimo determines the appropriate number of tasks-per-node based on the number of nodes available and script internals (like load-balancing for regridding). Hence do not set a tasks-per-node parameter in the scheduler configuration, as this could cause conflicts.
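For reference, a SLURM submission analogous to the Cobalt example above might look like the following sketch (the account name is a placeholder; substitute your own allocation):

Code Block
# Cori/Edison (SLURM scheduler):
echo '#!/bin/bash' > ~/ncclimo.slurm
echo "ncclimo -a scd -d 1 -p mpi -c famipc5_ne120_v0.3_00003 -s 1980 -e 1983 -i /projects/EarthModel/qtang/archive/famipc5_ne120_v0.3_00003/atm/hist -o ${DATA}/ne120/clm" >> ~/ncclimo.slurm; chmod a+x ~/ncclimo.slurm
sbatch --account=e3sm --nodes=12 --time=00:30:00 --job-name=ncclimo --error=$HOME/ncclimo.err --output=$HOME/ncclimo.out ~/ncclimo.slurm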

...

What does ncclimo do?

The basic idea of this script is very simple. For monthly climatologies (e.g. JAN), ncclimo passes the list of all relevant January monthly files to NCO's ncra command, which averages each variable in these monthly files over their time dimension (if it exists) or copies the value from the first month unchanged (if no time axis exists). Seasonal climos are then created by taking the average of the monthly climo files using ncra. In order to account for differing numbers of days per month, the "-w" flag in ncra is used, followed by the number of days in the relevant months. For example, the MAM climo is computed from: "ncra -w 31,30,31 MAR_climo.nc APR_climo.nc MAY_climo.nc MAM_climo.nc" (details about file names and other optimization flags have been stripped here to make the concept easier to follow). The ANN climo is then computed by doing a weighted average of the seasonal climos.
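For instance, with simplified filenames the ANN step is conceptually the following, assuming a 365-day calendar in which DJF, MAM, JJA, and SON span 90, 92, 92, and 91 days, respectively:

Code Block
# Weight each seasonal climo by its number of days (90+92+92+91 = 365)
ncra -w 90,92,92,91 DJF_climo.nc MAM_climo.nc JJA_climo.nc SON_climo.nc ANN_climo.nc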

...