...

ncclimo can re-use previous work and produce extended (i.e., longer duration) climatologies by combining two previously computed climatologies (the binary method), or by computing a new climatology from raw monthly model output and then combining that with a previously computed climatology (the incremental method). Producing an extended climatology by the incremental method requires, at minimum, specifying (with -S and -s, respectively) the start years of the previously computed and current climos, and (with -e) the end year of the current climo. Producing an extended climatology by the binary method requires, at minimum, specifying both the start years (with -S and -s) and end years (with -E and -e) of both pre-computed climatologies. The presence of the -E option signifies to ncclimo to employ the binary (not incremental) method.
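The option semantics above can be sketched for the incremental method. The caseid and directory paths below are placeholders (not taken from a real simulation), and the sketch only builds and prints the command rather than running it; the key point is that omitting -E is what selects the incremental method:

```shell
# Hypothetical incremental-method invocation (placeholder caseid and paths).
# No -E option, so ncclimo uses the incremental method: it computes the
# current climo (years 6-15) from raw monthly output found via -i, then
# combines it with the pre-computed climo whose start year is given by -S.
caseid=somelongname
DATA=${DATA:-${HOME}/data}      # placeholder data root; adjust to your site
cmd="ncclimo -c ${caseid} -m cam -S 2 -s 6 -e 15 -i ${DATA}/acme/atm -o ${DATA}/acme/atm"
echo "${cmd}"
```

Adding -E (with the first climo's end year) to the same command would switch ncclimo to the binary method.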

Following are two examples of computing extended climatologies using the binary method (i.e., both input climatologies have already been computed using the normal methods above). In the first example, note that the input directory is the same for both pre-computed climatologies, and is the same as the output directory for the extended climatology. This is perfectly legal and keeps the number of required options to a minimum. It amounts to storing all native-grid climatologies in the same directory:

...

If both input climatologies are in the same directory in which the output (extended) climatology is to be stored, then the number of required options is minimal:

caseid=somelongname
drc_in=/scratch1/scratchdirs/zender/acme/${caseid}/atm
ncclimo -c ${caseid} -m cam -S 210 -E 520 -s 621 -e 1550 -i ${DATA}/acme/atm -o ${DATA}/acme/atm
ncclimo        -c ${caseid} -m clm2 -S 2 -E 5 -s 6 -e 15 -i ${DATA}/acme/lnd -o ${DATA}/acme/lnd
ncclimo -p mpi -c hist      -m ocn  -S 2 -E 5 -s 6 -e 15 -i ${DATA}/acme/ocn -o ${DATA}/acme/ocn 
ncclimo        -c hist      -m ice  -S 2 -E 5 -s 6 -e 15 -i ${DATA}/acme/ice -o ${DATA}/acme/ice

When no output directory is specified, ncclimo's internal logic automatically places the extended climo in the input climo directory. Files are not overwritten because the extended climos have different names than the input climos. The next example uses the directory structure and options that Chris Golaz adopted for coupled ACME simulations. The extra options (compared to the idealized example above) supply important information. The input climos were generated in seasonally discontiguous December (sdd) mode, so the extended climatology must also be generated with the '-a sdd' option (otherwise ncclimo will not find the pre-computed input files). The input directory of the first pre-computed climatology is specified with -x, and the second with the usual -i option. A new output directory for the extended climos is specified with -X.

caseid=20161117.beta0.A_WCYCL1850S.ne30_oEC_ICG.edison
drc_ntv=/scratch2/scratchdirs/golaz/ACME_simulations/20161117.beta0.A_WCYCL1850S.ne30_oEC_ICG.edison/pp/clim # Native
drc_rgr=/scratch2/scratchdirs/golaz/ACME_simulations/20161117.beta0.A_WCYCL1850S.ne30_oEC_ICG.edison/pp/clim_rgr # Regridded
ncclimo -a sdd -c ${caseid} -m cam -S 41 -E 50 -x ${drc_ntv}/0041-0050 -s 51 -e 60 -i ${drc_ntv}/0051-0060 -X ${drc_ntv}/0041-0060
ncclimo -a sdd -c ${caseid} -m cam -S 41 -E 50 -x ${drc_rgr}/0041-0050 -s 51 -e 60 -i ${drc_rgr}/0051-0060 -X ${drc_rgr}/0041-0060

The extended native and regridded climatologies are produced with virtually the same command (only the input and output directories differ). No mapping file or regridding option is necessary to produce an extended climatology from two input regridded climatologies; ncclimo need not know or care whether the input climos are on the native grid or are already regridded. So long as the regridded climatologies are already available, it makes sense to use them rather than to perform a second regridding. While ncclimo can generate and regrid an extended climatology from native-grid inputs in one command, doing so involves more command-line options and it is generally simpler to follow the above procedure. Ask me if you would like help customizing ncclimo for other such workflows.

Producing extended climatologies via the binary method consumes much less memory than producing normal or incremental climatologies. The binary method simply computes weighted averages of each input variable, so the maximum RAM required is only about three times the size of the largest input variable. This is trivial compared to the total input file size, so extended climos may be computed with background parallelism, the default in ncclimo; the '-p mpi' option is never necessary for producing extended climos with the binary method. As you might imagine, the combination of low memory overhead and re-use of previously regridded climos means that producing extended regridded climos via the binary method is extremely fast compared to computing normal climos. Binary climos (regridded or not) require only about one minute on Edison.
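The weighted averaging mentioned above can be illustrated with the year ranges from the minimal example (years 2-5 and 6-15). ncclimo computes these weights internally, so this is only a sketch of the arithmetic implied by the text, not of ncclimo internals:

```shell
# Each input climo contributes in proportion to the number of years it spans.
S=2; E=5; s=6; e=15                 # same meanings as ncclimo's -S/-E/-s/-e
n1=$(( E - S + 1 ))                 # years in first pre-computed climo (4)
n2=$(( e - s + 1 ))                 # years in second pre-computed climo (10)
w1=$(awk "BEGIN{printf \"%.4f\", ${n1}/(${n1}+${n2})}")
w2=$(awk "BEGIN{printf \"%.4f\", ${n2}/(${n1}+${n2})}")
echo "w1=${w1} w2=${w2}"            # prints w1=0.2857 w2=0.7143
```

Because each output value is just w1*x1 + w2*x2 over two already-reduced climatologies, only a few variables need be in memory at once, which is why the binary method's RAM footprint is so small.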

Memory Considerations:

It is important to employ the optimal ncclimo parallelization strategy for your computer hardware resources. Select from the three available choices with the '-p par_typ' switch. The options are serial mode ('-p nil' or '-p serial'), background-mode parallelism ('-p bck'), and MPI parallelism ('-p mpi'). The default is background-mode parallelism, which is appropriate for lower-resolution (e.g., ne30L30) simulations on most nodes at high-performance computing centers. Use (or at least start with) serial mode on personal laptops/workstations. Serial mode requires twelve times less RAM than the parallel modes, and is much less likely to deadlock or cause OOM (out-of-memory) conditions on your personal computer. If the available RAM (+swap) is < 12*4*sizeof(monthly input file), then try serial mode first (12 is the optimal number of parallel processes for monthly climos, and the computational overhead is a factor of four). CAM-SE ne30L30 output is ~1 GB/month, so each month requires about 4 GB of RAM. CAM-SE ne30L72 output (with LINOZ) is ~10 GB/month, so each month requires ~40 GB of RAM. CAM-SE ne120 output is ~12 GB/month, so each month requires ~48 GB of RAM. The computer does not actually use all this memory at one time, and many kernels compress RAM usage to below what top reports, so the actual physical usage is hard to pin down, but may be a factor of 2.5-3.0 (rather than 4.0) times the input file size. For instance, my 16 GB MacBook Pro will successfully run an ne30L30 climatology (which requests 48 GB of RAM) in background mode, but the laptop will be slow and unresponsive for other uses until it finishes the climos (in 6-8 minutes). Experiment a bit and choose the parallelization option that works best for you.
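The rule of thumb above (12 simultaneous monthly averagers, each needing about four times the monthly file size) can be checked with a quick calculation. The per-month file sizes are the approximate figures quoted in the text:

```shell
# Estimate background-mode RAM demand: 12 processes x 4 x monthly file size.
ram_gb() { echo $(( 12 * 4 * $1 )); }
echo "ne30L30: ~$(ram_gb 1) GB"     # ~1 GB/month  -> 48 GB requested
echo "ne30L72: ~$(ram_gb 10) GB"    # ~10 GB/month -> 480 GB requested
echo "ne120:   ~$(ram_gb 12) GB"    # ~12 GB/month -> 576 GB requested
```

If the result exceeds your available RAM (+swap), start with '-p serial', which cuts the demand by the factor of 12.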

...