This page explains how to use ncremap. It describes the steps necessary to create grids and to regrid datasets between different grids with ncremap. Some of the simpler regridding options supported by ncclimo are also described at Generate, Regrid, and Split Climatologies (climo files) with ncclimo. This page describes those features in more detail, along with other, more boutique features often useful for custom regridding solutions.

The Zen of Regridding

Most modern climate/weather-related research requires a regridding step in its workflow. The plethora of geometric and spectral grids on which model and observational data are stored ensures that regridding is usually necessary for scientific insight, especially in the focused and variable-resolution studies that E3SM models conduct. Why does such a common procedure seem so complex? Because a mind-boggling number of options are required to support advanced regridding features that many users never need. To defer that complexity, this HOWTO begins with solutions to the prototypical regridding problem, without mentioning any other options. It demonstrates how to solve that problem simply, including the minimal software installation required. Once the basic regridding vocabulary has been introduced, we solve the prototype problem when one or more inputs are "missing", or need to be created. The HOWTO ends with descriptions of different regridding modes and workflows that use features customized to particular models, observational datasets, and formats. The overall organization, including TBD sections (suggest others, or vote for prioritizing, below), is:

...

This section describes the recommended procedures to construct and regrid E3SM timeseries data to CMIP6 specifications. Most models provide data to CMIP6 in timeseries format, meaning one variable per file, with multiple years per file. These timeseries must be regridded to at least one of the CMIP6 standard grids. The E3SM project chose to supply its v1 experiments to CMIP6 archived on rectangular, uniform (i.e., equiangular in latitude and longitude) grids: one-degree for standard resolution and quarter-degree for high resolution. Generating these timeseries from experiments as lengthy as 500 model years, formatted to CMIP6 specifications, requires many non-standard options to both ncclimo (to construct the timeseries) and ncremap (to regrid the timeseries), and is a natural capstone exercise in using both together. This section is arranged in reverse order: we first present the final commands, then describe the meanings of, and reasons for, particular options.
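
To make "uniform, equiangular" concrete, the sketch below builds the cell-center coordinates of a global one-degree and quarter-degree grid and counts the cells. It is illustrative only and is not how ncremap generates grids. It assumes cell edges fall on multiples of the resolution starting at the South Pole and at Greenwich, so cell centers are offset by half a cell; the function name `uniform_grid` is hypothetical.

```python
def uniform_grid(res_deg: float):
    """Return (lat_centers, lon_centers) for a global equiangular grid.

    Assumption (hypothetical layout): cell edges lie on multiples of res_deg
    starting at -90 latitude and 0 longitude, so centers sit at half-cell offsets.
    """
    nlat = round(180.0 / res_deg)  # number of latitude bands
    nlon = round(360.0 / res_deg)  # number of longitude bands
    lat = [-90.0 + (j + 0.5) * res_deg for j in range(nlat)]
    lon = [(i + 0.5) * res_deg for i in range(nlon)]
    return lat, lon

# Standard-resolution (1 degree) and high-resolution (0.25 degree) targets.
for res in (1.0, 0.25):
    lat, lon = uniform_grid(res)
    print(f"{res} deg: {len(lat)} x {len(lon)} = {len(lat) * len(lon)} cells")
# 1.0 deg: 180 x 360 = 64800 cells
# 0.25 deg: 720 x 1440 = 1036800 cells
```

Every cell spans the same angular width in both directions, which is what distinguishes these grids from the native unstructured E3SM meshes.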

The recommended procedures for generating EAM/ELM and MPAS-O timeseries of the 500-yr DECK pre-industrial simulations for CMIP6 are:

...

Take a moment to compare the methods for EAM/ELM and for MPAS. They are nearly identical except for the variable names, experiment names and directories, and map-files (so far nothing surprising or important) AND the additional MPAS options in ${mpas_opt}. We will discuss those soon. Each command-set begins by setting experiment-dependent I/O directories and a map-file. Other experiments will require changing these to the appropriate I/O directories, yet the map-file remains the same unless the native or destination grid changes. The next three or four lines in each command-set configure the splitter and regridder with options that many ncclimo/ncremap users have never tried. Finally, the list of input files and all the configuration options are sent to ncclimo. For the user, the entire procedure boils down to creating and then executing this one splitter command for each desired variable.

Regridding is performed only if the splitter (i.e., ncclimo) is invoked with the --map option that supplies a suitable map-file from the native grid to the desired destination grid. CMIP6 will only distribute data on 2D structured grids, yet E3SM will itself distribute the timeseries on native (unstructured) grids. Hence the commands above construct both the native timeseries (stored in ${drc_out}) and the regridded timeseries for CMIP6 (stored in ${drc_rgr}). Internally, the splitter constructs the native-grid timeseries of all requested variables for the same time segment in parallel, waits for completion, then calls the regridder (ncremap) in parallel on all timeseries for that segment, waits again, and then continues to the next segment.
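
The segment-by-segment control flow described above can be sketched as a two-phase loop with a barrier after each phase. This is a hypothetical model of the scheduling only: `split` and `regrid` are placeholder functions standing in for the real splitter and ncremap work, the variable names are illustrative, and `job_nbr` here simply caps the number of concurrent workers.

```python
from concurrent.futures import ThreadPoolExecutor

def split(var, seg, log):
    # Placeholder: build the native-grid timeseries of one variable for one segment.
    log.append(("split", var, seg))
    return f"{var}_{seg}_native"

def regrid(native_file, log):
    # Placeholder: regrid one native timeseries file to the destination grid.
    log.append(("regrid", native_file))
    return native_file.replace("_native", "_regridded")

def split_and_regrid(variables, segments, job_nbr=2):
    log, out = [], []
    with ThreadPoolExecutor(max_workers=job_nbr) as pool:
        for seg in segments:
            # Phase 1: split every variable for this segment; list() waits for all.
            natives = list(pool.map(lambda v: split(v, seg, log), variables))
            # Phase 2: regrid every native file for this segment; extend() waits for all.
            out.extend(pool.map(lambda f: regrid(f, log), natives))
    return out, log

out, log = split_and_regrid(["TS", "PRECT"], ["seg1", "seg2"])
print(out)  # ['TS_seg1_regridded', 'PRECT_seg1_regridded', 'TS_seg2_regridded', 'PRECT_seg2_regridded']
```

The barrier between phases is what guarantees that a segment's native files are complete before any regridding of that segment starts.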

...

The RAM overhead of timeseries generation can also be a factor on small nodes. Splitting does most of its work on disk and so requires only as much RAM as needed to store a single timestep of a single variable. Regridding is a different kettle of fish, a bird of another feather, and potentially a can of worms. Its peak RAM usage is about three times the uncompressed size of the entire timeseries being regridded. For the 500-yr 2D and 25-yr 3D segments considered here, expect peak RAM usage of 20 GB and 64 GB, respectively, for MPAS data. If the regridder exhausts available memory when called with multiple variables, then reduce the parallelization over variables using the --job=${job_nbr} option (not shown). This is unlikely to occur on beefy nodes because job_nbr defaults to 2 (i.e., variables are split and regridded in groups of two). The splitter parallelizes well for typical timeseries of 2D variables, and can be invoked with ${job_nbr} exceeding 100 when no regridding (which consumes much more memory than splitting) is performed.
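
The memory rules of thumb above translate into simple arithmetic. The sketch below is a back-of-the-envelope estimator, not part of NCO. It assumes 4-byte (float32) values, a hypothetical 2D field on 100,000 columns, and that each of the `job_nbr` concurrently regridded variables reaches its own peak of about three times its uncompressed size; all function names are hypothetical.

```python
def timeseries_bytes(n_time, n_lev, n_col, bytes_per_value=4):
    # Uncompressed size of one variable's full timeseries.
    return n_time * n_lev * n_col * bytes_per_value

def split_ram_bytes(n_lev, n_col, bytes_per_value=4):
    # Splitting holds roughly one timestep of one variable in memory.
    return n_lev * n_col * bytes_per_value

def peak_regrid_ram(n_time, n_lev, n_col, job_nbr=2, bytes_per_value=4, factor=3):
    # Assumption: job_nbr variables regridded at once, each peaking at ~factor x size.
    return factor * job_nbr * timeseries_bytes(n_time, n_lev, n_col, bytes_per_value)

def max_job_nbr(ram_bytes, n_time, n_lev, n_col, bytes_per_value=4, factor=3):
    # Largest number of concurrent regridding jobs that fits in ram_bytes (at least 1).
    per_var = factor * timeseries_bytes(n_time, n_lev, n_col, bytes_per_value)
    return max(1, ram_bytes // per_var)

# Hypothetical 2D field: 500 years of monthly means on 100,000 columns.
n_time, n_lev, n_col = 500 * 12, 1, 100_000
print(split_ram_bytes(n_lev, n_col))                  # 400000 bytes while splitting
print(timeseries_bytes(n_time, n_lev, n_col) / 1e9)   # 2.4 GB per variable, uncompressed
print(peak_regrid_ram(n_time, n_lev, n_col) / 1e9)    # 14.4 GB with job_nbr=2
print(max_job_nbr(64 * 2**30, n_time, n_lev, n_col))  # 9 on a 64 GiB node
```

The contrast between the first and last lines is the point of the paragraph above: splitting needs almost no memory, while regridding memory scales with the whole timeseries and with the number of variables processed at once.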

Now that the content of the rather lengthy CMIP6 splitter/regridder commands has been explained, it is worth describing the method of invocation. The splitter accepts filenames supplied in numerous ways (command-line arguments, pipes to stdin, directory contents, redirection operators) as described above. For the large numbers of input files typical of CMIP6 experiments, piping the filenames output by ls, run from the input file directory, into the splitter is preferred for two reasons. First, ls automatically sorts files into alphanumeric order. This is equivalent to timeseries order because of the filename conventions employed by E3SM, so ls ensures that the timeseries increases monotonically in time. Moreover, combined with shell globbing, ls simplifies culling only the required time periods from directories that hold longer simulations. Second, issuing ls from the input file directory removes the lengthy path component of each filename received by the splitter. For a 500-year pre-industrial DECK simulation with monthly input files, this removes 500 × 12 = 6000 copies of the same ~100-character directory path from the provenance metadata maintained in the history attribute of each downstream file.
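
Both benefits of `cd ${drc_in}; ls ... | ncclimo ...` can be checked with a small simulation. The sketch below mimics ls with Python's `sorted` (byte-wise, C-locale order) plus `fnmatch` globbing; it does not call ls or ncclimo. The directory path and the `case.eam.h0.YYYY-MM.nc` naming pattern are hypothetical stand-ins for E3SM history filenames, whose zero-padded date stamps are what make lexical order equal chronological order.

```python
import fnmatch
import os
import random

def list_inputs(paths, pattern="*"):
    # Mimic `cd ${drc_in}; ls pattern`: strip the directory component,
    # keep names matching the glob, and sort lexically (C-locale order).
    names = [os.path.basename(p) for p in paths]
    return sorted(n for n in names if fnmatch.fnmatch(n, pattern))

# Hypothetical monthly history files for a 500-year run, 100-character directory.
drc_in = "/hypothetical/" + "x" * 86
months = [(y, m) for y in range(1, 501) for m in range(1, 13)]
paths = [f"{drc_in}/case.eam.h0.{y:04d}-{m:02d}.nc" for y, m in months]
random.shuffle(paths)  # directory listing order is arbitrary

files = list_inputs(paths)                                # chronological order
first5 = list_inputs(paths, "case.eam.h0.000[1-5]-*.nc")  # model years 0001-0005 only

# Characters of repeated directory path kept out of the history attribute.
saved = len(files) * len(drc_in)
print(len(files), len(first5), saved)  # 6000 60 600000
```

Without zero padding (e.g., a year stamp of 10 instead of 0010), lexical and chronological order would diverge, which is why the ls approach relies on the E3SM filename conventions.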

...