This page is a HOWTO for ncremap. It describes the steps necessary to create grids and to regrid datasets between different grids with ncremap. Some of the simpler regridding options supported by ncclimo are also described at Generate, Regrid, and Split Climatologies (climo files) with ncclimo. This page describes those features in more detail, along with other, more boutique features often useful for custom regridding solutions.

The Zen of Regridding

Most modern climate/weather-related research requires a regridding step in its workflow. The plethora of geometric and spectral grids on which model and observational data are stored ensures that regridding is usually necessary to scientific insight, especially the focused and variable resolution studies that E3SM models conduct. Why does such a common procedure seem so complex? Because a mind-boggling number of options are required to support advanced regridding features that many users never need. To defer that complexity, this HOWTO begins with solutions to the prototypical regridding problem, without mentioning any other options. It demonstrates how to solve that problem simply, including the minimal software installation required. Once the basic regridding vocabulary has been introduced, we solve the prototype problem when one or more inputs are "missing", or need to be created. The HOWTO ends with descriptions of different regridding modes and workflows that use features customized to particular models, observational datasets, and formats. The overall organization, including TBD sections (suggest others, or vote for prioritizing, below), is:

Software Requirements
Prototypical Regridding I: Use Existing Map-file
Prototypical Regridding II: Create Map-file from Known Grid-files
Prototypical Regridding III: Infer Grid-file from Data-file
Prototypical Regridding IV: Manual Grid-file Generation
Intermediate Regridding I: MPAS-mode (TBD)
Intermediate Regridding II: Renormalization (TBD)
Intermediate Regridding III: TempestRemap (TBD)
Intermediate Regridding IV: Parallelism (TBD)
Advanced Regridding I: Regional SE Output (RRG-mode) (Done!)
Advanced Regridding II: Sub-Gridscale Regridding (SGS-mode) (TBD)
Advanced Regridding III: Make All Weight Files (MWF-mode) (TBD)

Software Requirements:

At a minimum, install a recent version of NCO with its executables on your $PATH and the corresponding library on your $LD_LIBRARY_PATH. NCO installation instructions are here (fxm: link). We highly recommend installing NCO through the conda package. That will automatically install another important piece of the regridding toolchain, the ESMF_RegridWeightGen (aka ERWG) executable. Execute 'ncremap --config' to verify you have a working installation:

zender@aerosol:~$ ncremap --config
ncremap, the NCO regridder and map- and grid-generator, version 4.7.6-alpha03
...[Legal Stuff]...
Config: ncremap running from directory /Users/zender/bin
Config: Calling NCO binaries in directory /Users/zender/bin
Config: Binaries linked to netCDF library version 4.4.1.1
Config: No hardcoded path/module overrides
Config: ESMF weight-generation command ESMF_RegridWeightGen found as /opt/local/bin/ESMF_RegridWeightGen
Config: Tempest weight-generation command GenerateOfflineMap found as /usr/local/bin/GenerateOfflineMap

Only NCO is required for basic ncremap operation. To create new map-files you will want ERWG. TempestRemap is not yet available in a pre-packaged format and must be built from scratch. It is only required for power-users. Make sure ncremap reports a working configuration like the one above before proceeding further.
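
As a quick complement to 'ncremap --config', the following sketch reports which of the executables named above are on your $PATH. The helper name tool_status is hypothetical (it is not part of NCO), and the check only confirms that an executable of that name exists, not that it works:

```shell
# tool_status is a hypothetical helper, not part of NCO.
# It only checks that an executable with the given name is on $PATH.
tool_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

# ncremap and ncks ship with NCO; the other two are the weight generators
# reported by 'ncremap --config' (ERWG and TempestRemap, respectively).
for tool in ncremap ncks ESMF_RegridWeightGen GenerateOfflineMap; do
    tool_status "$tool"
done
```

A MISSING GenerateOfflineMap is expected unless you built TempestRemap yourself.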

Prototypical Regridding I: Use Existing Map-file

The regridding problem most commonly faced is converting output from a standard resolution model simulation to equivalent data on a different grid for visualization or intercomparison with other data. The EAM v1 model low-resolution simulations are performed and output on the ne30np4 SE (spectral element) grid, aka the "source grid". Data on this source grid have only one horizontal dimension (i.e., 1D), which makes them difficult to visualize. The recommended (fxm: link) 2D grid for analysis of these simulations is the 129x256 FV (finite-volume) grid, aka the "destination grid". The single most important capability of a regridder is the intelligent application of weights that transform data on the input grid to the desired output grid. These weights are stored in a "map-file", a single file that contains all the necessary weights and grid information. Most regridding problems revolve around creating the appropriate map-file. This prototype problem is well-trod ground, so the appropriate map-file (map.nc) already exists (fxm: link) and ncremap can immediately transform the input dataset (dat_src.nc) to the output (regridded) dataset (dat_rgr.nc):

ncremap -m map.nc dat_src.nc dat_rgr.nc

This solution is deceptively simple because it conceals the choices, paths, and options required to create the appropriate map.nc for all situations. We will discuss creating map.nc later after showing more powerful and parallel ways to solve the prototype problem. For now, note that the solution above only works for users savvy enough to know how to find appropriate pre-built map-files. E3SM map-files are available here (fxm: link). At most DOE High Performance Computing (HPC) centers these can also be found in my (@czender's) directory as ~zender/data/maps. Take a minute now to look there.

Pre-built map-files use the (nearly) standardized naming convention map_srcgrd_to_dstgrd_algtyp.YYYYMMDD.nc, where srcgrd and dstgrd are the source and destination grid names, algtyp is a shorthand for the numerical regridding algorithm, and YYYYMMDD is the date the map was created. The source grid in the example above is called ne30np4, and the destination grid is called fv129x256. The only pre-built map for this combination is map.nc = map_ne30np4_to_fv129x256_aave.20150901.nc. What is "aave"? Weight generators can use about a dozen interpolation algorithms for regridding, and each has a shorthand name. For now, it is enough to know that the two most common algorithms are (first-order) conservative area-average regridding ("aave") and bilinear interpolation ("bilin" or "blin"). Hence this conservatively regrids dat_src.nc to dat_rgr.nc with first-order accuracy:

ncremap -m map_ne30np4_to_fv129x256_aave.20150901.nc dat_src.nc dat_rgr.nc
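
Because the naming convention is regular, the pieces of a map-file name can be recovered with plain shell parameter expansion. The sketch below assumes the name follows the convention exactly; the helper parse_map_name is hypothetical (not part of NCO) and will give wrong answers for the "nearly" standard names that deviate from it:

```shell
# parse_map_name is a hypothetical helper, not part of NCO.
# It assumes the form map_srcgrd_to_dstgrd_algtyp.YYYYMMDD.nc exactly.
parse_map_name() {
    name=${1##*/}         # drop any leading directory
    name=${name#map_}     # drop the "map_" prefix
    name=${name%.nc}      # drop the ".nc" suffix
    stamp=${name##*.}     # text after the last dot: YYYYMMDD
    name=${name%.*}       # what remains: srcgrd_to_dstgrd_algtyp
    src=${name%%_to_*}    # text before the first "_to_"
    rest=${name#*_to_}    # dstgrd_algtyp
    alg=${rest##*_}       # text after the last underscore
    dst=${rest%_*}        # text before the last underscore
    echo "src=$src dst=$dst alg=$alg date=$stamp"
}

parse_map_name map_ne30np4_to_fv129x256_aave.20150901.nc
# prints: src=ne30np4 dst=fv129x256 alg=aave date=20150901
```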

Before looking into map-file generation in the next section, try a few ncremap features. For speed's sake, regrid only selected variables:

ncremap -v FSNT,AODVIS -m map.nc dat_src.nc dat_rgr.nc

To regrid multiple files with a single command, supply ncremap with the source and regridded directory names (drc_src, drc_rgr). It regrids every file in drc_src and places the output in drc_rgr:

ncremap -m map.nc -I drc_src -O drc_rgr

Or supply specific input filenames on the command-line, piped through standard input, or redirected from a file:

ncremap -m map.nc -O drc_rgr mdl*2005*nc
ls mdl*2005*nc | ncremap -m map.nc -O drc_rgr
ncremap -m map.nc -O drc_rgr < file_list.txt

When an output directory is not specified, ncremap writes to the current working directory. When the output and input directories are the same, ncremap appends a string (based on the destination grid resolution) to each input filename to avoid name collisions. Finally, be aware that multiple-file invocations of ncremap execute serially by default. Power users will want to parallelize this as described in the section on "Intermediate Regridding".

Prototypical Regridding II: Create Map-file from Known Grid-files

The simplest regridding procedure applies an existing map-file to your data, as in the above example. If the desired map-file cannot be found, then you must create it. Creating a map-file requires a complete specification of both source and destination grids. The files that contain these grid specifications are called "grid-files". E3SM grid-files are available here (fxm: link). At most DOE High Performance Computing (HPC) centers these can also be found in my (@czender's) directory as ~zender/data/grids. Take a minute now to look there for the prototype problem grid-files, i.e., for FV 129x256 and ne30np4 grid-files.

You might find multiple grid-files that contain the string "129x256". Grid-file names are often ambiguous. The grid-file global metadata (ncks -M grid.nc) often displays the source of the grid. These metadata, and sometimes the actual data (fxm: link), are usually more complete and/or accurate in files with a YYYYMMDD-format date-stamp. For example, the metadata in file 129x256_SCRIP.20150901.nc clearly state it is an FV grid and not some other type of grid with 129x256 resolution. The metadata in 129x256_SCRIP.130510 tell the user nothing about the grid boundaries, and some of the data are flawed. When grids seem identical except for their date-stamp, use the grid with the later date-stamp. The curious can examine a grid-file (ncks -M -m grid.nc) and easily see it looks completely different from a typical model or observational data file. Grid-files and data-files are not interchangeable.

Multiple grid-files also contain the string "ne30". These are either slightly different grids, or the same grids stored in different formats meant for different post-processing tools. The different SE (and FV) grid types are described, with figures, here (https://acme-climate.atlassian.net/wiki/spaces/Docs/pages/34113147/Atmosphere+Grids). As explained there, most people will want the "dual-grid" with pentagons. The correct grid-file for this is ne30np4_pentagons.091226.nc. Do not be tempted by SE grid-files named with "latlon".

All grid-files discussed so far are in SCRIP-format, named for the Spherical Coordinate Remapping and Interpolation Package (authored by @pjones). Other formats exist and are increasingly important, especially for SE grids. For now just know that these other formats are also usually stored as netCDF, and that some tools allow non-SCRIP formats to be used interchangeably with SCRIP.

Once armed with source and destination grid-files, one can generate their map-file with

ncremap -s grd_src.nc -g grd_dst.nc -m map.nc

Regrid a datafile at the same time as generating the map-file for archival:

ncremap -s grd_src.nc -g grd_dst.nc -m map.nc dat_src.nc dat_rgr.nc

Regrid a datafile without archiving the (internally generated) map-file:

ncremap -s grd_src.nc -g grd_dst.nc dat_src.nc dat_rgr.nc

For the prototype problem, the map-file generation procedure becomes

ncremap -s ne30np4_pentagons.091226.nc -g 129x256_SCRIP.20150901.nc -m map_ne30np4_to_fv129x256_bilin.YYYYMMDD.nc

The map-file above is named with algtyp="bilin" because the ncremap default interpolation method is bilinear. To re-create the "aave" map in the first example, invoke ncremap with "-a conserve":

ncremap -a conserve -s ne30np4_pentagons.091226.nc -g 129x256_SCRIP.20150901.nc -m map_ne30np4_to_fv129x256_aave.YYYYMMDD.nc

This takes a few minutes, so save custom-generated map-files for future use. Computing weights to generate map-files is much more computationally expensive and time-consuming than regridding, i.e., than applying the weights in the map-file to the data. We will gloss over most options that weight-generators can take into consideration, because their default values often work well. One option worth knowing now is "-a". The invocation synonyms for "-a" are "--alg_typ", "--algorithm", and "--regrid_algorithm". These are listed in the square brackets in the self-help message that ncremap prints when it is invoked without arguments, or with "--help":

zender@aerosol:~$ ncremap
...
-a alg_typ Algorithm for weight generation (default bilinear) [alg_typ, algorithm, regrid_algorithm]
ESMF algorithms: bilinear|conserve|conserve2nd|nearestdtos|neareststod|patch
Tempest algorithms: tempest|se2fv_flx|se2fv_stt|se2fv_alt|fv2se_flx|fv2se_stt|fv2se_alt
...

One valid option argument for each supported interpolation type is shown separated by vertical bars. The arguments shown have multiple synonyms that are equivalent. For example, "-a conserve" is equivalent to "-a aave" and to "--alg_typ=conservative". Use the longer option form for clarity and precision, and the shorter form for conciseness. The full list of synonyms, and the complete documentation, is at http://nco.sf.net/nco.html#alg_typ. "bilinear" and "conservative" are the most-used algorithms. Peruse the list of options now, though defer a thorough investigation until you reach the "Intermediate Regridding" section.
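
To keep custom map-files consistent with the naming convention described earlier, the name can be assembled from the grid names, the algorithm, and the creation date. This is only a sketch: alg_shorthand and map_name are hypothetical helpers (not part of NCO), only the algorithm names mentioned on this page are translated, and the final line prints the weight-generation command rather than running it:

```shell
# alg_shorthand and map_name are hypothetical helpers, not part of NCO.
# Translate an algorithm name into the shorthand used in map-file names.
# Only the names mentioned on this page are handled; others pass through.
alg_shorthand() {
    case "$1" in
        conserve|conservative|aave) echo aave ;;
        bilinear|bilin|blin)        echo bilin ;;
        *)                          echo "$1" ;;
    esac
}

# Build map_srcgrd_to_dstgrd_algtyp.YYYYMMDD.nc.
# Arguments: srcgrd dstgrd algorithm [YYYYMMDD]; the date defaults to today.
map_name() {
    stamp=${4:-$(date +%Y%m%d)}
    echo "map_${1}_to_${2}_$(alg_shorthand "$3").${stamp}.nc"
}

map=$(map_name ne30np4 fv129x256 conserve)
# Dry run: print the command for inspection instead of executing it.
echo ncremap -a conserve -s ne30np4_pentagons.091226.nc -g 129x256_SCRIP.20150901.nc -m "$map"
```

Remove the leading echo on the last line to actually generate the map once the grid-files are in place.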

Prototypical Regridding III: Infer Grid-file from Data-file

Thus far we have explained how to apply a map-file to data, and how, if necessary, to generate a map-file from known grids. What if there is no map-file and the source or the destination grid-files (or both) are unavailable? Often, one knows the basic information about a grid (e.g., resolution) but lacks the grid-file that contains the complete information for that grid geometry. In such cases, one must create the grid-file via one of two methods. First, one can let ncremap attempt to infer the grid-file from a data file known to be on the desired grid. This procedure is called "inferral" and is fairly painless. Second, one can feed NCO all the required parameters and it will generate a grid-file. This requires a precise specification of the grid geometry, and will be covered in the sub-section on "Manual Grid-file Generation".

Before we describe what the inferral procedure does, here is an example that demonstrates how easy it is. You can regrid an SE dataset from our prototype example to the same grid as an evaluation dataset. Pick any 2D (i.e., latxlon) dataset to compare the SE data to. Inferral uses the grid information in the evaluation dataset, which is already on the desired destination grid, to create the (internally generated) destination grid-file. Supply ncremap with any dataset on the desired destination grid (dat_dst.nc) with "-d" (for "dataset") instead of "-g" (for "grid"):

ncremap -s ne30np4_pentagons.091226.nc -d dat_dst.nc -m map_ne30np4_to_1x1_bilin.YYYYMMDD.nc

This tells ncremap to infer the destination grid-file and to use it to generate the desired map-file, named with the supposed destination resolution (here, 1x1 degree). To archive the inferred destination grid for later use, supply ncremap with a name for it:

ncremap -s ne30np4_pentagons.091226.nc -d dat_dst.nc -g grd_dst.nc -m map_ne30np4_to_1x1_bilin.YYYYMMDD.nc # Requires NCO version >= 4.7.6

Of course one can infer a grid without having to regrid anything. Supply ncremap with a data-template file (dat.nc) and a grid-file name (grd.nc). Since there are no input files to regrid, ncremap exits after inferring the grid-file:

ncremap -d dat.nc -g grd.nc

Grid-inferral is easier to try than manual grid-generation, and will work if the data file contains the necessary information. The only data needed to construct a SCRIP grid-file are the vertices of each gridcell. The gridcell vertices define the gridcell edges and these in turn define the gridcell area which is equivalent to the all-important weight parameter necessary to regrid data. Of course the gridcell vertices must be stored with recognizable names and/or metadata indicators. The Climate & Forecast (CF) metadata convention calls the gridcell vertices the "cell-bounds". Coordinates (like latitude and longitude) usually store cell-center values, and should, according to CF, have "bounds" attributes whose values point to variables (e.g., "lat_bounds" or "lon_vertices") that contain the actual vertices. Relatively few datasets "in the wild" contain gridcell vertices, though the practice is, happily, on the rise. Formally SE models have nodal points with weights without any fixed perimeter or vertices assigned to the nodes, so the lack of vertices in the model output is a feature, not an oversight. The dual-grid (referenced above) addresses this by defining "pretend" gridcell vertices for each nodal point so that an SE dataset can be treated like an FV dataset.
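
One way to check whether a data-file follows this CF practice is to look for "bounds" attributes in its metadata. The sketch below filters CDL text for such attributes; list_bounds_vars is a hypothetical helper (not part of NCO), and the CDL fragment with its variable names is made up for the demonstration:

```shell
# list_bounds_vars is a hypothetical helper, not part of NCO.
# It reads CDL text on stdin and prints the value of each "bounds" attribute,
# i.e., the names of the variables that should hold the gridcell vertices.
list_bounds_vars() {
    sed -n 's/.*:bounds *= *"\([^"]*\)".*/\1/p'
}

# Demonstration on a hand-written CDL fragment (made-up variable names).
# With a real file, pipe in its CDL metadata instead, e.g.:
#   ncks --cdl -m dat_dst.nc | list_bounds_vars
list_bounds_vars <<'EOF'
    double lat(lat) ;
        lat:units = "degrees_north" ;
        lat:bounds = "lat_bnds" ;
    double lon(lon) ;
        lon:units = "degrees_east" ;
        lon:bounds = "lon_bnds" ;
EOF
```

Empty output means the file advertises no cell-bounds this way, and inferral must fall back on the gridcell centers.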

Inferral works well on important categories of grids for which ncremap can guess the missing grid information. In the absence of gridcell vertex information, ncremap examines the location of and spacing between gridcell centers and can often determine from these what type of grid a data-file (not a grid-file!) is stored on. A data-file simply means the filetype preferred for creation/distribution of modeled/observed data. Hence ncremap has the (original and unique, so far as we know) ability to infer all useful rectangular grid-types from data-files that employ the grid. The key constraint here is "rectangular", meaning the grid must be orthogonal (though not necessarily regularly spaced) in latitude and longitude. This includes all uniform angle grids, FV grids, and Gaussian grids. For curvilinear grids (including most swath data), ncremap infers the underlying grid to be the set of curves that bisect the curves created by joining the gridcell centers. This often works well for curvilinear grids that do not cross a pole. Inferral works for unstructured (i.e., 1D) grids only when the cell-bounds are stored in the datafile as described above. Hence inferral will not work on raw output from SE models.

A few more examples will flesh-out how inferral can be used. First, ncremap can infer both source and destination grids in one command:

ncremap -d dat_dst.nc dat_src.nc dat_rgr.nc

Here the user provides only data files (no grid- or map-files) yet still obtains regridded output! The first positional (i.e., not immediately preceded by an option indicator) argument (dat_src.nc) is interpreted as the data to regrid, and the second positional argument (dat_rgr.nc) is the output name. The -d argument is the name of any dataset (dat_dst.nc) on the desired destination grid. ncremap infers the source grid from dat_src.nc, then infers the destination grid from dat_dst.nc, then creates a map-file and uses it to regrid dat_src.nc to dat_rgr.nc. No grid-file or map-file names were specified (with -g or -m) so both grid-files and the map-file are generated internally in temporary locations and erased after use.

Second, this procedure, like most ncremap features, works for multiple input files:

ncremap -d dat_dst.nc -I drc_in -O drc_rgr
ncremap -d dat_dst.nc -I drc_in

Unless a map-file or source grid-file is explicitly provided (with -m or -s, respectively), ncremap infers a separate source grid-file (and computes a corresponding map-file) for each input file. This is so it can regrid lists of uniquely gridded data (e.g., satellite swaths each on its own grid) to a common destination grid. When all source files are on the same grid (as is typical with models), turn off the expensive multiple inferral and map-generation procedures with the -M switch to save time:

ncremap -M -d dat_dst.nc -I drc_in -O drc_rgr
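
To make the inferral workflow concrete, the one-line command "ncremap -d dat_dst.nc dat_src.nc dat_rgr.nc" can be thought of as the explicit steps below, each of which uses only invocations introduced earlier on this page. This is a conceptual sketch, not a description of ncremap's internals: the helpers run and infer_and_regrid and the intermediate filenames are hypothetical, and the commands are printed rather than executed:

```shell
# run and infer_and_regrid are hypothetical helpers, not part of NCO.
# Dry run: print each command instead of executing it.
run() { echo "$*"; }

# Arguments: source data-file, destination-grid data-file, output file.
infer_and_regrid() {
    dat_src=$1; dat_dst=$2; dat_rgr=$3
    run ncremap -d "$dat_src" -g grd_src.nc             # infer source grid-file
    run ncremap -d "$dat_dst" -g grd_dst.nc             # infer destination grid-file
    run ncremap -s grd_src.nc -g grd_dst.nc -m map.nc   # generate map-file
    run ncremap -m map.nc "$dat_src" "$dat_rgr"         # apply weights
}

infer_and_regrid dat_src.nc dat_dst.nc dat_rgr.nc
```

Unlike the one-line form, which erases its temporary grid- and map-files, the explicit steps leave grd_src.nc, grd_dst.nc, and map.nc behind for reuse.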

Prototypical Regridding IV: Manual Grid-file Generation

If a desired grid-file is unavailable, and no dataset on that grid is available (so inferral cannot be used), then one must manually create a new grid. Users create new grids for many reasons including dataset intercomparisons, regional studies, and fine-tuned graphics. NCO and ncremap support manual generation of the most common rectangular grids as SCRIP-format grid-files. Create a grid by supplying ncremap with a grid-file name and a "grid-formula" (grd_sng) that consists, at a minimum, of the grid resolution. The grid-formula is a hash-separated string of name-value pairs each representing a grid parameter. All parameters except grid resolution have reasonable defaults, so a grid-formula can be as simple as "latlon=180,360":

ncremap -g grd.nc -G latlon=180,360

Once created, the grid-file grd.nc is a valid source or destination grid for ncremap commands.

Grid-file generation documentation in the NCO Users Guide at http://nco.sf.net/nco.html#grid describes all the grid parameters and contains many examples. Note that the examples in this section use the grid generation API for ncremap version 4.7.6 (August, 2018) and later. Earlier versions can use the ncks API explained in the Users Guide.

The most useful grid parameters (besides resolution) are latitude type (lat_typ), longitude type (lon_typ), title (ttl), and, for regional grids, the SNWE bounding box (snwe). The three supported varieties of global rectangular grids are Uniform/equiangular (lat_typ=uni), Cap/FV (lat_typ=cap), and Gaussian (lat_typ=gss). The four supported longitude types place the first (westernmost) gridcell with its center at Greenwich (lon_typ=grn_ctr), its western edge at Greenwich (lon_typ=grn_wst), its center at the Dateline (lon_typ=180_ctr), or its western edge at the Dateline (lon_typ=180_wst). Grids are global, uniform, and have their first longitude centered at Greenwich by default. The grid-formula for this is 'lat_typ=uni#lon_typ=grn_ctr'. Some examples (remember, this API requires NCO 4.7.6+):

ncremap -g grd.nc -G latlon=180,360 # 1x1 Uniform grid
ncremap -g grd.nc -G latlon=180,360#lon_typ=grn_wst # 1x1 Uniform grid, Greenwich-west edge
ncremap -g grd.nc -G latlon=129,256#lat_typ=cap # 1.4x1.4 FV grid
ncremap -g grd.nc -G latlon=94,192#lat_typ=gss # T62 Gaussian grid
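
When grid-formulas are built in scripts, joining the name-value pairs with the hash separator by hand is error-prone. A minimal sketch follows, assuming a hypothetical helper grid_formula (not part of NCO); the ncremap command is printed rather than executed:

```shell
# grid_formula is a hypothetical helper, not part of NCO.
# Join name=value pairs into a hash-separated grid-formula string.
grid_formula() {
    formula=$1
    shift
    for pair in "$@"; do
        formula="$formula#$pair"
    done
    echo "$formula"
}

grd_sng=$(grid_formula latlon=129,256 lat_typ=cap lon_typ=grn_ctr)
echo "$grd_sng"
# prints: latlon=129,256#lat_typ=cap#lon_typ=grn_ctr

# Dry run: print the grid-generation command instead of executing it.
echo ncremap -g grd.nc -G "$grd_sng"
```

Quoting "$grd_sng" keeps the whole formula a single argument even if a value, such as a title, contains spaces.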

Regional grids are a powerful tool in regional process analyses, and can be much smaller in size than global datasets. Regional grids are always uniform. Specify the rectangular bounding box, i.e., the outside edges of the region, in SNWE order:

ncremap -g grd.nc -G ttl="'Equi-Angular 1x1 Greenland grid'"#latlon=30,90#snwe=55.0,85.0,-90.0,0.0

The "extra" quotation marks protect the spaces in the title string from being interpreted as option separators.
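
The latlon counts of a regional grid follow from the bounding box and the desired resolution: the number of latitudes is (north - south) / resolution, and likewise for longitudes. The sketch below does that arithmetic; regional_formula is a hypothetical helper (not part of NCO), and it assumes a single resolution in degrees that divides the box evenly in both directions:

```shell
# regional_formula is a hypothetical helper, not part of NCO.
# Arguments: south north west east resolution (all in degrees).
# Assumes the resolution divides the bounding box evenly.
regional_formula() {
    counts=$(awk -v s="$1" -v n="$2" -v w="$3" -v e="$4" -v r="$5" \
        'BEGIN { printf "%d,%d", int((n - s) / r + 0.5), int((e - w) / r + 0.5) }')
    echo "latlon=${counts}#snwe=$1,$2,$3,$4"
}

regional_formula 55.0 85.0 -90.0 0.0 1.0
# prints: latlon=30,90#snwe=55.0,85.0,-90.0,0.0
```

The printed string reproduces the latlon and snwe parameters of the Greenland example above; a title can be added with the ttl parameter as shown there.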

Intermediate Regridding:

The sections on Prototypical Regridding were intended to be read sequentially and introduced the most frequently required ncremap features. The Intermediate and Advanced regridding sections are an a la carte description of features most useful to particular component models, workflows, and data formats. Peruse these sections in any order.

Intermediate Regridding I: MPAS-mode

Intermediate Regridding II: Renormalization

Intermediate Regridding III: TempestRemap

Intermediate Regridding IV: Parallelism

Advanced Regridding I: Regional SE Output (RRG-mode)

EAM and CAM-SE will produce regional output if requested to with the finclNlonlat namelist parameter. Output for a single region can be higher temporal resolution than the host global simulation. This facilitates detailed yet economical regional process studies. Regional output files are in a special format that we call RRG (for "regional regridding"). An RRG file may contain any number of rectangular regions. The coordinates and variables for one region do not interfere with other (possibly overlapping) regions because all variables and dimensions are named with a per-region suffix string, e.g., lat_128e_to_134e_9s_to_16s. ncremap can easily regrid RRG output from an FV-dycore because ncremap can infer (as discussed above) the regional grid from any FV data file. Regridding regional SE data, however, is more complex because SE gridcells are essentially weights without vertices, and SE weight-generators are not yet flexible enough to generate the regional weights. To summarize, regridding RRG data leads to three SE-specific difficulties (#1-3 below) and two difficulties (#4-5) shared with FV RRG files:

1. RRG files contain only regional gridcell center locations, not weights
2. Global SE grids have well-defined weights not vertices for each gridpoint
3. Grid generation software (ESMF and TempestRemap) only create global not regional SE grid files
4. Non-standard variable names and dimension names
5. Regional files can contain multiple regions

ncremap's RRG mode resolves these issues to allow trouble-free regridding of SE RRG files. The user must provide two additional input arguments, '--dat_glb=dat_glb' and '--grd_glb=grd_glb' to point to a global SE dataset and grid, respectively, of the same resolution as the model that generated the RRG datasets. Hence a typical RRG regridding invocation is:

ncremap --dat_glb=dat_glb.nc --grd_glb=grd_glb.nc -g grd_rgn.nc dat_rgn.nc dat_rgr.nc

Here grd_rgn is a regional destination grid-file, dat_rgn is the RRG file to regrid, and dat_rgr is the regridded output. Typically grd_rgn is a uniform rectangular grid covering the same region as the RRG file. Generate this as described in the last example in the section above on "Manual Grid-file Generation". grd_glb is the standard dual-grid grid-file for the SE resolution of the simulation, e.g., ne30np4_pentagons.091226.nc. ncremap regrids the global data file dat_glb to the global dual-grid in order to produce an intermediate global file annotated with gridcell vertices. Then it hyperslabs the lat/lon coordinates (and vertices) from the regional domain to use with regridding the RRG file. A dat_glb file with only one 2D field suffices (and is fastest) for producing the information needed by the RRG procedure. One can prepare an optimal dat_glb file by subsetting any 2D variable (e.g., ncks -v FSNT in.nc dat_glb.nc) from a full global SE output dataset.

ncremap RRG mode supports two additional options to override parameters set internally. First, the per-region suffix string may be set with '--rnm_sng=rnm_sng'. RRG mode will, by default, regrid the first region it finds in an RRG file. Explicitly set the desired region with rnm_sng for files with multiple regions, e.g., "--rnm_sng= ". Second, the bounding-box of the region may be explicitly set with '--bb_wesn=lon_wst,lon_est,lat_sth,lat_nrt'. The normal parsing of the bounding-box string from the suffix string may fail in (as yet undiscovered) corner cases, and the "--bb_wesn" option provides a workaround. The bounding-box string must include the entire RRG region, specified in WESN order. The two override options may be used independently or together, as in:

ncremap --rnm_sng='_128e_to_134e_9s_to_16s' --bb_wesn='128.0,134.0,-16.0,-9.0' --dat_glb=dat_glb.nc --grd_glb=grd_glb.nc -g grd_rgn.nc dat_rgn.nc dat_rgr.nc
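
For scripted workflows it can help to derive the "--bb_wesn" value from the suffix string rather than typing both by hand. The sketch below illustrates one way to do so and is not ncremap's own parser: suffix_to_bb_wesn is a hypothetical helper, it assumes the suffix has exactly the form shown above (_LON_to_LON_LAT_to_LAT with e/w and n/s hemisphere letters), and it assumes that "w" and "s" values map to negative degrees:

```shell
# suffix_to_bb_wesn is a hypothetical helper, not ncremap's own parser.
# Assumes a suffix like _128e_to_134e_9s_to_16s and that "w" longitudes
# and "s" latitudes become negative degrees.
suffix_to_bb_wesn() {
    echo "$1" | awk -F'_' '
    # Convert a token such as "16s" into a signed number such as -16.
    function sgn_val(tok,   hemi, val) {
        hemi = substr(tok, length(tok), 1)
        val  = substr(tok, 1, length(tok) - 1) + 0
        return (hemi == "w" || hemi == "s") ? -val : val
    }
    {
        # Fields: $1 is empty, then lon1, "to", lon2, lat1, "to", lat2.
        lon1 = sgn_val($2); lon2 = sgn_val($4)
        lat1 = sgn_val($5); lat2 = sgn_val($7)
        sth = (lat1 < lat2) ? lat1 : lat2
        nrt = (lat1 < lat2) ? lat2 : lat1
        printf "%.1f,%.1f,%.1f,%.1f\n", lon1, lon2, sth, nrt
    }'
}

suffix_to_bb_wesn _128e_to_134e_9s_to_16s
# prints: 128.0,134.0,-16.0,-9.0
```

The output matches the "--bb_wesn" value in the example command above, with the two latitudes sorted so that the southern edge comes first.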

RRG-mode supports most normal ncremap options, including input and output methods and regridding algorithms.

Advanced Regridding II: Sub-Gridscale Regridding (SGS-mode)

Advanced Regridding III: Make All Weight Files (MWF-mode)
