...

The map-checker performs numerous checks and reports numerous statistics, probably more than you care about. Be assured that each piece of information it provides has, in the past, proved useful to developers of weight-generation and regridding algorithms. Most of the time, users can learn whether the examined map is of sufficient quality for their purposes by examining only a few of these statistics. Before defining these primary statistics, it is helpful to understand the meaning of the weight-array S (stored in a map-file as the variable S), and the terminology of rows and columns.

A remapping (aka regridding) transforms a field on an input grid to an output grid while conserving, to the extent possible or desired, the local and global properties of the field. The map S is a matrix of M rows and N columns of weights, where M is the number of gridcells (or degrees of freedom, DOFs) in the destination grid, and N is the number of gridcells (or DOFs) in the source grid. An individual weight S(m,n) represents the fractional contribution to destination gridcell m by source gridcell n.
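In equation form, applying the map to a source field with values x_n on the N source gridcells yields destination values y_m (the symbols x and y are introduced here only for illustration):

```latex
y_m = \sum_{n=1}^{N} S(m,n)\, x_n, \qquad m = 1, \dots, M
```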

By convention the weights are normalized to sum to unity in each row (destination gridcell) that completely overlaps the input grid. Thus each weight in a row equals the fraction of that destination gridcell's area (we will drop the DOF terminology hereafter for conciseness) that it receives from the corresponding source gridcell. Regardless of the values of the individual weights, it is intuitive that their row-sum should never exceed unity, because that would be physically equivalent to an output gridcell receiving more than its own area from the source grid. Map-files typically store these row-sums for each destination gridcell in the frac_b variable described further below.
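Using the same S(m,n) notation, the row-sum stored in frac_b for destination gridcell m is:

```latex
\mathrm{frac\_b}(m) = \sum_{n=1}^{N} S(m,n)
```

which should equal 1 for every destination gridcell that completely overlaps the input grid, and should never exceed 1.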

Likewise, the weights in a single column represent the fractional destination areas that a single source gridcell contributes to every output gridcell. Output gridcells in a column may differ in area, so column-sums need not, and in general do not, sum to unity. However, a source gridcell ought to contribute to the destination grid a total area equal to its own area. Thus a constraint on column-sums is that their weights, themselves weighted by the destination gridcell area corresponding to each row, should sum exactly to the source gridcell area. In other words, the destination-area-weighted column-sum divided by the source gridcell area would be unity (in a perfect first-order map) for every source gridcell that completely overlaps valid destination gridcells. Map-files typically store these area-weighted column-sum ratios for each source gridcell in the frac_a variable described further below.
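Writing A_a(n) for the source gridcell areas (area_a) and A_b(m) for the destination gridcell areas (area_b), the area-weighted column-sum ratio stored in frac_a for source gridcell n is:

```latex
\mathrm{frac\_a}(n) = \frac{1}{A_a(n)} \sum_{m=1}^{M} S(m,n)\, A_b(m)
```

and the first-order conservation constraint is frac_a(n) = 1 for every source gridcell that completely overlaps valid destination gridcells.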

Storing the entire weight-matrix S is unnecessary because only a handful of source gridcells contribute to a given destination gridcell, and vice versa. Instead, map-files store only the non-zero S(m,n), and encode them as a sparse matrix. Storing S as a sparse matrix rather than a full matrix reduces overall storage sizes by a factor on the order of the ratio of the product of the grid sizes to their sum, or about 10,000 for grids with horizontal resolution near one degree, and more for finer resolutions. The sparse-matrix representation is a one-dimensional array of weights S, together with two ancillary arrays, row and column, that contain the one-dimensional row and column indices, respectively, corresponding to the destination and source gridcells of the associated weight. By convention, map-files store the row and column indices using the 1-based convention in common use in the 1990s, when regridding software was all written in Fortran. The map-checker prints cell locations with 1-based indices as well:

...

Here the map-file weights span twenty-five orders of magnitude. This may seem large, though in practice it is typical for high-resolution intersection grids. The Fortran-convention index of each weight extreme is followed by its geographic latitude and longitude. Reporting the locations of extrema, and of gridcells whose metrics miss their target values by more than a specified tolerance, is a prime map-checker feature.
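Finding the weight extrema and reporting them with Fortran-convention (1-based) indices takes only a few lines. In this hypothetical sketch the reported index is the weight's position in S, together with its stored row and column; converting these to latitude and longitude would additionally require the gridcell-center coordinates, which are not modeled here. The function names and toy weights are illustrative only.

```python
import math


def weight_extrema(S, row, col):
    """Return the smallest and largest weights as (value, k, row, col) tuples.

    k is the 1-based position of the weight in S; row and col are the stored
    1-based destination and source gridcell indices of that weight.
    """
    k_min = min(range(len(S)), key=S.__getitem__)
    k_max = max(range(len(S)), key=S.__getitem__)
    lo = (S[k_min], k_min + 1, row[k_min], col[k_min])
    hi = (S[k_max], k_max + 1, row[k_max], col[k_max])
    return lo, hi


def decades_spanned(S):
    """Orders of magnitude between the largest and smallest positive weights.

    Only positive weights are considered, since higher-order maps may
    contain negative weights for which a logarithm is undefined.
    """
    positive = [w for w in S if w > 0.0]
    return math.log10(max(positive) / min(positive))


S = [0.5, 1.0e-25, 1.0, 0.25]
row = [1, 1, 2, 3]
col = [1, 2, 2, 3]
lo, hi = weight_extrema(S, row, col)
print(lo, hi)                        # (1e-25, 2, 1, 2) (1.0, 3, 2, 2)
print(round(decades_spanned(S), 6))  # 25.0
```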

As mentioned above, the two statistics most telling about map quality are the weighted column-sums named frac_a and the row-sums named frac_b. The short-hand names for what these metrics quantify are Conservation and Consistency, respectively. Conservation measures the total fraction of an input gridcell that contributes to the output grid. For global input and output grids that completely tile the sphere, the entirety of each input gridcell should contribute to (i.e., map to) the output grid. The same concept that applies locally to conservation of a gridcell value applies globally to the overall conservation of an input field. Thus a perfectly conservative mapping between global grids that tile the sphere would have frac_a = 1.0 for every input gridcell, and for the mean of all input gridcells.
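The link between local and global conservation follows directly from the definitions. Writing x_n for a source field, y_m = \sum_n S(m,n) x_n for the mapped field, A_a and A_b for area_a and area_b, and frac_a(n) = (1/A_a(n)) \sum_m S(m,n) A_b(m) for the area-weighted column-sum ratio (symbols introduced here for illustration), the area integral of the destination field is:

```latex
\sum_{m=1}^{M} A_b(m)\, y_m
  = \sum_{m=1}^{M} A_b(m) \sum_{n=1}^{N} S(m,n)\, x_n
  = \sum_{n=1}^{N} x_n \sum_{m=1}^{M} S(m,n)\, A_b(m)
  = \sum_{n=1}^{N} A_a(n)\, \mathrm{frac\_a}(n)\, x_n
```

So when frac_a(n) = 1 for every source gridcell, the destination integral equals the source integral \sum_n A_a(n) x_n for any input field.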

The map-checker computes Conservation (frac_a) from the stored variables S, row, column, area_a, and area_b in the map-file, then compares those values to the frac_a values stored on disk (if any), and warns of any disagreements. By definition, conservation is perfect to first order if the sum of the destination-gridcell-area-weighted weights (which is an area) equals the source gridcell area, so that their ratio (frac_a) is unity. Recomputing these area-weighted column-sum ratios and comparing them to the stored frac_a catches any discrepancies. The analysis sounds an alarm when discrepancies exceed a tolerance (currently 5.0e-16). More importantly, the map-checker reports summary statistics of the computed frac_a metrics and their imputed errors, including the grid mean, minimum, maximum, mean-absolute bias, root-mean-square bias, and standard deviation.
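A minimal sketch of these computations, assuming the map-file variables have already been read into plain Python lists (reading them with a netCDF library is omitted). The function names are hypothetical, the toy map is a one-dimensional grid, and the standard deviation uses the population (divide-by-n) form, which may not match the map-checker's normalization exactly. When the mean is essentially 1.0, rms and sdn coincide, as they do in the output that follows.

```python
import math


def compute_frac_a(S, row, col, area_a, area_b):
    """Area_b-weighted column-sums of S divided by area_a (1-based row/col)."""
    col_area = [0.0] * len(area_a)
    for w, m1, n1 in zip(S, row, col):
        col_area[n1 - 1] += w * area_b[m1 - 1]  # convert Fortran indices to 0-based
    return [c / a for c, a in zip(col_area, area_a)]


def disagreements(computed, stored, tol=5.0e-16):
    """1-based indices where computed and stored values differ by more than tol."""
    return [i + 1 for i, (c, s) in enumerate(zip(computed, stored)) if abs(c - s) > tol]


def frac_stats(frac, target=1.0):
    """Unweighted summary statistics of frac_a (or frac_b) relative to target."""
    n = len(frac)
    avg = sum(frac) / n
    return {
        "avg": avg,
        "min": min(frac),
        "max": max(frac),
        "mbs": sum(abs(f - target) for f in frac) / n,               # mean absolute bias
        "rms": math.sqrt(sum((f - target) ** 2 for f in frac) / n),  # RMS relative to target
        "sdn": math.sqrt(sum((f - avg) ** 2 for f in frac) / n),     # std. dev. about the mean
    }


# Toy 1-D map: source cells [0,1] and [1,4]; destination cells [0,2] and [2,4]
S, row, col = [0.5, 0.5, 1.0], [1, 1, 2], [1, 2, 2]
frac_a = compute_frac_a(S, row, col, area_a=[1.0, 3.0], area_b=[2.0, 2.0])
print(frac_a)                               # [1.0, 1.0]
print(disagreements(frac_a, [1.0, 0.999]))  # [2]
print(frac_stats([1.0, 0.5, 1.5])["avg"])   # 1.0
```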

% ncks --chk_map map_ne30np4_to_cmip6_180x360_nco.20190601.nc
...
Conservation metrics (column-sums of area_b-weighted weights normalized by area_a) and errors---
Perfect metrics for global Grid B are avg = min = max = 1.0, mbs = rms = sdn = 0.0:
frac_a avg: 1.0000000000000000 = 1.0-0.0e+00 // Mean
frac_a min: 0.9999999999991109 = 1.0-8.9e-13 // Minimum in grid A cell [45328,+77.3747,+225]
frac_a max: 1.0000000000002398 = 1.0+2.4e-13 // Maximum in grid A cell [47582,+49.8351,+135]
frac_a mbs: 0.0000000000000096 = 9.6e-15 // Mean absolute bias from 1.0
frac_a rms: 0.0000000000000167 = 1.7e-14 // RMS relative to 1.0
frac_a sdn: 0.0000000000000167 = 1.7e-14 // Standard deviation
...

The values of the frac_a metric are generally imperfect (not 1.0) for global grids. The bias, shown as the second floating-point number in each row above (e.g., 8.9e-13), is the deviation from the target metric. These biases should be vanishingly small with respect to unity. Mean biases as large as 1.0e-08 may be considered acceptable for off-line analyses (i.e., a single regridding of raw data), though the acceptable tolerance should be more stringent for on-line use, such as in a coupler where forward and reverse mappings may be applied tens of thousands of times. The mean biases for such on-line regridding should be close to 1.0e-15 in order for tens of thousands of repetitions to still conserve mass/energy to full double-precision.

...

The map-checker computes the Consistency (frac_b) as row-sums of the weights stored in S and compares these to the stored values of frac_b. (Note how defining the weights S(m,n) as the fractional contribution to destination gridcell m by source gridcell n makes calculating frac_b almost trivial in comparison to frac_a.) Nevertheless, frac_b in the file may differ from the computed row-sum, for example if the map-file generator artificially caps the stored frac_b at 1.0 for those cells whose row-sums exceed 1.0. There are semi-valid reasons a map-generator might do this, so such a mismatch does not necessarily indicate an error. The map-checker raises an alarm when discrepancies between computed and stored frac_b exceed a tolerance (currently 5.0e-16). The alarm simply informs the user that applying the weights will lead to a slightly different Consistency than indicated by the stored frac_b.
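To illustrate the kind of disagreement this alarm catches, the hypothetical sketch below recomputes row-sums and compares them with a stored frac_b that a generator has capped at 1.0. The function names and toy weights are illustrative only.

```python
def row_sums(S, row, n_rows):
    """frac_b as computed: the sum of the weights in each (1-based) row."""
    frac_b = [0.0] * n_rows
    for w, m1 in zip(S, row):
        frac_b[m1 - 1] += w
    return frac_b


def check_frac_b(S, row, stored, tol=5.0e-16):
    """Return (row, computed, stored) for rows whose discrepancy exceeds tol."""
    computed = row_sums(S, row, len(stored))
    return [(m + 1, c, s)
            for m, (c, s) in enumerate(zip(computed, stored))
            if abs(c - s) > tol]


S = [0.75, 0.5, 1.0]
row = [1, 1, 2]
# A generator that caps frac_b at 1.0 would store min(row-sum, 1.0)
stored = [min(c, 1.0) for c in row_sums(S, row, 2)]
print(check_frac_b(S, row, stored))  # [(1, 1.25, 1.0)]
```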

As with frac_a, the values of frac_b are generally imperfect (not 1.0) for global grids:

% ncks --chk_map map_ne30np4_to_cmip6_180x360_nco.20190601.nc
...
Consistency metrics (row-sums of weights) and errors---
Perfect metrics for global Grid A are avg = min = max = 1.0, mbs = rms = sdn = 0.0:
frac_b avg: 0.9999999999999999 = 1.0-1.1e-16 // Mean
frac_b min: 0.9999999999985523 = 1.0-1.4e-12 // Minimum in grid B cell [59446,+75.5,+45.5]
frac_b max: 1.0000000000004521 = 1.0+4.5e-13 // Maximum in grid B cell [63766,+87.5,+45.5]
frac_b mbs: 0.0000000000000065 = 6.5e-15 // Mean absolute bias from 1.0
frac_b rms: 0.0000000000000190 = 1.9e-14 // RMS relative to 1.0
frac_b sdn: 0.0000000000000190 = 1.9e-14 // Standard deviation
...

This example shows that frac_b has its greatest local errors near the same boundaries (multiples of 45 degrees longitude) as frac_a. It is typical for Conservation and Consistency to degrade in intricate areas of the intersection grid, and for cubed-sphere mappings these areas occur at multiples of 45 degrees longitude.

The map-checker will produce area-weighted metrics when invoked with the --area_wgt flag, e.g., ncks --area_wgt in.nc. Area-weighted statistics show the exact local and global results to expect with real-world grids, in which large consistency/conservation errors in small gridcells may be less important than smaller errors in larger gridcells. Area-weighted mean statistics will of course differ from unweighted statistics, although the minimum and maximum do not change:

...

The examples above show no outstanding differences (besides rounding) between the unweighted and area-weighted statistics. The absence of degradation between the global unweighted statistics (further up the page) and the global weighted statistics (just above) demonstrates there are no important correlations between local weight biases and gridcell areas. The area-weighted mean frac_b statistic deserves special mention. Its value is the exact factor by which the mapping will shift the global mean of a spatially uniform input field. This metric is, therefore, first among equals when evaluating the quality of maps under consideration for use in time-stepping models where global conservation (e.g., of mass or energy) is crucial.
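Both points can be checked on a toy map. In the hypothetical sketch below, one destination row sums to 1.25 (as a self-overlapping source grid might produce). Area weighting changes the mean but not the extrema, and a spatially uniform source value c maps to c times frac_b in each destination gridcell, so the area-weighted destination mean is exactly c times the area-weighted mean frac_b. Names and values are illustrative only.

```python
def apply_map(S, row, col, src, n_dst):
    """Destination field: y[m] = sum over n of S(m,n) * x[n] (1-based row/col)."""
    dst = [0.0] * n_dst
    for w, m1, n1 in zip(S, row, col):
        dst[m1 - 1] += w * src[n1 - 1]
    return dst


def area_mean(field, area):
    """Area-weighted global mean."""
    return sum(f * a for f, a in zip(field, area)) / sum(area)


S, row, col = [0.5, 0.5, 1.25], [1, 1, 2], [1, 2, 2]
area_b = [3.0, 1.0]                                   # destination gridcell areas
frac_b = apply_map(S, row, col, [1.0, 1.0], n_dst=2)  # row-sums = map applied to ones

print(sum(frac_b) / len(frac_b), area_mean(frac_b, area_b))  # 1.125 1.0625
print(min(frac_b), max(frac_b))  # 1.0 1.25, identical weighted or not

c = 3.0                                               # uniform source value
dst = apply_map(S, row, col, [c, c], n_dst=2)
print(area_mean(dst, area_b) / c)                     # 1.0625
```

Here the larger error sits in the smaller gridcell, so the area-weighted mean lies closer to 1.0 than the unweighted mean does.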

As of NCO version 4.9.2 (March 2020), adding the --frac_b_nrm flag changes the map-checker into a read-write algorithm that first diagnoses the map-file statistics described above and then re-writes the weights (and the weight-derived statistics frac_a and frac_b) to compensate for, or "fix", issues that poor-quality input grids can cause. Input grids can, and often do, have regions that are not tiled by any portion of any input gridcell. For example, many FV ocean grids (such as MPAS) are empty (have no gridcells) in land regions beyond the coasts. Some FV ocean grids have gridcells everywhere and mask (i.e., screen out) the non-ocean gridcells by setting the mask value to zero. Both these designs are perfectly legal. What is illegal, yet sometimes encountered in practice, is an input grid whose gridcells overlap one another. Such an input grid is said to be self-overlapping.

The surface topography dataset grid SCRIPgrid_1km-merge-10min_HYDRO1K-merge-nomask_c130402.nc (hereafter the HYDRO1K grid for short) used by E3SM and CESM is self-overlapping. Weight-generators that receive the same input location twice might double-weight the self-overlapped region(s) unless they take precautions to identify the issue, which no known weight-generators do. In other words, self-overlapping input grids can lead weight-generators to produce frac_b >> 1.0. Applying these weights would lead to exaggerated values on the destination grid.
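The double-weighting is easy to reproduce in one dimension. The sketch below uses a naive overlap-length weight generator, a hypothetical stand-in for real weight-generators (which intersect spherical polygons), and a source "grid" whose second interval appears twice:

```python
def overlap(a, b):
    """Length of the intersection of intervals a and b (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def conservative_weights(src_cells, dst_cells):
    """First-order weights S(m,n) = overlap(n, m) / length(m), 1-based indices."""
    S, row, col = [], [], []
    for m, d in enumerate(dst_cells, start=1):
        for n, s in enumerate(src_cells, start=1):
            ov = overlap(s, d)
            if ov > 0.0:
                S.append(ov / (d[1] - d[0]))
                row.append(m)
                col.append(n)
    return S, row, col


src = [(0.0, 1.0), (1.0, 2.0), (1.0, 2.0)]  # the third cell duplicates the second
dst = [(0.0, 2.0)]
S, row, col = conservative_weights(src, dst)
print(S, sum(S))  # [0.5, 0.5, 0.5] 1.5
```

The destination row-sum (frac_b) is 1.5 rather than 1.0, so a uniform source field of 1.0 would map to 1.5 on the destination cell.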

The best solution to this issue is to adjust the input grid to avoid self-overlap. However, this solution may be difficult or impractical where the original data, producer, or algorithm are unavailable or unclear. In such cases, the --frac_b_nrm flag provides a workaround. Please understand that ncks --frac_b_nrm map.nc is designed to alter map.nc in place, so back up the original file first.
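One plausible form of such a workaround is to rescale every row whose weights sum to more than 1.0 so that its row-sum becomes exactly 1.0. The sketch below is an assumption for illustration, not a description of what --frac_b_nrm actually does internally; the function name is hypothetical. Note that rescaling rows also changes frac_a, which would then need to be recomputed.

```python
def renormalize_rows(S, row, n_rows):
    """Return new weights with every row-sum > 1.0 scaled down to 1.0.

    Hypothetical illustration only: rows summing to <= 1.0 are left unchanged,
    and a new list is returned rather than modifying S in place.
    """
    frac_b = [0.0] * n_rows
    for w, m1 in zip(S, row):
        frac_b[m1 - 1] += w
    return [w / frac_b[m1 - 1] if frac_b[m1 - 1] > 1.0 else w
            for w, m1 in zip(S, row)]


S = [0.5, 0.75, 1.0]  # row 1 sums to 1.25, row 2 to 1.0
row = [1, 1, 2]
S_new = renormalize_rows(S, row, n_rows=2)
print(S_new)  # [0.4, 0.6, 1.0]
```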

...