The purpose of this page is to document different strategies for plotting data on the spectral element cubed-sphere grid. The unstructured grid presents some challenges of its own, since many common plotting utilities expect regularly gridded data, and the spectral element data raises some additional concerns beyond that.

Overview

EAM runs on an unstructured spectral element grid, with two types of data: spectral element data (“GLL data”: point values given on the GLL nodes) and “PG2 data”: finite volume “physgrid” data, given as cell averages for each cell. See SE Atmosphere Grid Overview (EAM & CAM).

Regridding: interpolate to a lat/lon grid first (both PG2 and GLL data): The most common approach is to interpolate the data to an FV latitude-longitude grid, either “cap” or offset (see CAM-FV Grid Overview). This allows one to use a wide variety of visualization software. Offline interpolation is quick and painless with NCO’s ncremap. Mapping files are collected on the E3SM inputdata server: https://web.lcrc.anl.gov/public/e3sm/mapping/maps/

GLL vertex data: dual grid cell fill (ugly): This is the standard option in NCL and pyNGL for plotting unstructured data. NCL can internally compute a Delaunay triangulation of the data (via http://www.cs.cmu.edu/~quake/triangle.html), with the data located at the vertices of the triangles. NCL then appears to form the Voronoi diagram (the dual grid of the triangulation), on which the data is located at the center of each Voronoi cell. Alternatively, the user can read in a dual grid from a SCRIP file and specify the polygon containing each GLL node. With “RasterFill”, each of these dual grid cells is colored based on the value of the original data point. The result is an accurate representation of all the data points in the original field, but it can be quite ugly and is a very unnatural representation of the original spectral element data.

GLL vertex data: triangulation + Gouraud shading (best): This approach is not available in NCL, but is available in matplotlib. A Delaunay triangulation is first computed; then each triangle is shaded using linear interpolation based on its three vertex values. This produces nice-looking results. It also has the advantage that it is monotone and min/max/oscillation preserving, and all values in the original data set will be plotted. It is perhaps the best visual representation of GLL vertex data.

PG2 data: triangulation + Voronoi cell fill (unnatural): NCL’s Delaunay approach can also be used for PG2 data: compute a Delaunay triangulation and then the associated Voronoi grid. However, this is a lot of work to construct Voronoi cells when, in the PG2 case, the data is already given as cell averages over quadrilaterals. For PG2, using these quadrilaterals instead of Voronoi cells (next option below) is a more accurate representation of the discrete data.

PG2 data: cell fill using FV quads (best for PG2 data): NCL has a CellBounds option which allows the user to specify the FV cell bounds for cell-centered data such as PG2. With this option combined with “CellFill”, each PG2 quad is colored based on the value of its data point. The result is a blocky plot that correctly conveys the cell-average nature of each data point, and all values in the original data set are preserved. Matplotlib does not natively support this option, but see below for matplotlib code which implements this approach.

Regridding

Perhaps the “easiest” way to plot data on the SE grid is to not plot it on the SE grid at all, but to regrid the data first to a regular latitude-longitude grid and use standard tools to plot. This is the approach that E3SM-Diags uses, and once the data is regridded it can be quite quick to plot, outperforming the rest of the options here. It is also pretty easy to regrid on the fly using scipy.interpolate.griddata. For example, the following will regrid SE data pretty quickly:

from scipy.interpolate import griddata
import numpy

# Here x and y are 1D arrays of native-grid longitude and latitude (the
# "lon" and "lat" variables in EAM output), and data is the 1D field.
xi = numpy.linspace(0, 360, 2 * 360)  # to regrid to 1/2 degree
yi = numpy.linspace(-90, 90, 2 * 180)  # to regrid to 1/2 degree
data_regridded = griddata((x, y), data, (xi[None, :], yi[:, None]), method='linear')

This can then be plotted using the efficient pcolormesh matplotlib (mpl) function (or really any standard contouring/plotting/mapping function), and the regridding itself is surprisingly fast.
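For example, a minimal sketch (assuming xi, yi, and data_regridded from the snippet above):

from matplotlib import pyplot

# Plot the regridded field: pcolormesh takes the 1D coordinate arrays
# and the 2D array of regridded values.
figure, ax = pyplot.subplots(1, 1)
pl = ax.pcolormesh(xi, yi, data_regridded, shading='auto')
figure.colorbar(pl, ax=ax, orientation='horizontal')
figure.savefig('regridded_example.png', bbox_inches='tight')

Still, we would like some means to visualize the data on the native grid, so we explore some other options below.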

The snippet above uses piecewise-linear interpolation: with method='linear', scipy's griddata triangulates the scattered points and interpolates linearly within each triangle. This is acceptable, especially when upscaling. But for the best results:

For GLL data, the best algorithm to use (see Transition to TempestRemap for Atmosphere grids) is TempestRemap’s “intbilin”, as this map is accurate, monotone, and can be used for both downscaling and upscaling. The only drawback is that it is not exactly conservative. If exact conservation is needed (so that the mass computed on the interpolation grid is the same as on the native grid), then TempestRemap’s less accurate “mono” option should be used.

For PG2 data, to remap from FV (unstructured) to FV (lat/lon), we don’t yet have an integrated bilinear option. ESMF’s “aave” option (which can be obtained most efficiently using NCO’s ncremap) is conservative and good for downscaling, but will produce blocky plots when upscaling. For nicer-looking plots when upscaling, ESMF’s “bilinear” is currently the only option.
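For offline remapping, here is a minimal sketch of applying a precomputed map with ncremap, invoked from Python (the file names are placeholders; substitute a map downloaded from the inputdata server above and your own data file):

import subprocess

# Apply an existing mapping file with NCO's ncremap; -m specifies the
# map file, followed by the input and output files.
subprocess.run([
    'ncremap',
    '-m', 'map_ne30pg2_to_latlon.nc',  # placeholder map file name
    'eam_output_native.nc',            # placeholder input on the SE grid
    'eam_output_latlon.nc',            # regridded output
], check=True)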

Dual grid

pyNGL can construct this dual grid from a triangulation if the user specifies sfXArray and sfYArray. We don’t cover that option here; instead we cover the second option, where the user reads a dual grid from the SCRIP file and specifies it via sfXCellBounds and sfYCellBounds. For PG2 data, this is the most natural approach, with the PG2 dual grid made up of uniform quadrilaterals given in the PG2 SCRIP file.

For GLL data, the dual grid approach is less natural, because the data on the Gauss-Lobatto-Legendre (GLL) grid exists at cell vertices (i.e., data is point values), while most existing analysis tools expect cell-centered data (i.e., data points represent values over some representative area bounded by each “cell”). We can sort of mimic this cell-centered view by constructing a new mesh that is a “finite volume dual grid” to the SE GLL grid, in which we construct bounding polygons around each GLL node and pretend that the data at the GLL node is representative of the value across the area of the polygon.

For the SCRIP approach, an example SCRIP file can be found in the inputdata repository for the ne4np4 grid here: https://web.lcrc.anl.gov/public/e3sm/mapping/grids/ne4np4-pentagons_c100308.nc

This grid file format contains the location of each GLL node / cell center (grid_center_lon, grid_center_lat) and the vertices of the polygon constructed to represent the weight of each node (grid_corner_lon, grid_corner_lat). With this ancillary information, it is conceptually simple to plot the data on a map: draw the polygons for each GLL node, with color fill corresponding to the value at the GLL node, consistent with a chosen colormap and scale. This appears to be what pyNGL or NCL’s contour_map function will do if the plot resources sfXCellBounds and sfYCellBounds are set to the grid_corner_lon and grid_corner_lat variables from the SCRIP-format dual grid file. For example, we can plot the bottom-level water vapor from the ne4np4 initial condition file via the following:

#!/usr/bin/env python3

import ngl
import numpy
import xarray

datafile = '/global/cfs/cdirs/acme/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne4np4_L72_c160909.nc'
gridfile = '/global/cfs/cdirs/acme/mapping/grids/ne4np4-pentagons_c100308.nc'
varname = 'Q'
  
# Read data
ds_data = xarray.open_dataset(datafile)
ds_grid = xarray.open_dataset(gridfile).rename({'grid_size': 'ncol'})
data = ds_data[varname].isel(time=0, lev=-1).squeeze()
lon_edges = ds_grid['grid_corner_lon']
lat_edges = ds_grid['grid_corner_lat']

# Setup the canvas
plot_format='png'
wks = ngl.open_wks(plot_format, './' + varname + '_markers')

# Make map plot
res = ngl.Resources()
res.cnFillOn = True
res.cnLinesOn = False
res.cnFillPalette = 'MPL_viridis'
res.cnFillMode = 'CellFill'
res.sfXCellBounds = lon_edges.values
res.sfYCellBounds = lat_edges.values
res.mpGeophysicalLineColor='white'
res.mpGridAndLimbOn = False
res.lbOrientation = 'horizontal'
res.lbTitleString = '%s (%s)' % (data.long_name, data.units)
res.nglFrame = False
map_plot = ngl.contour_map(wks, data.values, res)

# Add polymarkers for cell centers
polyres = ngl.Resources()
polyres.gsMarkerColor = 'black'
polyres.gsMarkerIndex = 1
poly_plot = ngl.polymarker(
    wks, map_plot,
    ds_grid['grid_center_lon'].values,
    ds_grid['grid_center_lat'].values,
    polyres
)
ngl.frame(wks)

# Close things
ngl.end()

This results in the following figure:

The black dots indicate the GLL nodes, and the color fill represents the values at each GLL node. It’s also possible to do this using Matplotlib, but we have to define our own function to draw the polygons:

#!/usr/bin/env python3

from matplotlib import pyplot
from matplotlib.path import Path
from matplotlib.patches import PathPatch
from matplotlib.collections import PatchCollection, PolyCollection
from cartopy import crs
import numpy
import xarray


def plot_unstructured(xv, yv, data, antialiased=False, **kwargs):
    """
    Plot unstructured data. xv and yv specify the x and y coordinates
    of the vertices of each cell and should have shape [ni,nv] where ni
    is the size of the grid (number of cells) and nv is the number of
    vertices for each cell. Data contains the values of the data you
    want to plot, and should have dimension [ni,]. The intent with this
    function is that you will need to read in an auxiliary SCRIP-format
    file that describes the grid (unless this information is present in
    the same file that contains the data; unlikely) as well as the file
    that contains the data, and grab the cell corner information from
    the SCRIP file and the data from the data file. This function will
    then plot the data on the native grid by looping over each cell and
    drawing patches for each. Note that this will probably be really
    slow for big grids! Seems to work alright up to about ne120 or so,
    but really some more clever techniques should probably be used here
    (parallelism?).
    
    NOTE: To avoid artifacts due to antialiasing, this function defaults
    to antialiased=False (passed through to the PolyCollection).
    """

    # Create array of cell vertices, indexed [npoints, ncorners, 2]
    corners = numpy.stack([xv, yv], axis=2)

    # Go back and fix corners where they wrap; we shouldn't have to do
    # this with cartopy, but it seems we do...
    for i in range(corners.shape[0]):
        if any(corners[i,:,0] < -90) and any(corners[i,:,0] > 90):
            corners[i,:,0] = numpy.where(corners[i,:,0] < -90, corners[i,:,0] + 360, corners[i,:,0])
        if any(corners[i,:,1] < -45) and any(corners[i,:,1] > 45):
            corners[i,:,1] = numpy.where(corners[i,:,1] < -45, corners[i,:,1] + 90, corners[i,:,1])

    # Create a PatchCollection from our aggregated list of PathPatches
    p = PolyCollection(corners, array=data, antialiaseds=antialiased, **kwargs)

    # Add the collection to the axes
    ax = pyplot.gca()
    ax.add_collection(p)

    # Set sane axes limits
    ax.set_xlim([xv.min(), xv.max()])
    ax.set_ylim([yv.min(), yv.max()])

    # Return collection of patches
    return p


# Function to convert longitude from 0-360 to -180 to 180
def fix_lon(lon):
    return numpy.where(lon > 180, lon - 360, lon)


# Plot example
datafile = '/global/cfs/cdirs/acme/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne4np4_L72_c160909.nc'
gridfile = '/global/cfs/cdirs/acme/mapping/grids/ne4np4-pentagons_c100308.nc'
varname = 'Q'

# Read data
ds_data = xarray.open_dataset(datafile)
ds_grid = xarray.open_dataset(gridfile).rename({'grid_size': 'ncol'})
data = ds_data[varname]
lon_edges = ds_grid['grid_corner_lon']
lat_edges = ds_grid['grid_corner_lat']

# Reduce data if we need to
if 'time' in data.dims: data = data.isel(time=0).squeeze()
if 'lev' in data.dims: data = data.isel(lev=-1).squeeze()

# Make plot
figure, ax = pyplot.subplots(
    1, 1,
    #subplot_kw=dict(projection=crs.Orthographic(central_longitude=-100))
    subplot_kw=dict(projection=crs.PlateCarree())
)
pl = plot_unstructured(
    fix_lon(lon_edges.values), lat_edges.values, data.values,
    transform=crs.PlateCarree()
)
ax.set_global()
ax.coastlines(color='white', linewidth=0.5)

# Add colorbar to plot
cb = pyplot.colorbar(
    pl, orientation='horizontal',
    label='%s (%s)'%(data.long_name, data.units), pad=0.05
)

# Add markers to plot for GLL centers
pl = ax.plot(
    fix_lon(ds_grid['grid_center_lon'].values),
    ds_grid['grid_center_lat'].values,
    marker=',', markersize=0.5,
    markerfacecolor='black', markeredgecolor='black',
    linestyle='none',
)

# Save plot
plotfile = '%s_example.png'%varname
figure.savefig(plotfile, bbox_inches='tight')

The resulting plot:

It may be possible to simplify this example, but so far this is the best solution I’ve found to do this in mpl. Regardless, both of these approaches have the downside of representing the GLL point data as cell-centered data, which is not completely appropriate for data on the GLL grid (however, it probably is the best representation of data on the alternative pseudo-finite-volume physics grids, e.g., ne4pg2).

Triangulation

Another option is to use a Delaunay triangulation to plot the GLL data using triangles. Matplotlib can calculate a Delaunay triangulation itself, and there are some convenience plotting functions that do this automatically. Plotting the data then becomes much simpler:

#!/usr/bin/env python3
  
from matplotlib import pyplot
from cartopy import crs
import numpy
import xarray


# Function to convert longitude from 0-360 to -180 to 180
def fix_lon(lon):
    return numpy.where(lon > 180, lon - 360, lon)


datafile = '/global/cfs/cdirs/acme/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne4np4_L72_c160909.nc'
gridfile = '/global/cfs/cdirs/acme/mapping/grids/ne4np4-pentagons_c100308.nc'
varname = 'Q'

# Read data
ds_data = xarray.open_dataset(datafile)
data = ds_data[varname]
lon = ds_data['lon']
lat = ds_data['lat']

# Reduce data if we need to
if 'time' in data.dims: data = data.isel(time=0).squeeze()
if 'lev' in data.dims: data = data.isel(lev=-1).squeeze()

# Make plot
figure, ax = pyplot.subplots(
    1, 1,
    #subplot_kw=dict(projection=crs.Orthographic(central_longitude=-100))
    subplot_kw=dict(projection=crs.PlateCarree())
)
pl = ax.tripcolor(
    fix_lon(lon.values), lat.values, data.values,
    transform=crs.PlateCarree()
)
ax.set_global()
ax.coastlines(color='white', linewidth=0.5)

# Add colorbar to plot
cb = pyplot.colorbar(
    pl, orientation='horizontal',
    label='%s (%s)'%(data.long_name, data.units), pad=0.05
)

# Add markers to plot for GLL centers
pl = ax.plot(
    fix_lon(lon.values),
    lat.values,
    marker=',', markersize=0.5,
    markerfacecolor='black', markeredgecolor='black',
    linestyle='none',
)

# Save plot
plotfile = '%s_tri_example.png'%varname
figure.savefig(plotfile, bbox_inches='tight')

However, you can see that the resulting plot has represented the data by creating triangles between the GLL nodes:

Reading through the mpl documentation and source code, it appears that the default shading scheme, “flat”, colors each triangle based on the average of the data values at the three vertices that form the triangle. This is probably not what we want either, since ultimately we want to represent the GLL point data. However, there is an option to shade with a gradient fill that interpolates between the vertices, enabled by passing the shading='gouraud' keyword argument to the plot function.
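In the script above, only the tripcolor call changes (a minimal sketch):

# Same tripcolor call as above, but with Gouraud shading: each triangle
# is filled with a linear gradient between its three vertex values.
pl = ax.tripcolor(
    fix_lon(lon.values), lat.values, data.values,
    shading='gouraud',
    transform=crs.PlateCarree()
)

With this change, we get: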

Notice that the point at the top of South America still shows the local minimum that is evident in the cell-centered plots above; thus this method should still preserve gradients and oscillations in the data. The main downside of this method (triangulation) is that it becomes painfully slow for large grids. We may have a couple of ways around this. First, we might expect that a significant amount of time is spent calculating the Delaunay triangulation. It turns out this is only partly true, as the following timing results show (using meshes ne4, ne30, ne120, ne240, ne512):

This shows the time to plot (or to calculate a triangulation, in the case of the “triangulation” data) for the range of mesh sizes using three different methods: the dual grid approach above (mpl implementation), calling tripcolor without specifying a triangulation, and finally using tripcolor but first calculating the triangulation separately. It appears that a similar amount of time is spent in the calculation of the triangulation and in the plotting of the data (caveat: sample size of one for these timing figures, so huge uncertainty here; TODO: update with average timing over multiple passes).

Still, we should be able to perform the triangulation once per grid and save the results to use at plotting time. This should cut the actual plotting time roughly in half, if the results above can be trusted. So the first option is to do this using the mpl routines, and just save the data. However, we already have something that looks like this in the SEMapping.nc file that is output in each case run directory. This file contains information about the quadrilaterals that are formed by groups of four GLL nodes. Unfortunately, we don’t have a corresponding quadpcolor routine in mpl to plot data on quads directly. However, one can imagine splitting each quadrilateral into two triangles by connecting opposite vertices, and then creating a Triangulation object from the resulting triangles. This is fairly easy to do in python:

#!/usr/bin/env python3

from matplotlib import pyplot
from matplotlib.tri import Triangulation
from cartopy import crs
import numpy
import xarray


# Function to convert longitude from 0-360 to -180 to 180
def fix_lon(lon):
    return numpy.where(lon > 180, lon - 360, lon)


# Read data to work with
datafile = '/global/cfs/cdirs/acme/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne4np4_L72_c160909.nc'
scrip_file = '/global/cfs/cdirs/acme/mapping/grids/ne4np4-pentagons_c100308.nc'
latlon_file = '/global/cfs/cdirs/acme/mapping/grids/ne4np4_latlon_c100308.nc'
quad_file = '/global/cscratch1/sd/bhillma/acme_scratch/cori-knl/master.ne4_ne4.F-EAMv1-AQP1.test/run/SEMapping.nc'

datasets = {
    'data': xarray.open_dataset(datafile),
    'scrip': xarray.open_dataset(scrip_file),
    'latlon': xarray.open_dataset(latlon_file),
    'quad': xarray.open_dataset(quad_file),
}

# Grab the field and the GLL node coordinates from the data file
varname = 'Q'
data = datasets['data'][varname].isel(time=0, lev=-1).squeeze().values
x = datasets['data']['lon'].values
y = datasets['data']['lat'].values


# Convert quad elements into tri elements by splitting each quad into
# two triangles along a diagonal. Note that element_corners in
# SEMapping.nc uses 1-based (Fortran) indexing, hence the -1 offsets.
def quads_to_tris(quads):
    tris = numpy.zeros([2 * quads.shape[0], 3], dtype=int)
    for i in range(quads.shape[0]):
        j = 2 * i
        tris[j, 0] = quads[i, 3] - 1
        tris[j, 1] = quads[i, 2] - 1
        tris[j, 2] = quads[i, 1] - 1
        tris[j + 1, 0] = quads[i, 1] - 1
        tris[j + 1, 1] = quads[i, 0] - 1
        tris[j + 1, 2] = quads[i, 3] - 1
    return tris


figure = pyplot.figure(figsize=(15, 10))
ax = pyplot.axes(projection=crs.PlateCarree())
quads = datasets['quad']['element_corners']
triangles = quads_to_tris(quads.transpose().values)
triangulation = Triangulation(fix_lon(x), y, triangles)
pl = ax.tripcolor(triangulation, data, shading='flat')
ax.set_global()
ax.coastlines(linewidth=0.2)
pl = ax.plot(fix_lon(x), y, 'k.', markersize=1)
figure.savefig('%s_garbage.png' % varname, bbox_inches='tight')

But so far, my attempts to use this to plot data that wraps around the dateline has resulted in garbage.
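One possible mitigation (an untested sketch, continuing from the script above) is to simply mask out the triangles that straddle the dateline, at the cost of leaving a thin unplotted seam:

# Mask triangles whose (fixed) vertex longitudes span more than 180
# degrees; these are the wrapped triangles that smear across the map.
lon_fixed = fix_lon(x)
tri_lons = lon_fixed[triangulation.triangles]  # [ntri, 3] vertex longitudes
wrapped = (tri_lons.max(axis=1) - tri_lons.min(axis=1)) > 180
triangulation.set_mask(wrapped)
pl = ax.tripcolor(triangulation, data, shading='flat')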

pyNGL has a similar capability, but the fill styles seem to be a little more limited (or maybe just not as well documented). Setting sfXArray and sfYArray instead of sfXCellBounds and sfYCellBounds, and choosing a fill mode other than CellFill, will force pyNGL to do a similar triangulation. The available fill options are AreaFill and RasterFill. RasterFill with a low-resolution grid produces a result similar to the dual grid methods above, with a solid fill around each GLL node:
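A minimal sketch of the resources involved (reusing wks, ds_data, and data from the pyNGL example above):

# Supply the GLL node coordinates directly (instead of cell bounds) so
# that pyNGL triangulates the data internally, and use raster fill.
res = ngl.Resources()
res.cnFillOn = True
res.cnLinesOn = False
res.cnFillMode = 'RasterFill'
res.sfXArray = ds_data['lon'].values
res.sfYArray = ds_data['lat'].values
map_plot = ngl.contour_map(wks, data.values, res)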

There is an option to “smooth” using RasterFill by setting cnRasterSmoothingOn = True. From the NCL documentation:

If cnRasterSmoothingOn is set True, the level (and hence the color) assigned to each cell is determined by interpolating the values of neighboring points in the data grid. If cnRasterSmoothingOn is False, ContourPlot creates a discrete raster plot: any raster cell whose center lies within the rectangular area bounded by lines halfway between each grid point (in data space) is given the color assigned to the level representing the grid point datum.

With the default contour levels, setting this to True gives the appearance of a contour plot rather than a smoothed raster plot:

We can get back to something closer to the “gouraud” fill that mpl uses by significantly increasing the number of “contour” levels by adjusting res.cnLevelSpacingF (and also res.lbLabelStride in order to make sure labels are not on top of each other). In this case, we’ve adjusted res.cnLevelSpacingF = 0.0001 and res.lbLabelStride = 10:
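In terms of the pyNGL resources above, the change amounts to the following (the level spacing that works will depend on the range of the field being plotted):

# Interpolate raster cell colors from neighboring grid points, and use
# many closely spaced levels to approximate continuous shading.
res.cnRasterSmoothingOn = True
res.cnLevelSpacingF = 0.0001  # many levels -> near-continuous shading
res.lbLabelStride = 10        # thin the labelbar labels to avoid overlap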
