negative weights in cube_to_target

We (@Mark Taylor @Jishi Zhang) are looking at a corner case in cube_to_target. In short, the current version in E3SM repo will generate very large negative PHIS in some cases. The error is rare and tends to happen when the dx of the target grid is smaller/comparable to the base cube in my experience. So, it shouldn’t affect most users in most cases, but it would be great to fix it in the E3SM repo.

We seek help from Peter H. Lauritzen. He guided us the underlying reasons and pointed out the solution. As he suggested, I opened an issue and potential solutions in NCAP/Topo repo: https://github.com/NCAR/Topo/issues/62.

Description of the problem

In the experience of using the cube_to_target (mapping terr_height/PHIS from the base cube to target) for regionally refined meshes (RRM), we've encountered issues with large negative values for terr_target.
=> related to negative weights_all / weights_out / wt in cube_to_target.F90.

=> resulting from some dominant negative weights in remap.F90 (corresponding to the overlap area “omega_kl” between the target grid and the base cube grid in Lauritzen et al. 2015).

Lauritzen, P. H., Bacmeister, J. T., Callaghan, P. F., & Taylor, M. A. (2015). NCAR_Topo (v1. 0): NCAR global model topography generation software for unstructured grids. Geoscientific Model Development, 8(12), 3975-3986.

This generally happens when the dx of the target grid is comparable/smaller than the dx of the base cube:

  • cube3000 (3km) to a 800m California RRM

  • cube12000 (800m) to a 800m California RRM

  • cube12000 (800m) to a 100m California RRM

Temporary solutions / limiters

Initially, my stopgap solution to fix the negative PHIS was just to to write an NCL script to manually “correct” these large negative values by averaging the PHIS values of the surrounding grid points.

@mt5555 pointed out that we can add a simple non-negative limiter. Two options documented in the email:

  1. ignore the negative weights but use the other weights (renormalized so they sum to 1).  

  2. if there are negative weights, just replace all weights by 1/N  (where N is the number of source cells contributing). 

I’ve tested the two methods by modifying remap.F90 and they both worked well.

Underlying reasons and solution

Finally, @PeterHjortLauritzen helped us to identify the underlying reasons for the negative weights and suggested the solution. The discussion in email is documented here:

  • Turn on the ldbg flag in cube_to_target.F90

  • When we test the problematic point or target index, it prints out each weights of weights_out/omega_kl and writes out there debug files: side_integral.dat, inner_nonconvex.dat, inner_integral.dat

  • We can plot the line integral from these debug files to see which line segment caused the problem and what’s its geometric characteristics are (see below).

  • It turns out that the exact integration in the epsilon case (line segment parallel to latitude - compute weights exactly) caused the problem. It thinks the line segment is parallel to the latitude (judging by dy<fuzzy_width), but it's not (just because the segment is too short). see more details in the discussion pasted below.

  • My tests using the solution of deactivating the exact integration for cube12000 -> 800m/100m CARRM worked well.

Summary of solutions

  • get rid of the exact integration branches

  • add a simple non-negative limiter for weights


More discussion in the email:

===weights debug prints:

weights_out( 1 ,1), weights( 5 ,1) = 1.0509980834232285E-007 1.0509980834232285E-007
weights_out( 1 ,1), weights( 6 ,1) = -8.8581656182158517E-006 -8.9632654265581743E-006
weights_out( 1 ,1), weights( 14 ,1) = 7.0516063422007840E-011 8.8582361342792737E-006
weights_out( 2 ,1), weights( 1 ,1) = 8.9649366631406596E-007 8.9649366631406596E-007
weights_out( 2 ,1), weights( 9 ,1) = -1.4116530880303253E-005 -1.5013024546617319E-005
weights_out( 2 ,1), weights( 10 ,1) = -1.4568151528446161E-005 -4.5162064814290763E-007
weights_out( 2 ,1), weights( 11 ,1) = -1.5019816238143562E-005 -4.5166470969740055E-007
weights_out( 2 ,1), weights( 12 ,1) = 2.0893794756200756E-009 1.5021905617619182E-005
weights_out( 3 ,1), weights( 2 ,1) = 1.4125446408897337E-005 1.4125446408897337E-005
weights_out( 3 ,1), weights( 3 ,1) = 1.4576740853993954E-005 4.5129444509661786E-007
weights_out( 3 ,1), weights( 4 ,1) = 1.4922891465757215E-005 3.4615061176326182E-007
weights_out( 3 ,1), weights( 7 ,1) = 8.8666163256907899E-006 -6.0562751400664260E-006
weights_out( 3 ,1), weights( 8 ,1) = 8.8560315021176001E-006 -1.0584823573189794E-008
weights_out( 3 ,1), weights( 13 ,1) = -2.2046321616736293E-009 -8.8582361342792737E-006

 

@PeterHjortLauritzen ’s explanation and our tests for the calculation of weights:
===Reply 1 from PeterL

The algorithm is illustrated in Figures 3,4,5,6 in attached paper (Lauritzen et al. 2010) and equation 9 is the line-integral which should just end up being the area of the overlap. Please note that the line-integrals for lines parallel to y-axis (in gnomonic projection) are zero. 

Lauritzen, P. H., Nair, R. D., & Ullrich, P. A. (2010). A conservative semi-Lagrangian multi-tracer transport scheme (CSLAM) on the cubed-sphere grid. Journal of Computational Physics, 229(5), 1401-1424.

 

===Reply 2 from PeterL


0.The code is correctly identifying the line-segments ... but one of the line-segments is so short that the weights are likely not correctly calculated ...
1.Please note that area 1 is approximately 2E-9 and area 3 (the small area that is yellow in your plot) is 7E-11. So area one is about 30 times larger than area 1. Seems reasonable. Area 2, however, looks a little smaller than area 1 but the sum of the weights is negative. 
2.Note that the weights are of order 1E-6-ish so we are adding up a bunch of larger numbers to get a small number!
3.I carefully looked through all the line integrals and the algorithm captures all of them correctly but there is a very short line-segment near vertex 5 (see "problem" next to the arrow on my photograph - green lines are the source grid cell sides).
 
So what is the problem:
 
The problem is that vertex 5 is almost on top of a grid line. So when we start side integral 5 the algorithm includes a segment in the cell to the right of area (area 3) and computes weight
 zhang73 31: weights_out(           3 ,1), weights(           8 ,1) =   8.8560315021176001E-006  -1.0584823573189794E-008
 
that is added to area 3. Note that the segment is so short that the code thinks that the segment is parallel to the coordinate line which it is not: look for
 line segment parallel to latitude - compute weights exactly
 zhang73 22: weights(           8 ,1) =  -1.0584823573189794E-008
 
in the logfile. Since the area is on the order of 2E-9 adding a weight of -1E-8 drives the integral negative. If I omit the weight the area becomes 8.38E-9 which is way too large (but positive).
 
Could you try and run the code without using analytic integration of line-segments that the code thinks are horizontal?
In remamp.F90 change:
    else if (abs(yseg(1) -yseg(2))<fuzzy_width) then
to
    else if (.false.) then
 
… I am not sure this will solve the problem but I am pretty sure that if you manually remove vertex 5 (or displaced it) then you would not get a negative weight. This algorithm does not like vertices on top of grid lines (epsilon cases!).

 

===Results after deactivating the exact integration

I changed “abs(yseg(1) -yseg(2))<fuzzy_width” with “.false.” in two places, one calculating “slope” in subroutine <side_integral>, another calculating “weights” in subroutine <get_weights_gauss>.

The results look reasonble! The weights_out (ind=3) changed from -2.2046321616736293E-009 to 1.8986300128590624E-009, by changing the weights of S8 (please see figures below) from -1.0584823573189794E-008 to -6.4815613463673699E-009.

The new results seem to be reasonable, which seems to be proportional to the line intergal from the figure below:
the 1st weights is the yellow line integrals, 7.1e-11
the 2nd weights is the light blue line integrals, 2.1e-9
the 3rd weights is the dark blue line integrals, 1.9e-9

cube_to_target_debug.png

So, this line “else if (abs(yseg(1) -yseg(2))<fuzzy_width) then” in subroutine <side_integral> is unimportant as slope is not a return value.
 
The same line in <get_weights_gauss> takes effect. But seems that for the real cases of “line segment parallel to latitude”, the way that don’t “compute weights exactly” won’t affect much in my test—as shown in the inner line from point “Add 2” to “Add 3” (S13, S14), right? The weights(13) is -8.8582361343315627E-006 vs. -8.8582361342792737E-006 for the previous vs. modified remap.F90.
 
Have you seen any cases for “line segment parallel to latitude” that “compute weights exactly” and “do the general way—calculate slope and b” will give a notable different answers (and that’s why you have to keep this branch)?  

 

===Reply 3 from PeterL

My intention with computing line integrals using exact integration for segments parallel (within an epsilon) to the y-coordinate lines was that all the exact integrals should add up to the exact area of the source cubed-sphere grid. This is unimportant so I think we should get rid of the exact integration branches.