v1 DECK known bugs - RRTMG-lookup

Bug report

From Shaocheng Xie on :

In helping Peter figure out what may be causing the high-resolution model to crash, we have been looking into all the bug fixes logged by the CESM team. We found one bug fix (of many) that had not been implemented in EAM V1: it prevents the model from accessing values outside the valid dimension of a lookup table used in RRTMG. (We had implemented all the others that we think might be relevant.) Based on the assessment made by Rich Neale of a 5-yr simulation with CAM5_3_69, the impact of the bug fix we are now considering was minor and the simulations look pretty near identical for the CAM climate. Since EAM V1 includes notable changes in physical parameterizations, as well as increased vertical resolution and a higher model top, the impact of the bug on EAM V1 might be different from that seen in CAM5.


Phil and I asked Po-Lun and Balwinder to perform tests to understand the impact of the bug on EAM V1, and we will report their findings to the group shortly. At the same time, I would like to make both of you aware of the bug. Since it will take a while for us to fully understand the impact, I would suggest we continue the current DECK simulations and release the data as planned. In the future, if what we find shows a need to re-run the simulations, we could just re-run the low-resolution simulations and re-submit the results to CMIP6. I think this is not uncommon for modeling centers that participate in CMIP. At that time, we can also get other bugs fixed, such as the bug recently found in the land model by NCAR, which may affect energy conservation.


For the high-resolution model, Peter may want to implement the bug fix immediately and test whether it helps address the instability problem. Since high-resolution simulations are expensive, it may be better to include fixes for all known bugs before any serious runs, regardless of whether they help with the model crash issue or not. What do you think?


I will ask Po-Lun and Kai to provide more details about the bug in a separate email.

Description of the RRTMG Bugs (Reported by Po-Lun)

See the ChangeLog entries for CAM5_3_69 and CAM5_3_85, and Jira AG-290. The bug fix on GitHub was provided by Balwinder: singhbalwinder/cam/rrtmg-limiters

CAM5_3_69 (by bsander, santos, and cacraig): A “bug fix for exponentials in rrtmg_sw_reftra” “to replace the lookup table by double precision exponentials”. This is to prevent “the rare occurrences of extreme hot surface temperatures (exceeding 100 C)”, which are attributed to “the use of low precision LUT”. The performance slowdown is within the run-time noise, and “Rich Neale who inspected the 5 year comparison runs says the simulations look pretty near identical”.
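To illustrate why a coarse lookup table (LUT) can differ from a direct exponential, here is a minimal Python sketch. It is not the RRTMG code: the table size `N_TABLE`, the range `TAU_MAX`, and the function names are all hypothetical, chosen only to show the general idea of replacing a nearest-entry table lookup with a double-precision `exp` call.

```python
import math

# Hypothetical table parameters, chosen for illustration only;
# they are NOT the values used in rrtmg_sw_reftra.
N_TABLE = 1000   # number of table intervals
TAU_MAX = 10.0   # largest argument covered by the table

# Precomputed table of exp(-tau) on a uniform grid over [0, TAU_MAX].
EXP_TABLE = [math.exp(-TAU_MAX * i / N_TABLE) for i in range(N_TABLE + 1)]


def exp_neg_lookup(tau):
    """Approximate exp(-tau) by the nearest table entry (no interpolation)."""
    index = int(tau / TAU_MAX * N_TABLE + 0.5)
    return EXP_TABLE[index]


def exp_neg_direct(tau):
    """Evaluate exp(-tau) directly in double precision."""
    return math.exp(-tau)


def max_abs_error(samples=5000):
    """Largest |lookup - direct| over a uniform sample of [0, TAU_MAX]."""
    worst = 0.0
    for k in range(samples + 1):
        tau = TAU_MAX * k / samples
        worst = max(worst, abs(exp_neg_lookup(tau) - exp_neg_direct(tau)))
    return worst


if __name__ == "__main__":
    print("max abs error of table lookup:", max_abs_error())
```

In this toy setup the table error is bounded by roughly half the grid spacing, whereas the direct call is accurate to double-precision roundoff. How such an error propagates in the actual radiation code is not something this sketch can show.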

CAM5_3_85 (by cbardeen): “limits values in RRTMG vulnerable to bad table extrapolation”. The change to answers is “larger than roundoff, same climate”.
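The general idea of a limiter that keeps a table lookup inside its valid range can be sketched as follows. This is an illustrative Python example under assumed names and table parameters (`N_TABLE`, `TAU_MAX`, `lookup_limited`); it does not reproduce the code in the rrtmg-limiters branch or the CAM5_3_85 change.

```python
import math

# Hypothetical table parameters, for illustration only.
N_TABLE = 1000   # number of table intervals
TAU_MAX = 10.0   # largest argument covered by the table

EXP_TABLE = [math.exp(-TAU_MAX * i / N_TABLE) for i in range(N_TABLE + 1)]


def table_index(tau):
    """Map an argument to the nearest table index, with no range check."""
    return int(tau / TAU_MAX * N_TABLE + 0.5)


def lookup_unguarded(tau):
    """Look up exp(-tau) without limiting the argument.

    In Python an index past the end raises IndexError, and a negative
    index silently wraps to the end of the list. In a compiled language
    without bounds checking, the same access may read unrelated memory.
    """
    return EXP_TABLE[table_index(tau)]


def lookup_limited(tau):
    """Clamp the argument to [0, TAU_MAX] before indexing the table."""
    tau_limited = min(max(tau, 0.0), TAU_MAX)
    return EXP_TABLE[table_index(tau_limited)]


if __name__ == "__main__":
    for tau in (-3.0, 0.5, 50.0):
        print(tau, lookup_limited(tau))
```

Clamping returns the nearest valid table entry rather than an undefined value. Whether clamping the argument, the index, or the result is the right choice depends on the physics of the quantity being tabulated, which this sketch does not address.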

Impact of the bugs (Reported by Po-Lun)

Po-Lun's 5-yr runs on Cori indicate that the impact on the overall climate is negligible. The only noticeable difference is that the bug fix produces a colder temperature near the model top (above 70 km, only seen in the WACCM plot). See AMWG diagnostics (flx01a: control simulation with v1b tuning; v1b_rrtmg_bugfix: with the RRTMG bug fix).

Balwinder's 5-yr runs on EOS lead to the same conclusion: the impact of the bug on the overall climate is negligible.