Summary of Budget Analysis in CESM/ACME

The analysis focuses on water and heat. There is no analysis yet available for energy/momentum.

Validating the coupled system budget is tricky for several reasons:

- Models are coupling fluxes but we are trying to conserve quantities. The appropriate areas have to be applied to any budget analysis. In addition, those areas are critical in the mapping and merging operations. Those areas are also associated with both static (time-invariant) fractions (like land cover on the atm grid or land grid) and time varying fractions (like sea ice cover on the ocean and atm grids or ocean cover on the atm grid).

- The budget analysis ultimately comes down to trying to show the system conserves. In other words, the sum of the budget is somehow "zero". But the individual terms that make up that budget are generally fairly large and of varying signs. As a result, a 1% error in the computation of any one budget term in the diagnostics can have a huge impact on the overall budget.

- Ultimately, the budget analysis over a coupling period, over a day, over a month, over a season, over a year, and over a multi-year period will be different. There should be diurnal, seasonal, and interannual variability in the system that can be diagnosed if the budgets are computed correctly. Care must also be taken when looking at a budget for any given time period to not read "too much" into that budget.

- Because the model is discretized in time and coupling periods vary between components, there are inherent system lags and averaging. In other words, it will not generally be possible to balance the budget exactly over a coupling period. The error introduced from the lags should decrease from a budget analysis perspective as the budget averaging period increases.
  But the implication is that there is no way to ever get "zero", and as a developer, there is always a question of whether the budget diagnostics represent the system accurately or whether there is still a bug in the budget diagnostics. It can be very difficult to tell the difference.

- There are multiple areas in the system. Each component model has its own area either explicitly or implicitly specified by the internal numerical discretization. There are also areas associated with the conservative mapping weights. Those weights are only as conservative as the area computation associated with the weights generation. The coupler uses the mapping areas as its internal area values because those are generally computed using the same algorithm in an offline setting. The sum of the mapping areas should be very close to the surface area of the earth. In CESM, the fluxes are area corrected as they are sent to/from the coupler. In other words, the fluxes are multiplied by the ratio of the model and coupler areas to convert from model areas to coupler areas. While this might seem non-conservative, it is actually required for conservation. This approach conserves quantity between the model and coupler for gridcell areas that do not agree exactly between the coupler and model. It allows for differences in areas in different models inherent in different model numerics.

- Fluxes must be mapped conservatively in the coupler and the mapping weights must be conservative. These weights are generated offline and must be checked carefully for conservation. Unfortunately, they will never be exactly conservative, due to small errors in the computation of the areas and the overlaps, among other reasons.

- It's not clear to how many digits the model should conserve over long periods. But it should be at least several (5 or more?).

- Budgets are most valuable in a fully coupled system. In any configuration where a model is driven by data models, the overall budget is meaningless.
  However, in that case, the net fluxes into and out of an active component should be reasonable.

- All coupling fields in the budget MUST BE accounted for correctly. As noted above, any small error in the diagnostics, any missing field, any incorrect sign, any units problem, any error in the applied area and fraction will wreak havoc on the overall budget.

- The sign convention in the budget analysis is DIFFERENT from the sign convention in the coupler. The sign convention in the coupler has been positive downward where the models are considered in the following order, top to bottom: atm, lnd, rof, ice, ocn. The sign convention for the budgets is positive for that component to gain heat/water and negative for that component to lose heat/water. So the signs must be considered carefully in the analysis.

- The state of the model is critical in understanding the budget diagnostics. The budget of a model that is spinning up is going to be very, very different from that of a model that is stable. Until models spin up, they will generally be sources or sinks of heat or water in the overall system from the coupler perspective and the budgets should show that. That net exchange of heat and water might be very different in the system once it's spun up. Ultimately, the net flux of heat and water in the coupler has to be consistent with the change in heat and water in all components. If a component is a big source or sink (over some medium term, like years, decades or centuries, depending on how the spin up goes), that is going to be reflected in the budget.

- In a spun up and stable system, there should be no long term net heat or water flux from any particular component. In practice, this will not be the case in a complex climate modeling system because there are always some lingering long term trends, but in theory, that defines a stable spun up state.

There are really three parts of the budget that need to be understood.

- First, at the model level.
  Each model should basically be consistent with

      D(Heat)/dt  = sum(external_forcing)
      D(Water)/dt = sum(external_forcing)

  That means the model is conserving. Most models will have some internal lags that can make this analysis difficult. But ultimately, any internal lags, sources, or sinks should be understood well enough that every term in

      D(Heat)/dt  = sum(external_forcing) + sinks + sources + lags
      D(Water)/dt = sum(external_forcing) + sinks + sources + lags

  should be identified, the equation should balance, sinks=sources=0 (hopefully), and lags should be much smaller than the other terms over the long time scale for every individual component.

- Second, within the coupler. The coupler can only diagnose the net heat and water flux between components. This can be broken down into various terms and by component in a number of different ways. But ultimately, the coupler should also be conserving heat and water, with only lags creating an inability to balance exactly. The coupler can diagnose the net heat and fresh water flux into any component.

- Third, between the coupler and the model component. While this might seem trivial and should be trivial, it's worth checking that the net flux of heat or water into or out of any component as diagnosed by the coupler agrees with the sum(external_forcing) in every model. That verifies that the coupling between the coupler and the model is working correctly. For instance, the first and second parts above can be checked and everything can look fine even in a case where a model is reading external forcing from a file and not receiving it from the coupler. This third step closes the budget and ensures the coupler and model are interacting correctly. Care must be taken to assess the quantity of heat and water, not the flux. And the appropriate areas must be applied on the coupler side and the model side (these might be different areas).

The coupler can only address the second part of the budget above.
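The third check, and the area correction described earlier, both come down to comparing quantities (flux times area times time) rather than fluxes. The following is a minimal Python sketch of that idea, not code from CESM: the three-cell grid, the area and flux values, the coupling period dt, and the helper names area_correct and total_quantity are all hypothetical and chosen only for illustration.

```python
# Hypothetical per-gridcell values for a tiny three-cell grid (illustration only).
model_area = [1.00e10, 1.02e10, 0.98e10]  # m2, as implied by the model's own numerics
cpl_area = [1.01e10, 1.00e10, 0.99e10]    # m2, as stored with the mapping weights
model_flux = [100.0, -50.0, 25.0]         # W/m2, defined on the model areas
dt = 1800.0                               # s, assumed coupling period


def area_correct(flux, area_from, area_to):
    """Rescale each flux so that flux * area is unchanged when the area changes."""
    return [f * a_from / a_to for f, a_from, a_to in zip(flux, area_from, area_to)]


def total_quantity(flux, area, dt):
    """Integrate flux over area and time: W/m2 * m2 * s = J."""
    return sum(f * a for f, a in zip(flux, area)) * dt


# Fluxes as the coupler would see them after the area correction.
cpl_flux = area_correct(model_flux, model_area, cpl_area)

q_model = total_quantity(model_flux, model_area, dt)      # model-side quantity
q_cpl = total_quantity(cpl_flux, cpl_area, dt)            # coupler-side quantity
q_uncorrected = total_quantity(model_flux, cpl_area, dt)  # what happens without correction

print(f"model side:    {q_model:.6e} J")
print(f"coupler side:  {q_cpl:.6e} J")
print(f"no correction: {q_uncorrected:.6e} J")
```

With the correction applied, the two sides agree to roundoff even though the gridcell areas differ. Without it, the same fluxes integrated over the coupler areas give a different total, which is the kind of mismatch the third check is meant to catch.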
That coupler budget analysis is carried out in cime/driver_cpl/driver/seq_diag_mct.F90 in subroutines

    public seq_diag_atm_mct
    public seq_diag_lnd_mct
    public seq_diag_rof_mct
    public seq_diag_glc_mct
    public seq_diag_ocn_mct
    public seq_diag_ice_mct

There is a different subroutine to analyze the terms from each component. In practice, this is done to separate the analysis for each component to make each clearer. It also allows each diagnostic subroutine to be called at a different part of the run sequence, both to support extra concurrency for performance and to allow each diagnostic subroutine to be called at the correct place in the run sequence where the coupling fields and dynamically varying fractions have consistent values for application. The ice fraction is updated at one particular location in the run sequence, and which diagnostics are called before or after this fraction update is important. Also, in many of the interfaces, there is a "do_" logical flag that allows the interface to be called multiple times in the run sequence in order to compute different terms at different places in the model.

In each seq_diag_* interface, the gridcell areas and fractions are available and all fields are multiplied by the appropriate area and fraction. Also, on the first call to each subroutine, field index values are computed and stored to avoid repeated character string look-ups in subsequent calls. As these subroutines are called, local data (by MPI task) is summed into a local array called budg_dataL. All data is stored locally until it's written either to the log file or to the restart file.

While the diagnostics are broken up into several pieces by component, the budget really is a comprehensive set of diagnostics defined by the following arrays

    real(r8),public :: budg_dataL(f_size,c_size,p_size) ! local sum, valid on all pes
    real(r8),public :: budg_dataG(f_size,c_size,p_size) ! global sum, valid only on root pe
    real(r8),public :: budg_ns   (f_size,c_size,p_size) ! counter, valid only on root pe

budg_dataL is where the locally accumulated budgets are stored for each component. budg_dataG is where the global budgets are summed before writing to the log or restart file. budg_ns is a counter that accumulates the number of times the budget is accumulated. Ultimately, the average (not accumulated) budgets are written to the log file and budg_ns is needed to average. The arrays are three dimensional:

- f_size = the number of different fields that are accumulated independently. Currently this is 17 and includes area, 9 heat terms, and 7 water terms.

- c_size = the number of component terms that are accumulated independently. Currently this is 22 and includes inputs and outputs to each of the 6 components plus an additional pair that separates the northern and southern hemispheres of the ice model. That accounts for the first 14 terms. The other 8 terms are similar terms but computed on the atm grid. This is for a separate diagnostic in the model. That diagnostic looks at the budget solely on the atmosphere grid.

- p_size = the different time periods supported in the budget accumulation. Currently this is 5 and the diagnostics support instantaneous, daily, monthly, annual, and long term budgets separately.

By leveraging a single three-dimensional array, all the budget information for any MPI task can be stored by field, component, and time in a single array. That array can then be quickly summed to generate the global diagnostics. These arrays are also written to the coupler restart files to support proper accumulations over different time periods even when the model is stopped and restarted. The budget data is written to restart in seq_rest_mod.F90.

(NOTE: We should check that budg_dataG is being computed BEFORE the restarts are written in all cases. Looking at the code quickly, I'm not convinced it is. This would affect restartability of budgets.
The issue is that before a restart is written, seq_diag_sum0_mct has to be called and that's only called by the seq_diag_print_mct routine and only under certain circumstances. I think this must be OK, but we should verify.)

The budget diagnostics are written to the coupler log file in seq_diag_print_mct. There are 3 levels of budget output.

- plev>=1 is the standard net summary budgets by term and by component.
- plev>=2 provides diagnostics of the balance between each surface component and the atmosphere.
- plev>=3 details the atm, lnd, ocean, and ice budgets on the atm grid.

plev=1 is really the most basic budget and the budget that should be the main focus.

The budget diagnostics are controlled by a series of coupler namelist inputs.

    logical :: do_budgets   ! do heat/water budgets diagnostics
    integer :: budget_inst  ! instantaneous budget level
    integer :: budget_daily ! daily budget level
    integer :: budget_month ! monthly budget level
    integer :: budget_ann   ! annual budget level
    integer :: budget_ltann ! long term budget level written at end of year
    integer :: budget_ltend ! long term budget level written at end of run

do_budgets must be set to true to get any budgets. The other budget integer flags set the level of the budget diagnostics for that time period (associated with plev above). Budgets for different time periods are controlled, accumulated, and written independently. ltann writes the long term budget at the end of each year. ltend writes the long term budget at the end of each run. The lt budgets are accumulated since the start of a given case. And all data is accumulated across a restart correctly (or should be). The budget diagnostics are not bit-for-bit reproducible on different MPI task counts due to the local and global accumulation phases, but this has not been a requirement.
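The accumulation scheme described above (local sums of flux times area times fraction, a global sum across MPI tasks, and a counter used to turn accumulated values into averages) can be sketched in a few lines. This is a simplified Python illustration only, not a translation of seq_diag_mct.F90: it keeps a two-dimensional [field][component] table for a single time period, replaces the MPI reduction with a plain loop over per-task tables, and the function names, areas, fractions, and fluxes are all hypothetical.

```python
def new_budget(n_fields, n_comps):
    """Return a zeroed [field][component] table for one accumulation period."""
    return [[0.0] * n_comps for _ in range(n_fields)]


def accumulate(budget, field, comp, flux, area, frac):
    """Add the local sum of flux * area * fraction to one budget entry."""
    budget[field][comp] += sum(x * a * fr for x, a, fr in zip(flux, area, frac))


def global_average(per_task, n_samples):
    """Sum the per-task tables (stand-in for an MPI reduction), then average."""
    n_fields, n_comps = len(per_task[0]), len(per_task[0][0])
    total = new_budget(n_fields, n_comps)
    for local in per_task:
        for f in range(n_fields):
            for c in range(n_comps):
                total[f][c] += local[f][c]
    return [[value / n_samples for value in row] for row in total]


# Two hypothetical "MPI tasks", one field, one component, two coupling steps.
# Areas here are assumed to be already normalized (dimensionless).
task0, task1 = new_budget(1, 1), new_budget(1, 1)
n_samples = 0
for _ in range(2):
    accumulate(task0, 0, 0, flux=[10.0, 20.0], area=[0.1, 0.2], frac=[1.0, 0.5])
    accumulate(task1, 0, 0, flux=[30.0], area=[0.3], frac=[1.0])
    n_samples += 1  # plays the role of the budg_ns counter

average = global_average([task0, task1], n_samples)
print(average[0][0])
```

Because the per-task sums are combined in a different order when the task count changes, floating point roundoff differs, which is consistent with the note above that the diagnostics are not bit-for-bit reproducible across MPI task counts.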
Output for budget_ann=1 should look something like this (this data is taken from an arbitrary CESM case and should not be treated as ideal, spun up, validated, scientifically correct, or otherwise. It's just an example set of data).

    (seq_diag_print_mct) NET AREA BUDGET (m2/m2): period = annual: date = 260101 0
                       atm           lnd           ocn        ice nh        ice sh         *SUM*
    area       -1.00000000    0.29174398    0.66380857    0.02269426    0.02175296   -0.00000023

    (seq_diag_print_mct) NET HEAT BUDGET (W/m2): period = annual: date = 260101 0
                       atm           lnd           rof           ocn        ice nh        ice sh           glc         *SUM*
    hfreeze     0.00000000    0.00000000    0.00000000    0.06461262   -0.02851287   -0.03609976    0.00000000   -0.00000000
    hmelt       0.00000000    0.00000000    0.00000000   -0.90137816    0.37440068    0.52695615    0.00000000   -0.00002134
    hnetsw   -163.68069466   41.68913175    0.00000000  121.19128914    0.47355259    0.32910518    0.00000000    0.00238400
    hlwdn    -335.94243834   86.82047885    0.00000000  239.26125546    4.85790467    5.00257906    0.00000000   -0.00022029
    hlwup     393.65982821 -107.77979589    0.00000000 -274.64148108   -5.58690599   -5.65160920    0.00000000    0.00003605
    hlatvap    82.54401732   -9.75856097    0.00000000  -72.61188586   -0.04934321   -0.12437396    0.00000000   -0.00014668
    hlatfus     0.85408604   -0.27936490    0.00000000   -0.40724960   -0.04611690   -0.12135644    0.00000000   -0.00000180
    hiroff      0.00000000    0.05379623   -0.00000000   -0.05380324    0.00000000    0.00000000    0.00000000   -0.00000701
    hsen       22.36323927  -10.68462583    0.00000000  -11.74410411   -0.00646233    0.07167567    0.00000000   -0.00027733
    *SUM*      -0.20196215    0.06105924   -0.00000000    0.15725517   -0.01148337   -0.00312330    0.00000000    0.00174560

    (seq_diag_print_mct) NET WATER BUDGET (kg/m2s*1e6): period = annual: date = 260101 0
                       atm           lnd           rof           ocn        ice nh        ice sh           glc         *SUM*
    wfreeze     0.00000000    0.00000000    0.00000000   -0.17130502    0.07559509    0.09570992    0.00000000   -0.00000000
    wmelt       0.00000000    0.00000000    0.00000000    0.81914718   -0.26813490   -0.55091384    0.00000000    0.00009843
    wrain     -30.43448867    6.31746794    0.00000000   23.97606210    0.07191497    0.06909592    0.00000000    0.00005226
    wsnow      -2.55944273    0.83717382    0.00000000    1.22040634    0.13819869    0.36366928    0.00000000    0.00000540
    wevap      32.99164098   -3.89730845    0.00000000  -29.03314109   -0.01750979   -0.04374031    0.00000000   -0.00005865
    wrunoff     0.00000000   -3.15257628   -0.00197029    3.15567234    0.00000000    0.00000000    0.00000000    0.00112576
    wfrzrof     0.00000000   -0.16121135    0.00000000    0.16123235    0.00000000    0.00000000    0.00000000    0.00002100
    *SUM*      -0.00229042   -0.05645433   -0.00197029    0.12807420    0.00006407   -0.06617902    0.00000000    0.00124421

What you are looking at are the area, heat, and water terms by component. The first section sums the areas in the system. These are coupler areas which are accumulated as the diagnostics are computed. These areas include the time varying fractions and represent an average area normalized by the surface area of the earth (m2/m2). All the rest of the budget diagnostics are normalized by the surface area of the earth as well before being written, to generate W/m2 and kg/m2s*1e6 units. If you think about the actual computation in the code, the coupler diagnostics are accumulating fluxes*areas. These are averaged and then finally divided by the surface area of the earth to produce the table above. The division by area does not impact the results, it just puts the diagnostics into units that are more easily understood.

Across any row, the SUM should be close to zero. It will not be zero due to lags. Each row demonstrates conservation of fluxes within the coupler. For instance, wrunoff shows mostly water passed from the land model, through runoff (which has very little net accumulation), and then into the ocean model, and the net sum of all terms is 0.0011 vs 3.15 for any individual term. In the same way, wrain is 30.4 out of the atm, 6.3 into land, 23.97 into ocean, and 0.07 into each of sea ice nh and sh, for a net sum of 0.00005 (about 6 digits).

Each column shows the total net accumulation of heat and water in any given component. So atm is losing net heat and water, but by an amount several digits smaller than any specific term.
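The row check described above is easy to automate when reading a coupler log. The sketch below is a minimal Python illustration, with the wrain and wrunoff values copied from the sample table; the helper name row_closure and the "digits" measure (log10 of the largest term over the residual) are assumptions made here for illustration, not something the coupler reports.

```python
import math

# Columns: atm, lnd, rof, ocn, ice nh, ice sh, glc (values copied from the sample table).
rows = {
    "wrain": [-30.43448867, 6.31746794, 0.0, 23.97606210, 0.07191497, 0.06909592, 0.0],
    "wrunoff": [0.0, -3.15257628, -0.00197029, 3.15567234, 0.0, 0.0, 0.0],
}


def row_closure(terms):
    """Return (residual, largest term magnitude, orders of magnitude between them)."""
    residual = sum(terms)
    largest = max(abs(t) for t in terms)
    if residual == 0.0:
        return residual, largest, math.inf
    return residual, largest, math.log10(largest / abs(residual))


closure = {name: row_closure(terms) for name, terms in rows.items()}
for name, (residual, largest, digits) in closure.items():
    print(f"{name:8s} residual={residual: .8f} largest={largest:.8f} digits={digits:.1f}")
```

The recomputed residuals match the *SUM* column of the sample table to within the rounding of the printed values. A row whose residual is not several orders of magnitude smaller than its largest term is a good place to start looking for a missing field, a sign error, or a wrong area or fraction.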
The bottom right hand element (SUM x SUM) shows the overall budget of all terms and all components. Ideally, this would be zero (like any row should be) and isn't because of lags.

All of this assumes that all coupling fluxes are in the appropriate units, W/m2 and kg/m2s, and that all coupling fluxes are part of the "flux" list, not the "state" list of coupling fields. This is critical as the model differentiates states and fluxes in the area correction application and mapping methods. Fluxes are always mapped conservatively. States never have the area corrections applied.

In summary, there are many important details that need to be taken into account in the budget diagnostics and any small error will likely cause significant problems in the overall analysis. There is significant value both in having robust diagnostic capabilities in all models and the coupler that can easily be turned on to generate these diagnostics and in maintaining these diagnostics as the models evolve. Fixing these diagnostics after the fact tends to require a fairly heroic effort. Whenever a coupling field is changed or a new model or coupling field is introduced, the impact on the budget should be outlined ahead of time, modifications to the budget should be part of the implementation, and a budget analysis should be part of the validation process. Simply focusing on connecting the fields between models through the coupler is inadequate if there is a direct impact on the budget diagnostics.