Guidelines for rebaselining E3SM tests

The E3SM testing system includes several types of functional tests, such as testing whether the restart capability is working. It also has "comparison" tests, which compare simulation output with baseline output, where the baseline output was produced by a "blessed" version of the code. If the new code produces output that is not bit-for-bit (BFB) identical to the baseline output, the test fails and is flagged as a DIFF. Many issues can cause these failures, and this Confluence page gives the guidelines for what is needed to accept the new code and "rebaseline" our tests.
  1. Test failures where we are confident the climate is unchanged
    1. Example: adding additional fields to the output or removing no longer needed fields, and the unchanged fields remain BFB with the baseline
    2. Example: removing tests, adding new tests, changing the resolution of existing tests
    3. The integrator should follow the "rebaseline - obvious" protocol.
  2. Roundoff level changes
    1. Many types of changes affect only roundoff error levels: loop ordering, compiler options, order of arithmetic. These changes are not expected to change the climate, but because of the chaotic nature of the climate system, they will cause all comparison tests to fail.
    2. The developer should have a good reason to believe that their code changes are roundoff level only.  For example, they believe they have not changed the algorithm, only restructured it for reasons such as better performance or modularity.  
    3. If the effects are limited to a single component, the developer should follow procedures developed by the component group and have the results approved by the component group leads.   This will serve as a sanity check to make sure they did not introduce a bug by accident.   
    4. If the change affects all components, such as changes in the coupler, the developer should follow procedures developed by the coupled simulation group and have the results approved by the coupled simulation group leads.
    5. The integrator should follow the "rebaseline - roundoff" protocol.
      1. Example: Recent changes in the atmosphere dycore, where the monotone limiter was rewritten for better vectorization, resulted in roundoff level differences. Standalone dycore tests were used to verify that the changes were consistent with roundoff level changes: the L2 error in idealized test cases was not changed, and the plots of the fields looked identical. Then two FC5 (AMIP w/cyclic year 2000 conditions) simulations were performed, with the old and new code. AMWG diagnostics were computed and the atmosphere group leads looked at the differences and determined, through expert judgement, that the differences were consistent with roundoff level changes.
      2. Example: the CIME merge. CIME standalone tests suggest the coupler is working correctly and all code changes have resulted in at most roundoff level answer changes. But because this will change the results of all E3SM simulations and affect all components, and we have no tests that verify that CIME is correctly interfaced to E3SM, the coupled group will develop a protocol to verify that the changes in the coupled model are consistent with roundoff level changes in the coupler. An example could be to perform a B20TRC5 simulation with the old and new code and then use expert judgement to determine if the differences between these simulations are acceptable.
      3. E3SM is developing more sophisticated tests that may automate this process, such as growth tests and statistical ensemble tests.
  3. Bug fixes
    1. Bug fixes will often result in small changes in the simulated climate, and similar to roundoff level changes, they will cause all comparison tests to fail.  
    2. Minor bug fixes:  meaning no additional changes are needed
    3. Medium bug fixes: re-tuning will be necessary, but the climate is still expected to be acceptable.
    4. Major bug fixes:   Large changes in the climate are expected and other processes may need to be modified.  
    5. The developer will get approval from the component group leads before issuing the pull request.  
    6. The integrator should follow the "rebaseline - bugfix" protocol.
    7. For major bug fixes: these will impact all other component models and could make the E3SM master branch unusable. These changes should be approved by the council + group leads telecon.
  4. New components
    1. Bringing in a new component, if done in a way which does not impact existing components, will not impact existing tests and thus rebaselining is not necessary.  
    2. A new component should include a new test, in which case the baseline tests do need to be augmented with the new results. The integrator should follow the "rebaseline - obvious" protocol.
  5. New features
    1. Bringing in a new feature that is disabled by default will not impact existing tests and rebaselining is not necessary
    2. If the feature is enabled by default, then it must have gone through the E3SM code review policy and the integrator should follow the "rebaseline - new feature" protocol.
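
To illustrate why a pure reordering of arithmetic (category 2 above) breaks a BFB comparison, here is a minimal Python sketch. It is not part of the E3SM test system: the helper names `is_bfb` and `relative_l2_diff` and the sample values are hypothetical, chosen only to show the effect in isolation.

```python
import math
import struct


def bits(x: float) -> str:
    """Return the IEEE-754 bit pattern of a double as a hex string."""
    return struct.pack(">d", x).hex()


def is_bfb(a, b) -> bool:
    """True only if both sequences match bit for bit (hypothetical helper)."""
    return len(a) == len(b) and all(bits(x) == bits(y) for x, y in zip(a, b))


def relative_l2_diff(new, base) -> float:
    """Relative L2 norm of the difference between new and baseline values."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(new, base)))
    den = math.sqrt(sum(y ** 2 for y in base))
    return num / den


# The same sums evaluated in two different orders (illustrative values only).
baseline = [(0.1 + 0.2) + 0.3, (1.0 + 1e-16) + 1e-16]
reordered = [0.1 + (0.2 + 0.3), 1.0 + (1e-16 + 1e-16)]

print("BFB identical:", is_bfb(baseline, reordered))
print("relative L2 difference:", relative_l2_diff(reordered, baseline))
```

In this isolated case the results are not BFB, yet the relative L2 difference stays near machine precision. In a full simulation such differences grow because the system is chaotic, which is why the guidelines above rely on component group procedures and expert judgement rather than a simple tolerance.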



Rebaseline Protocols

  1. Rebase-Obvious:      
    1. The SE group will generate new baselines based on the integrator's request.  
  2. Rebase-Roundoff:     
    1. The developer will update Answer-changing commits and make a note to indicate that the change is roundoff only and not expected to change the climate.
    2. The integrator will confirm with the developer that the necessary tests to verify roundoff level only changes have been performed, and that the component group leads have approved this request (as described above in "2. Roundoff level changes").
    3. The SE group will verify that Answer-changing commits has been updated and then generate new baselines based on the integrator's request.
  3. Rebase-Bugfix
    1. The developer will update Answer-changing commits and make a note to indicate that the change is a bug fix and note if it is minor, medium or major.
    2. The integrator will confirm with the developer that the necessary group lead approvals have been obtained (as described above in "3. Bug fixes").
    3. The SE group will verify that Answer-changing commits has been updated, and then generate new baselines based on the integrator's request. For major bugs, the SE group will wait for the developer to notify E3SM-all before rebaselining.
  4. Rebase-NewFeature
    1. The developer will update Answer-changing commits, including the link to the approved design document (see Code Review Process Implementation).
    2. The SE group will verify that Answer-changing commits has been updated, and then generate new baselines based on the integrator's request.
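
The four protocols above can be summarized as a lookup from change category to ordered steps. The following Python sketch is only an illustration of that mapping. The names `Change`, `PROTOCOL_STEPS`, and `rebaseline_steps` are hypothetical and do not correspond to any existing E3SM or CIME tool.

```python
from enum import Enum


class Change(Enum):
    """Change categories that lead to a rebaseline (hypothetical names)."""
    OBVIOUS = "obvious"
    ROUNDOFF = "roundoff"
    BUGFIX = "bugfix"
    NEW_FEATURE = "new feature"


# Steps paraphrased from the protocols in this page, in order.
PROTOCOL_STEPS = {
    Change.OBVIOUS: [
        "SE group generates new baselines at the integrator's request",
    ],
    Change.ROUNDOFF: [
        "Developer updates Answer-changing commits, noting the change is roundoff only",
        "Integrator confirms roundoff tests were performed and component group leads approved",
        "SE group verifies Answer-changing commits was updated, then generates new baselines",
    ],
    Change.BUGFIX: [
        "Developer updates Answer-changing commits, noting minor, medium or major",
        "Integrator confirms the necessary group lead approvals were obtained",
        "SE group verifies Answer-changing commits was updated, then generates new baselines",
    ],
    Change.NEW_FEATURE: [
        "Developer updates Answer-changing commits with a link to the approved design document",
        "SE group verifies Answer-changing commits was updated, then generates new baselines",
    ],
}


def rebaseline_steps(change, severity=None):
    """Return the ordered steps for a change; severity applies to bug fixes only."""
    steps = list(PROTOCOL_STEPS[change])
    if change is Change.BUGFIX and severity == "major":
        # For major bugs the SE group waits before the final rebaseline step.
        steps.insert(len(steps) - 1,
                     "SE group waits for the developer to notify E3SM-all")
    return steps


for step in rebaseline_steps(Change.BUGFIX, severity="major"):
    print("-", step)
```

Keeping the steps as plain data makes the one special case, the extra wait for major bug fixes, easy to see in a single place.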