E3SM Code Review and New Feature Process
2022/4/8: Original
2022/6/17: Revised to stress importance of not bundling features in a single PR
2022/8/31: Revised to remove duplicate terminology section
2023/4/18: Revised to clarify that a modification to an existing stealth feature is itself considered a stealth feature.
2023/5/18: Clarify that breaking the addition of a single feature into multiple PRs is welcome
This document describes the policy for bringing in new features and changing existing code in the E3SM model.
(Implementing this new policy requires a significant amount of work to improve the E3SM diagnostics, CIME test suites, and documentation. See https://acme-climate.atlassian.net/wiki/spaces/CNCL/pages/3315138566)
Terminology
Code changes are divided into four categories:
BFB (bit-for-bit)
New code produces results which are bit-for-bit identical with old code
Does not contain stealth features
GitHub PR label: BFB
Roundoff
New code would match old code in exact arithmetic, but will diverge exponentially fast when using finite precision
Climate of new and old code will converge as the averaging time goes to infinity
GitHub PR label: non-BFB
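The roundoff category can be illustrated with a toy chaotic system. The sketch below is not E3SM code: it uses the logistic map as a stand-in for a model, and the function names and step counts are assumptions chosen for illustration. Two algebraically identical update formulas differ only in order of operations, yet their trajectories separate quickly, while their long-time averages stay close.

```python
def step_old(x):
    # "Old code": logistic map written as 4 x (1 - x).
    return 4.0 * x * (1.0 - x)


def step_new(x):
    # "New code": the same expression in exact arithmetic,
    # but with a different order of floating-point operations.
    return 4.0 * x - 4.0 * x * x


def run(step, x0, nsteps):
    # Return the full trajectory, including the initial state.
    trajectory = [x0]
    for _ in range(nsteps):
        trajectory.append(step(trajectory[-1]))
    return trajectory


NSTEPS = 20000  # illustrative length, standing in for a long averaging period
old = run(step_old, 0.3, NSTEPS)
new = run(step_new, 0.3, NSTEPS)

# Largest pointwise difference over the first 10 and first 200 states.
max_diff_early = max(abs(a - b) for a, b in zip(old[:10], new[:10]))
max_diff_late = max(abs(a - b) for a, b in zip(old[:200], new[:200]))

# Long-time averages play the role of the "climate" of each run.
mean_old = sum(old) / len(old)
mean_new = sum(new) / len(new)

print("max difference, first 10 states: ", max_diff_early)
print("max difference, first 200 states:", max_diff_late)
print("difference of long-time means:   ", abs(mean_old - mean_new))
```

In this toy setting the pointwise comparison fails almost immediately, which is why a roundoff-level change cannot be verified with a BFB test and needs the evidence described in Section 2 instead.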
Climate changing
New feature is brought into the model turned on
Other model configuration change (parameters, resolution, forcing data)
GitHub PR label: CC
Stealth Feature
BFB code change that includes a new feature, turned off by default, or modifications to an existing stealth feature.
GitHub PR label: Stealth
Note: multiple PR labels are possible. A Stealth feature is likely also BFB when the feature is off. CC, BFB and non-BFB are mutually exclusive.
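The label rule in the note above can be written as a small check. This is a minimal sketch only: `check_labels` is a hypothetical helper, not part of any E3SM or GitHub tooling, and it assumes labels are passed as plain strings spelled as in this document.

```python
# Labels that this document states are mutually exclusive.
EXCLUSIVE_LABELS = {"BFB", "non-BFB", "CC"}


def check_labels(labels):
    """Return a list of problems with a PR's label set (empty if none found)."""
    problems = []
    chosen = sorted(EXCLUSIVE_LABELS & set(labels))
    if len(chosen) > 1:
        problems.append("mutually exclusive labels used together: " + ", ".join(chosen))
    return problems


# Stealth may be combined with BFB; BFB and CC may not be combined.
print(check_labels(["BFB", "Stealth"]))
print(check_labels(["BFB", "CC"]))
```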
Reference Solutions
The E3SM project will maintain reference solutions reflecting the current state of the ‘main’ branch
Coupled model: B-case runs (~100 years) for WC, cryosphere and BGC configurations
Component models: F, I and G cases, typically much shorter (e.g. 5 year F case with cyclic year 2010 forcing)
Updated periodically when climate changes are integrated, or monthly (to check for unintended changes)
New features will be evaluated with respect to these reference solutions, with documentation on the important metrics
Note: E3SM Reference solutions and related documentation not yet ready!
Section 1: New Feature Request Process and Documentation Requirements
For new features or other major code changes, prepare a New Feature Overview Document.
(To determine when a code change becomes a major code change, consult with the component lead.)
E3SM-SFA funded work: the feature should appear in that group’s roadmap before any work on it starts, including documentation prep. (E3SM pre-processing code (compsets, CIME) and infrastructure are also covered by these procedures; the infrastructure group is the “component group” for this work.)
Externally funded work: developers are encouraged to submit this document early in their development process. This will assist with later E3SM acceptance.
Developer: Before writing code: Proposer completes the New Feature Overview Document. Include, as appropriate:
High-level description of code changes and/or new code, an overview of the design, infrastructure changes
Expected improvements and how these will be demonstrated
Describe needed updates to E3SM documentation
Expected impacts on computational performance and mass/energy budgets.
If relevant: describe papers that will be published
E3SM Staff: Before writing code: Reviewer within E3SM (component lead or their delegate) should:
Review document for completeness
Determine if there is sufficient benefit to E3SM to justify the E3SM integration and future maintenance costs.
For features not needed by the E3SM SFA, but which may be needed to support other DOE BER missions, the component lead should consult with E3SM leadership.
Determine whether reviews by the performance and infrastructure groups are needed, and ensure they are completed.
Discourage and/or flag use of advanced language features or unnecessary complexity that may not be supported well by compilers or may cause performance degradation
Create a Confluence page for discussion during the review process
Confluence space for all new feature overview documents and their approval status.
Developer: After new feature is approved, begin work in an E3SM fork, following Development Getting Started Guide
Features which are not approved but are needed for other reasons can be maintained by the developer on their E3SM fork.
Large new features should be developed through a series of PRs rather than a single massive PR (which would be hard to thoroughly test and scrutinize). One New Feature Overview Document can be used for all PRs associated with that feature.
Developer: When work is completed:
Revise the overview document to show the improvements as originally proposed there, and document the simulation results required by the new feature PR guidelines. Include links to the results archiving system (once it is ready)
Submit PRs with links to documentation, follow new feature PR process.
Section 2: Collect documentation and simulation results needed for the Pull Request (PR)
Each of the four PR categories defined above requires a different level of documentation and simulation results before a PR can be submitted.
BFB: no additional documentation is needed.
Roundoff. The developer must:
Provide evidence that the changes are roundoff.
e.g. subcomponent tests that have analytic solutions and error metrics; the errors should change only at the roundoff level
e.g. subcomponent tests that do not have exponential growth of roundoff error: in standalone dycore tests, roundoff-level changes often result in only roundoff-level differences in the output.
If roundoff-inducing changes can be isolated to a small amount of code, it might be possible to verify by inspection (e.g. changing order of operations)
Run nbfb tests. If any of them fail, additional scrutiny is warranted.
Strong evidence of roundoff level changes is sufficient in order to not needlessly delay our development process. In the rare case that the change is in fact not roundoff, this will be caught by our monthly reference B case update.
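One form of the roundoff evidence listed above is a subcomponent test with an analytic solution. The sketch below is a toy stand-in, not an E3SM test: it integrates sin(x) on [0, pi] (exact value 2) with a trapezoid rule, where the "new code" only reverses the summation order. The function names and the tolerance `ROUNDOFF_TOL` are assumptions for illustration.

```python
import math


def integrate_old(f, a, b, n):
    # "Old code": trapezoid rule, interior points summed left to right.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def integrate_new(f, a, b, n):
    # "New code": identical in exact arithmetic, summed right to left.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(n - 1, 0, -1):
        total += f(a + i * h)
    return total * h


EXACT = 2.0           # analytic value of the integral of sin(x) on [0, pi]
ROUNDOFF_TOL = 1e-12  # hypothetical tolerance, far below the truncation error

err_old = abs(integrate_old(math.sin, 0.0, math.pi, 1000) - EXACT)
err_new = abs(integrate_new(math.sin, 0.0, math.pi, 1000) - EXACT)
error_change = abs(err_new - err_old)

print("error, old code:", err_old)
print("error, new code:", err_new)
print("change in error:", error_change)
print("roundoff-level change:", error_change < ROUNDOFF_TOL)
```

The error metric itself is dominated by the discretization error of the scheme; what matters as evidence is that it moves by many orders of magnitude less than its own size when the code changes.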
Stealth Features. The developer needs to run several tests (with the feature turned on, if it is a stealth feature) and note in the PR that they have:
Included a link to the New Feature Overview Document (Section 1)
Demonstrated that it’s possible to determine if feature is on/off via log file / namelists
Ensured that it’s covered by a suitable timer
Verified that it passes the “super-BFB” test suite.
Documented performance changes via “HPC” test suite with a link to PACE results (For features with potential performance impact)
Performed a component simulation test (when relevant). Provide E3SM diags output and a comparison with the baseline maintained by each component group, and determine whether the results are in the acceptable range as documented by each component group. These will typically be the shorter simulations (e.g. a 5 year F compset) used by component developers.
Performed a B compset, e.g. a 10 year run for a “sanity check”. Compare with appropriate “reference B-case” using E3SM diags, and determine if results are in the acceptable range as documented by the simulation teams.
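Three of the checks above (feature off by default, on/off state visible in the log, feature covered by a timer) can be sketched in a few lines. This is an analogy in Python rather than E3SM code: the option name `use_new_mixing`, the `timer` helper, and the toy `step` function are all hypothetical and only illustrate the pattern.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("toy_component")
timers = {}  # accumulated wall-clock seconds per timer name


@contextmanager
def timer(name):
    # Accumulate elapsed time under `name`, even if the body raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        timers[name] = timers.get(name, 0.0) + time.perf_counter() - start


# Namelist-like defaults: the new feature is off unless explicitly enabled.
DEFAULTS = {"use_new_mixing": False}


def step(state, options=None):
    opts = {**DEFAULTS, **(options or {})}
    # Make the on/off state recoverable from the log.
    log.info("use_new_mixing = %s", opts["use_new_mixing"])

    new_state = [0.5 * x + 1.0 for x in state]  # existing code path
    if opts["use_new_mixing"]:
        with timer("new_mixing"):               # feature has its own timer
            mean = sum(new_state) / len(new_state)
            new_state = [0.9 * x + 0.1 * mean for x in new_state]
    return new_state


logging.basicConfig(level=logging.INFO)
baseline = step([1.0, 2.0])                             # feature off (default)
with_feature = step([1.0, 2.0], {"use_new_mixing": True})
print(baseline, with_feature, sorted(timers))
```

With the option left at its default, the existing code path runs unchanged, which is what allows a stealth feature PR to remain BFB.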
Climate changing. The developer must:
Include a link to the New Feature Overview Document (Section 1)
Perform all tests for Stealth Features, except that instead of a “10 year sanity check B compset”, a longer simulation reproducing the “reference B-case” needs to be performed and compared with the official reference B-case.
Section 3: Pull Request (PR) Review
New and stealth features: do not submit a PR until the overview document has been approved (Section 1)
Submit the PR to the E3SM GitHub site following the Development Getting Started Guide. Links to the material outlined in Section 2 above should be in the first comment after the PR description.
For code with potential performance impacts: Performance group lead or delegate reviews supplied PACE data.
It is important not to bundle multiple features or fixes into a single PR.
Climate Changing PRs (new features and bug fixes)
Simulation group lead approves based on climate changes
Simulation group lead (or delegate) is assigned as a reviewer, along with the appropriate component lead, and verifies:
source code review was completed
component lead also approves
PR includes test (or is covered by existing tests)
climate changing evaluation process (from Section 2) completed and results archived
Updates main branch reference B case as needed.
PR contains a single feature (to ensure multiple features are tested independently)
Stealth features
Component lead (or delegate) is assigned as a reviewer and verifies:
source code review
PR includes appropriate test
stealth-feature evaluation process (from Section 2) completed and results archived
Expanded BFB tests were run with feature turned on
PR contains a single feature (to ensure multiple features are tested independently)
Roundoff level PRs
Component lead (or delegate) verifies:
source code review was completed
evidence supporting a roundoff-level change is sufficient; if not, follow component new feature climate evaluation procedures
no stealth features
BFB
Component lead (or delegate) is assigned as a reviewer and verifies:
source code review was completed
no stealth features
Further Definitions
Super-BFB suite
The purpose of this suite is to make sure your new code maintains bit-for-bit properties we demand of the model. These properties include:
exact restart: the model should be able to restart during a run with bit-for-bit identical results compared to a model which ran for the same total time without restarting.
MPI-task count invariance: the model should give the same answers in two runs that differ only in the number of MPI tasks used.
OpenMP thread count invariance: the model should give the same answers in two runs that differ only in the number of OpenMP threads used.
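The exact restart property can be shown on a toy model. The sketch below is not how E3SM writes restart files: the `step`, `write_restart` and `read_restart` functions are hypothetical, and the state is a single float stored as a hexadecimal string so that no bits are lost on the round trip.

```python
import json


def step(state):
    # Toy nonlinear update standing in for one model time step.
    x = state["x"]
    return {"t": state["t"] + 1, "x": 3.9 * x * (1.0 - x)}


def run(state, nsteps):
    for _ in range(nsteps):
        state = step(state)
    return state


def write_restart(state):
    # float.hex() is an exact text representation of the stored bits.
    return json.dumps({"t": state["t"], "x": state["x"].hex()})


def read_restart(text):
    raw = json.loads(text)
    return {"t": raw["t"], "x": float.fromhex(raw["x"])}


initial = {"t": 0, "x": 0.3}

# One uninterrupted 100-step run.
continuous = run(initial, 100)

# The same 100 steps, split 40 + 60 with a restart in between.
restart_text = write_restart(run(initial, 40))
restarted = run(read_restart(restart_text), 60)

print("bit-for-bit restart:", continuous == restarted)
```

The test only passes because the restart captures the complete state exactly. A restart format that rounds values, or that omits part of the state, is the kind of problem an exact restart test is designed to catch.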
There is a suite for each major component model (atm, ocn, ice, lnd, rof, and fully-coupled wcycl). They cover the standard-resolution grids in both optimization and debug configurations.
See tests.py for details on the included tests.
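The MPI-task count invariance property listed above is hard to get from ordinary floating-point sums, because regrouping the additions changes the rounding. The sketch below illustrates the idea with a toy global sum over a one-dimensional decomposition. It is not the E3SM implementation: the function names are hypothetical, no MPI is involved, and exact rational arithmetic from Python's `fractions` module is used purely as a simple way to make the result independent of the partition.

```python
from fractions import Fraction


def partition(values, ntasks):
    # Split `values` into `ntasks` contiguous chunks, one per simulated task.
    size, extra = divmod(len(values), ntasks)
    chunks, start = [], 0
    for rank in range(ntasks):
        stop = start + size + (1 if rank < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def naive_global_sum(values, ntasks):
    # Each task rounds its own partial sum, so the answer can depend on ntasks.
    partials = [sum(chunk) for chunk in partition(values, ntasks)]
    return sum(partials)


def reproducible_global_sum(values, ntasks):
    # Partial sums are exact rationals; rounding happens once, at the end.
    partials = [
        sum((Fraction(v) for v in chunk), Fraction(0))
        for chunk in partition(values, ntasks)
    ]
    return float(sum(partials, Fraction(0)))


values = [0.1 * (i % 7) + 1e-9 * i for i in range(1000)]
task_counts = (1, 2, 3, 7, 16)

naive = {n: naive_global_sum(values, n) for n in task_counts}
exact = {n: reproducible_global_sum(values, n) for n in task_counts}

print("distinct naive results:       ", len(set(naive.values())))
print("distinct reproducible results:", len(set(exact.values())))
```

The reproducible version returns a single value for every task count by construction. The naive version carries no such guarantee, which is the kind of difference the task-count tests in the suite are meant to expose.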