Code Review Process - Performance Evaluation


As part of the /wiki/spaces/CNCL/pages/25231511, the performance team must evaluate performance at several decision gates. This document describes a formal process for meeting performance requirements during an E3SM Code Review.

Intent

Performance evaluations take place at a number of points in the development cycle. Involving the performance team early in the development cycle permits preliminary consulting to minimize the impact of any model addition or improvement on the overall performance of E3SM. Changes made during the design phase avoid the time and labor costs of significant refactoring late in the process. Later evaluations are required to determine the impact on the fully coupled system in the planned production configurations and to ensure the simulation plan is not adversely affected.

Below are guidelines for developers to ensure the performance gates of the code review are met. (Open question: do we want to turn this into a checklist or template?)

Phase 1

Two performance evaluations are expected during Phase 1 of the review process. 

Step 1.3 Performance Expectations

During the design phase, a performance evaluation is required both to estimate the performance impact of the proposed development and to catch adverse design elements before the code is too entrenched.

As part of the required design document, the developer should:

  • supply enough details of the implementation to allow the performance team to catch potential problems, including
    • data structures and layout 
      • how does the data layout fit within the existing model data infrastructure?
      • describe any new data structures introduced
      • discuss any refactoring involved
    • code integration within current (and potential future) programming model
      • will the new code require new inter-node communication (e.g. reductions, halo updates)? 
      • does the new code sit in threaded or accelerated regions?
        • does it conform to existing coding standards and avoid dependencies that would prevent threading or loop-level parallelism?
        • are loop orders and data layout vector-friendly? (see the illustrative sketch at the end of this step)
    • I/O or storage requirements
      • does the new code add new fields for input, output or archiving?
  • provide a performance estimate as a percentage change in performance for a documented component configuration based on
    • early prototyping
    • previous implementations in other models
    • wild-ass guess with enough arm-waving to mesmerize (no problems - these are not the droids you're looking for)

Note that we do not expect all developers to have the requisite expertise, but we encourage communication and consultation with the performance group to answer the above.
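
To make the data-layout and loop-ordering questions above concrete, here is a minimal C++ sketch (the field structure, dimensions, and names are hypothetical, not E3SM code) contrasting a stride-1, vector-friendly inner loop with a strided one that the performance team would flag.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical field: ncol columns x nlev levels stored in one flat buffer,
// with the column index varying fastest (contiguous for a fixed level).
struct Field {
  std::size_t ncol = 0, nlev = 0;
  std::vector<double> data;  // size ncol * nlev
  double& at(std::size_t k, std::size_t i) { return data[k * ncol + i]; }
};

// Vector-friendly: the inner loop walks contiguous memory (stride 1),
// so compilers can vectorize it and memory bandwidth is used efficiently.
void scale_stride1(Field& f, double alpha) {
  for (std::size_t k = 0; k < f.nlev; ++k)
    for (std::size_t i = 0; i < f.ncol; ++i)
      f.at(k, i) *= alpha;
}

// Same arithmetic, but the inner loop now strides by ncol elements, which
// defeats vectorization and wastes cache; this is the pattern to avoid.
void scale_strided(Field& f, double alpha) {
  for (std::size_t i = 0; i < f.ncol; ++i)
    for (std::size_t k = 0; k < f.nlev; ++k)
      f.at(k, i) *= alpha;
}
```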

Step 2.3 Performance Screening - Group level

At this point, there should be an initial implementation of the new feature that is being evaluated at the Group/Component level.

Performance screening at this stage requires:

  • Documented component-level benchmark compared against the previous/equivalent version without the new development
    • Does this comparison show performance degradation? (We may need a quantitative threshold here: high enough to be out of the noise, but low enough to trigger a look. A lot of little increases can cause death by a thousand cuts, so it should probably be kept fairly low.)
    • Group leaders can often make the judgement call at this stage if performance impact is negligible.
    • Define standard benchmark and collect data on target machines.
  • Will the code affect overall E3SM performance?
    • Estimate based on the component's fraction of runtime in the target production configurations (see the worked example after this list)
  • Do alternative algorithms and data structures need to be considered?
    • Quick code inspection to evaluate for showstoppers, particularly for developments in targeted performance regions.
  • Does the computational cost vs science improvement trade-off need to be weighed by the E3SM Council?
    • If the performance degradation exceeds the threshold, the review may be elevated to a Deep Dive and a Council vote to evaluate whether the science justifies the expense
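
As an illustration of the component-fraction estimate above (the numbers are purely hypothetical): if a component accounts for 10% of the runtime of a target production configuration and the new development slows that component by 20%, the expected impact on overall E3SM throughput is roughly 0.10 × 0.20 = 2%.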


Phase 2

At this point, the development is being integrated into the full E3SM coupled system, and we will need to evaluate impacts on overall model performance and on the resources allocated for the experimental plan. This will require direct involvement by the performance team.

Step 5 Performance Evaluation

The performance team must assess performance in the targeted benchmark configuration in which the feature is likely to be exercised.

There are at least two potential ways to evaluate.

  • If the new development/feature can be cleanly turned on/off and the off position has no impact on performance
    • Pull request for new code/feature is satisfied (assuming it passes other code review requirements)
    • Performance team is notified how to enable the new feature as part of one of the three standard benchmark configurations
    • Performance team evaluates the cost of the new feature (see the illustrative example after this list)
    • If the new feature results in substantial additional expense (again, "substantial" needs to be defined here)
      • Performance team investigates improvements
      • Proposing group looks for potential algorithmic improvements
      • If still too expensive, the Council evaluates whether the improvement is worth the expense, using
        • scientific metrics for model improvement
        • impacts of performance degradation on experimental plan and resources available
  • If the new feature is more invasive and not easily turned on/off
    • the development is evaluated from a branch in a configuration as close as possible to the standard benchmark configurations
    • the evaluation then follows the same steps as above
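
As a purely illustrative example of how the cost of a feature might be quantified: if the standard benchmark configuration achieves 10 simulated years per day (SYPD) without the feature and 9 SYPD with it enabled, the feature reduces throughput by about 10% and increases the core-hours charged per simulated year by roughly 11%; those figures would then be weighed against the degradation threshold and the scientific justification discussed above.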

In addition to evaluating performance in production configurations for existing architectures, the performance team must evaluate performance impacts on planned future systems. As described in the Phase 1 evaluation, we will evaluate using criteria such as:
 
  • data structures and layout 
    • how will data layout impact performance?
      • index ordering/strided data access
      • vectorization
  • Programming model
    • Is the proposed programming model in sync with the exascale programming models approved at the project level?
    • Internode (MPI-level): will the new code require new communication (e.g. reductions, halo updates)? 
    • Does the new code sit in threaded or accelerated regions?
      • does it introduce dependencies that prevent threading, loop-level parallelism?
      • are loop orders and data layout vector-friendly?
  • I/O or storage requirements
    • does the new code add new fields for input, output or archiving?