
As part of the /wiki/spaces/CNCL/pages/25231511, the performance team must evaluate performance at several decision gates. This document describes a formal process for meeting performance requirements during an E3SM Code Review.

Intent

Performance evaluations take place at a number of points in the development cycle. Involving the performance team early permits preliminary consulting to minimize the impact of any model addition or improvement on the overall performance of E3SM. Changes made during the design phase avoid the time and labor costs of significant code refactoring late in the process. Later evaluations are required to determine the impact on the fully coupled system in the planned production configurations, to ensure the simulation plan is not adversely affected.

...

  • supply enough details of the implementation to allow the performance team to catch potential problems, including
    • data structures and layout 
      • how does data layout fit within the existing model data infrastructure or 
      • describe any new data structures introduced
      • discuss any refactoring involved
    • code integration within current (and potential future) programming model
      • will the new code require new inter-node communication (e.g. reductions, halo updates)?
      • does the new code sit in threaded or accelerated regions?
        • does it conform to existing code standards to prevent dependencies?
        • are loop orders and data layout vector-friendly?
    • I/O or storage requirements
      • does the new code add new fields for input, output or archiving?
  • provide a performance estimate as a percentage change in performance for a documented component configuration based on
    • early prototyping
    • previous implementations in other models
    • wild-ass guess with enough arm-waving to mesmerize (no problems - these are not the droids you're looking for)
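One way to express the performance estimate requested above as a percentage change is sketched below. The function name, the use of the median over repeated runs, and the timing values are all illustrative assumptions, not part of the review process itself.

```python
from statistics import median


def percent_change(baseline_times, new_times):
    """Percentage change in median run time (positive means slower).

    Both arguments are lists of wall-clock times, in the same units,
    from repeated runs of the same documented component configuration.
    """
    base = median(baseline_times)
    new = median(new_times)
    return 100.0 * (new - base) / base


# Hypothetical timings in seconds; real numbers would come from
# early prototyping or a previous implementation in another model.
baseline = [100.0, 102.0, 98.0]
prototype = [105.0, 107.0, 103.0]

change = percent_change(baseline, prototype)
print(f"{change:+.1f}% change in run time")
```

Using the median rather than a single run is only one possible choice; it reduces sensitivity to an occasional slow run on a shared machine.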

...

  • Documented component level benchmark compared against previous/equivalent version without the new development
    • Does this comparison show a performance degradation? (Might need a quantitative threshold here: something high enough to be out of the noise, but low enough to trigger a look. A lot of little increases can cause death by a thousand cuts, so we might want to keep this fairly low.)
    • Group leaders can often make the judgement call at this stage if performance impact is negligible.
    • Define standard benchmark and collect data on target machines.
  • Will the code affect overall E3SM performance?
    • Estimate based on component fraction in target production configurations
  • Do alternative algorithms and data structures need to be considered?
    • Quick code inspection to evaluate for showstoppers, particularly for developments in targeted performance regions.
  • Does the computational cost vs science improvement trade-off need to be weighed by the E3SM Council?
    • If the performance degradation exceeds the threshold, the review may be elevated to a Deep Dive and a Council vote to evaluate whether the science justifies the expense.
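The estimate "based on component fraction" and the threshold check discussed above could be sketched as follows. This is a minimal sketch under stated assumptions: components are assumed to run sequentially with everything else unchanged, and the function names, the noise level, and the threshold value are hypothetical, since no threshold has been fixed in this document.

```python
def overall_slowdown_pct(component_fraction, component_slowdown_pct):
    """Estimate whole-model slowdown from a single component's slowdown.

    component_fraction: share of total run time spent in the component
        in the target production configuration (0.0 to 1.0).
    component_slowdown_pct: measured slowdown of that component, in percent.

    Assumes components run one after another; a configuration in which
    components run concurrently on separate processors would need a
    different estimate.
    """
    return component_fraction * component_slowdown_pct


def needs_closer_look(slowdown_pct, noise_pct, threshold_pct):
    """True if the slowdown is both outside the noise and above the threshold."""
    return slowdown_pct > max(noise_pct, threshold_pct)


# Hypothetical numbers: the component takes 25% of run time and slows by 8%.
overall = overall_slowdown_pct(0.25, 8.0)
print(f"estimated overall slowdown: {overall:.1f}%")

# Hypothetical 1% run-to-run noise and 1.5% review threshold.
print("needs closer look:", needs_closer_look(overall, 1.0, 1.5))
```

Keeping the threshold low, as suggested above, makes this check trigger on small regressions before they accumulate.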

...

At this point, the development is being integrated into the full E3SM coupled system and we will need to evaluate impacts on overall model performance and the resources allocated for the experimental plan. This will require direct involvement by the performance team.

...

 
  • data structures and layout 
    • how will data layout impact performance?
      • index ordering/strided data access
      • vectorization
  • Programming model
    • Is the proposed programming model in sync with exascale programming models approved at project level?
    • Internode (MPI-level): will the new code require new communication (e.g. reductions, halo updates)? 
    • Does the new code sit in threaded or accelerated regions?
      • does it introduce dependencies that prevent threading or loop-level parallelism?
      • are loop orders and data layout vector-friendly?
  • I/O or storage requirements
    • does the new code add new fields for input, output or archiving?
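The questions above about index ordering, strided access, and vector-friendly loop order come down to how far apart consecutive inner-loop accesses are in memory. The sketch below computes that distance for a 2D array in both column-major (Fortran) and row-major (C) layouts. The function names and array sizes are illustrative assumptions; it models memory offsets only and does not measure performance.

```python
def offset(i, j, nrows, ncols, order):
    """Memory offset (in elements) of a(i, j), using 0-based indices.

    order "F": column-major, as in Fortran (first index varies fastest).
    order "C": row-major, as in C (last index varies fastest).
    """
    if order == "F":
        return i + j * nrows
    return i * ncols + j


def inner_stride(nrows, ncols, order, inner):
    """Distance in elements between consecutive accesses of the inner loop.

    inner is "i" or "j": the index that the innermost loop runs over.
    A stride of 1 means contiguous, vector-friendly access.
    """
    if inner == "i":
        return offset(1, 0, nrows, ncols, order) - offset(0, 0, nrows, ncols, order)
    return offset(0, 1, nrows, ncols, order) - offset(0, 0, nrows, ncols, order)


# Hypothetical array shape: 4 rows by 8 columns.
for order in ("F", "C"):
    for inner in ("i", "j"):
        stride = inner_stride(4, 8, order, inner)
        print(f"layout {order}, inner loop over {inner}: stride {stride}")
```

In a column-major layout the first index should run in the innermost loop to get unit stride; in a row-major layout it is the last index.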

 

...