W16 Physics Grid Performance Phase 1
This page should describe the Performance Assessment Tests performed for this standalone feature and should provide links to all the result pages.
Summary
Short summary of what was done and what the result was.
Performance Test 1
To see how close physgrid comes to the ideal speedup of 9/4 in the physics computations, I did a 1-month run with -pecount S on Cori-KNL, so that each core has a reasonable number of columns. The 9/4 comes from the column counts: an np4 element has on average (4-1)^2 = 9 unique GLL columns, while a pg2 element has 2x2 = 4 physics columns. The relevant high-level timers are as follows:
ne30np4
name               ranks  threads  call-count    sum (s)       max (s) (rank thread)  min (s) (rank thread)
CPL:RUN_LOOP       693    693      1.031184e+06  2.330065e+06  3365.724 (296 0)       3361.610 (544 0)
CPL:OCNT_RUN       693    693      1.030491e+06  6.395224e+02  2.122 (0 0)            0.901 (657 0)
CPL:ICE_RUN        693    693      1.031184e+06  5.078033e+03  9.215 (319 0)          5.647 (666 0)
CPL:LND_RUN        693    693      1.031184e+06  2.091669e+04  34.030 (282 0)         27.238 (669 0)
CPL:ATM_RUN        693    693      1.031184e+06  1.965188e+06  3226.064 (296 0)       2668.105 (420 0)
a:CAM_run1         693    693      1.031877e+06  1.307641e+06  2267.392 (296 0)       1713.941 (377 0)
a:CAM_run2         693    693      1.031877e+06  2.497877e+05  382.013 (296 0)        349.990 (151 0)
a:CAM_run3         693    693      1.031877e+06  3.823804e+05  567.488 (372 0)        526.157 (182 0)
a:CAM_run4         693    693      1.031877e+06  2.388912e+04  36.367 (0 0)           34.452 (401 0)
a:UniquePoints     693    693      1.031877e+06  2.399564e+03  4.179 (296 0)          2.732 (562 0)
a:putUniquePoints  693    693      1.031877e+06  5.007935e+03  8.052 (296 0)          6.124 (562 0)
ne30pg2
name               ranks  threads  call-count    sum (s)       max (s) (rank thread)  min (s) (rank thread)
CPL:RUN_LOOP       693    693      1.031184e+06  1.197055e+06  1727.589 (145 0)       1727.089 (532 0)
CPL:OCNT_RUN       693    693      1.030491e+06  6.345620e+02  2.469 (0 0)            0.880 (518 0)
CPL:ICE_RUN        693    693      1.031184e+06  4.448382e+03  7.500 (523 0)          4.345 (648 0)
CPL:LND_RUN        693    693      1.031184e+06  1.419479e+04  23.461 (0 0)           18.509 (585 0)
CPL:ATM_RUN        693    693      1.031184e+06  1.119652e+06  1649.384 (26 0)        1541.688 (448 0)
a:CAM_run1         693    693      1.031877e+06  5.988779e+05  901.520 (692 0)        787.083 (370 0)
a:CAM_run2         693    693      1.031877e+06  1.315582e+05  193.712 (396 0)        185.811 (676 0)
a:CAM_run3         693    693      1.031877e+06  3.717056e+05  543.277 (369 0)        528.668 (545 0)
a:CAM_run4         693    693      1.031877e+06  1.642631e+04  25.760 (0 0)           23.682 (644 0)
a:dyn_to_fv_phys   693    693      1.031877e+06  8.753453e+03  12.869 (396 0)         12.479 (654 0)
a:fv_phys_to_dyn   693    693      1.031877e+06  2.451420e+04  40.740 (73 0)          32.149 (640 0)
The speedups below use the timer sum column (wall time summed over all ranks):
ideal speedup: 9.0 / 4.0 = 2.25
run1, before coupler: 1.307641e+06 / 5.988779e+05 = 2.1835
run2, after coupler: 2.497877e+05 / 1.315582e+05 = 1.8987
Thus, there is a little room for improvement in run2 (about 84% of ideal), but not much in run1 (about 97% of ideal). Comparing the pg2 timers a:dyn_to_fv_phys and a:fv_phys_to_dyn with the np4 timers a:UniquePoints and a:putUniquePoints shows the cost of the high-order remap: the pg2 pair sums to 3.33e+04 s versus 7.41e+03 s for the np4 pair, about 4.5 times as much.
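The speedup arithmetic for Test 1 is easy to reproduce. The Python sketch below is a minimal illustration, assuming the timer-sum values are copied by hand from the two tables above; the helper names `unique_gll_columns` and `fv_physics_columns` are hypothetical and are not E3SM functions. It also checks where the ideal 9/4 comes from: at ne30 there are 6*30^2*9 + 2 = 48602 unique np4 GLL columns and 6*30^2*4 = 21600 pg2 physics columns.

```python
"""Recompute the Performance Test 1 speedups from GPTL timer sums.

Values are copied by hand from the ne30np4 and ne30pg2 tables (timer sum,
i.e. wall time in seconds summed over all 693 ranks). Helper names are
hypothetical and exist only for this sketch.
"""


def unique_gll_columns(ne: int, np_: int) -> int:
    """Unique GLL columns on a cubed sphere with ne x ne elements per face."""
    # Each element owns (np-1)^2 points once shared edges/vertices are split;
    # the +2 comes from the 8 cube corners, where 3 elements (not 4) meet.
    return 6 * ne * ne * (np_ - 1) ** 2 + 2


def fv_physics_columns(ne: int, pg: int) -> int:
    """Finite-volume physics columns: pg x pg cells in every element."""
    return 6 * ne * ne * pg * pg


# Timer sums in seconds (summed over ranks), copied from Performance Test 1.
NP4 = {
    "a:CAM_run1": 1.307641e06,
    "a:CAM_run2": 2.497877e05,
    "a:UniquePoints": 2.399564e03,
    "a:putUniquePoints": 5.007935e03,
}
PG2 = {
    "a:CAM_run1": 5.988779e05,
    "a:CAM_run2": 1.315582e05,
    "a:dyn_to_fv_phys": 8.753453e03,
    "a:fv_phys_to_dyn": 2.451420e04,
}

ideal = 9 / 4
column_ratio = unique_gll_columns(30, 4) / fv_physics_columns(30, 2)
speedups = {name: NP4[name] / PG2[name] for name in ("a:CAM_run1", "a:CAM_run2")}

# Dynamics <-> physics transfer timers as named in the two tables.
transfer_np4 = NP4["a:UniquePoints"] + NP4["a:putUniquePoints"]
transfer_pg2 = PG2["a:dyn_to_fv_phys"] + PG2["a:fv_phys_to_dyn"]

if __name__ == "__main__":
    print(f"ideal speedup 9/4 = {ideal:.4f}")
    print(f"column ratio      = {column_ratio:.4f}")
    for name, s in speedups.items():
        print(f"{name}: {s:.4f} ({s / ideal:.0%} of ideal)")
    print(f"transfer cost pg2/np4 = {transfer_pg2 / transfer_np4:.2f}")
```

Running it reproduces the 2.1835 and 1.8987 speedups (about 97% and 84% of ideal) and gives a column ratio of 48602/21600, about 2.2501, so the 9/4 figure is the exact column ratio up to the two extra corner points.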
Performance Test 2
1-month run on Cori-KNL with the default -pecount.
ne30np4
name               ranks  call-count    sum (s)       max (s)   min (s)
CPL:RUN_LOOP       1350   2.008800e+06  2.241619e+06  1660.485  1660.455
CPL:OCNT_RUN       1200   1.784400e+06  1.201508e+03  4.542     0.968
CPL:ICE_RUN        1200   1.785600e+06  1.789948e+04  18.119    9.421
CPL:LND_RUN        1350   2.008800e+06  3.711972e+04  32.732    24.977
CPL:ATM_RUN        1350   2.008800e+06  2.046370e+06  1576.281  1496.246
a:CAM_run1         1350   2.010150e+06  1.299927e+06  1016.440  940.340
a:CAM_run2         1350   2.010150e+06  2.450479e+05  186.443   177.676
a:CAM_run3         1350   2.010150e+06  4.454110e+05  339.305   322.815
a:CAM_run4         1350   2.010150e+06  5.284136e+04  40.104    39.119
a:UniquePoints     1350   2.010150e+06  2.426363e+03  1.912     1.464
a:putUniquePoints  1350   2.010150e+06  4.627142e+03  3.993     2.843
ne30pg2
name               ranks  call-count    sum (s)       max (s)   min (s)
CPL:RUN_LOOP       1350   2.008800e+06  1.571943e+06  1164.625  1164.173
CPL:OCNT_RUN       1200   1.784400e+06  1.139701e+03  1.945     0.919
CPL:ICE_RUN        1200   1.785600e+06  7.574052e+03  7.080     5.905
CPL:LND_RUN        1350   2.008800e+06  2.970938e+04  24.901    19.429
CPL:ATM_RUN        1350   2.008800e+06  1.442209e+06  1083.946  1049.537
a:CAM_run1         1350   2.010150e+06  7.752476e+05  593.253   552.322
a:CAM_run2         1350   2.010150e+06  1.689466e+05  127.287   122.445
a:CAM_run3         1350   2.010150e+06  4.514902e+05  340.133   328.524
a:CAM_run4         1350   2.010150e+06  4.366816e+04  33.281    32.323
a:dyn_to_fv_phys   1350   2.010150e+06  1.895656e+04  14.433    13.922
a:fv_phys_to_dyn   1350   2.010150e+06  2.259800e+04  17.194    16.566
RUN_LOOP timer max: 1660.5 / 1165 = 1.4253
speedup run1 timer sum: 1.299927e+06 / 7.752476e+05 = 1.6768
speedu