...
Short summary of what was done and what was the result.
Performance Test 1
Performance Test 1: short-description-of-testing-here
Date last modified:
Contributors: (add your name to this list if it does not appear)
Provenance: (Run provenance Link, Code Tag, etc:)
Results: (link to results, data and plots)
How will XXX be tested, i.e. how do we know when we have met requirement XXX? Will these unit tests be included in ongoing testing going forward?
Code Block |
---|
|
To see how close physgrid gets to the ideal speedup of 9/4 in the
physics computations (np4 has 9 unique physics columns per element,
pg2 has 4), I did a 1-month run with -pecount S on cori-knl, which
provides a reasonable number of columns per core. Relevant
high-level timers are as follows:
ne30np4
"CPL:RUN_LOOP" 693 693 1.031184e+06 2.330065e+06 3365.724 ( 296 0) 3361.610 ( 544 0)
"CPL:OCNT_RUN" 693 693 1.030491e+06 6.395224e+02 2.122 ( 0 0) 0.901 ( 657 0)
"CPL:ICE_RUN" 693 693 1.031184e+06 5.078033e+03 9.215 ( 319 0) 5.647 ( 666 0)
"CPL:LND_RUN" 693 693 1.031184e+06 2.091669e+04 34.030 ( 282 0) 27.238 ( 669 0)
"CPL:ATM_RUN" 693 693 1.031184e+06 1.965188e+06 3226.064 ( 296 0) 2668.105 ( 420 0)
"a:CAM_run1" 693 693 1.031877e+06 1.307641e+06 2267.392 ( 296 0) 1713.941 ( 377 0)
"a:CAM_run2" 693 693 1.031877e+06 2.497877e+05 382.013 ( 296 0) 349.990 ( 151 0)
"a:CAM_run3" 693 693 1.031877e+06 3.823804e+05 567.488 ( 372 0) 526.157 ( 182 0)
"a:CAM_run4" 693 693 1.031877e+06 2.388912e+04 36.367 ( 0 0) 34.452 ( 401 0)
"a:UniquePoints" 693 693 1.031877e+06 2.399564e+03 4.179 ( 296 0) 2.732 ( 562 0)
"a:putUniquePoints" 693 693 1.031877e+06 5.007935e+03 8.052 ( 296 0) 6.124 ( 562 0)
ne30pg2
"CPL:RUN_LOOP" 693 693 1.031184e+06 1.197055e+06 1727.589 ( 145 0) 1727.089 ( 532 0)
"CPL:OCNT_RUN" 693 693 1.030491e+06 6.345620e+02 2.469 ( 0 0) 0.880 ( 518 0)
"CPL:ICE_RUN" 693 693 1.031184e+06 4.448382e+03 7.500 ( 523 0) 4.345 ( 648 0)
"CPL:LND_RUN" 693 693 1.031184e+06 1.419479e+04 23.461 ( 0 0) 18.509 ( 585 0)
"CPL:ATM_RUN" 693 693 1.031184e+06 1.119652e+06 1649.384 ( 26 0) 1541.688 ( 448 0)
"a:CAM_run1" 693 693 1.031877e+06 5.988779e+05 901.520 ( 692 0) 787.083 ( 370 0)
"a:CAM_run2" 693 693 1.031877e+06 1.315582e+05 193.712 ( 396 0) 185.811 ( 676 0)
"a:CAM_run3" 693 693 1.031877e+06 3.717056e+05 543.277 ( 369 0) 528.668 ( 545 0)
"a:CAM_run4" 693 693 1.031877e+06 1.642631e+04 25.760 ( 0 0) 23.682 ( 644 0)
"a:dyn_to_fv_phys" 693 693 1.031877e+06 8.753453e+03 12.869 ( 396 0) 12.479 ( 654 0)
"a:fv_phys_to_dyn" 693 693 1.031877e+06 2.451420e+04 40.740 ( 73 0) 32.149 ( 640 0)
^ timer sum (the fourth numeric column in each row above)
The speedups based on the timer sum column are as follows:
ideal speedup: (/ 9.0 4.0) 2.25
run1, before coupler: (/ 1.307641e+06 5.988779e+05) 2.1834851478072577
run2, after coupler: (/ 2.497877e+05 1.315582e+05) 1.8986859047934677
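Thus, run1 reaches about 97% of the ideal speedup, so there is not much room
for improvement there; run2 reaches about 84%, so there is a little.
The fv_phys timers (dyn_to_fv_phys, fv_phys_to_dyn) vs the UniquePoints timers
(UniquePoints, putUniquePoints) show the cost of high-order remap: a combined
timer sum of about 3.3e+04 vs 7.4e+03. |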
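The speedup arithmetic above can be reproduced with a short script. This is only a sketch: the timer sums are copied by hand from the -pecount S tables, and the names `timer_sum`, `speedup`, `fraction_of_ideal`, and `remap_cost_ratio` are illustrative, not part of any E3SM tool.

```python
# Timer sums (fourth numeric column) copied by hand from the -pecount S tables.
IDEAL_SPEEDUP = 9.0 / 4.0

timer_sum = {
    "ne30np4": {
        "a:CAM_run1": 1.307641e+06,
        "a:CAM_run2": 2.497877e+05,
        "a:UniquePoints": 2.399564e+03,
        "a:putUniquePoints": 5.007935e+03,
    },
    "ne30pg2": {
        "a:CAM_run1": 5.988779e+05,
        "a:CAM_run2": 1.315582e+05,
        "a:dyn_to_fv_phys": 8.753453e+03,
        "a:fv_phys_to_dyn": 2.451420e+04,
    },
}


def speedup(timer):
    """Ratio of the np4 timer sum to the pg2 timer sum for one timer."""
    return timer_sum["ne30np4"][timer] / timer_sum["ne30pg2"][timer]


def fraction_of_ideal(timer):
    """Measured speedup as a fraction of the ideal 9/4."""
    return speedup(timer) / IDEAL_SPEEDUP


def remap_cost_ratio():
    """pg2 high-order remap timers relative to the np4 UniquePoints copies."""
    pg2 = timer_sum["ne30pg2"]
    np4 = timer_sum["ne30np4"]
    remap = pg2["a:dyn_to_fv_phys"] + pg2["a:fv_phys_to_dyn"]
    copy = np4["a:UniquePoints"] + np4["a:putUniquePoints"]
    return remap / copy


if __name__ == "__main__":
    for timer in ("a:CAM_run1", "a:CAM_run2"):
        print(f"{timer}: speedup {speedup(timer):.3f}, "
              f"{100 * fraction_of_ideal(timer):.0f}% of ideal")
    print(f"remap / UniquePoints timer sum: {remap_cost_ratio():.2f}")
```

The remap ratio compares timers from two different grids, so it is a rough indication of the added cost rather than an exact overhead figure.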
Code Block |
---|
|
Default -pecount on Cori-KNL, 1-month run.
name               ranks  call-count    sum           max       min
ne30np4
CPL:RUN_LOOP       1350   2.008800e+06  2.241619e+06  1660.485  1660.455
CPL:OCNT_RUN       1200   1.784400e+06  1.201508e+03  4.542     0.968
CPL:ICE_RUN        1200   1.785600e+06  1.789948e+04  18.119    9.421
CPL:LND_RUN        1350   2.008800e+06  3.711972e+04  32.732    24.977
CPL:ATM_RUN        1350   2.008800e+06  2.046370e+06  1576.281  1496.246
a:CAM_run1         1350   2.010150e+06  1.299927e+06  1016.440  940.340
a:CAM_run2         1350   2.010150e+06  2.450479e+05  186.443   177.676
a:CAM_run3         1350   2.010150e+06  4.454110e+05  339.305   322.815
a:CAM_run4         1350   2.010150e+06  5.284136e+04  40.104    39.119
a:UniquePoints     1350   2.010150e+06  2.426363e+03  1.912     1.464
a:putUniquePoints  1350   2.010150e+06  4.627142e+03  3.993     2.843
ne30pg2
CPL:RUN_LOOP       1350   2.008800e+06  1.571943e+06  1164.625  1164.173
CPL:OCNT_RUN       1200   1.784400e+06  1.139701e+03  1.945     0.919
CPL:ICE_RUN        1200   1.785600e+06  7.574052e+03  7.080     5.905
CPL:LND_RUN        1350   2.008800e+06  2.970938e+04  24.901    19.429
CPL:ATM_RUN        1350   2.008800e+06  1.442209e+06  1083.946  1049.537
a:CAM_run1         1350   2.010150e+06  7.752476e+05  593.253   552.322
a:CAM_run2         1350   2.010150e+06  1.689466e+05  127.287   122.445
a:CAM_run3         1350   2.010150e+06  4.514902e+05  340.133   328.524
a:CAM_run4         1350   2.010150e+06  4.366816e+04  33.281    32.323
a:dyn_to_fv_phys   1350   2.010150e+06  1.895656e+04  14.433    13.922
a:fv_phys_to_dyn   1350   2.010150e+06  2.259800e+04  17.194    16.566
RUN_LOOP timer max: (/ 1660.5 1165) 1.4253218884120171 speedup
run1 timer sum: (/ 1.299927e+06 7.752476e+05) 1.6767894541047275 speedup |
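For the default -pecount table, the same ratios can be taken straight from the timer lines. The sketch below assumes the six-column layout given in that table's header (name, ranks, call-count, sum, max, min); `Timer` and `parse_timer_line` are hypothetical helper names, not functions from the timing library.

```python
from typing import NamedTuple


class Timer(NamedTuple):
    """One row of the six-column timer table (hypothetical helper type)."""
    name: str
    ranks: int
    call_count: float
    total: float      # the "sum" column
    wall_max: float   # the "max" column
    wall_min: float   # the "min" column


def parse_timer_line(line: str) -> Timer:
    """Split one whitespace-separated timer row into typed fields."""
    name, ranks, count, total, wall_max, wall_min = line.split()
    return Timer(name, int(ranks), float(count), float(total),
                 float(wall_max), float(wall_min))


# Rows copied from the default -pecount table.
np4_run1 = parse_timer_line("a:CAM_run1 1350 2.010150e+06 1.299927e+06 1016.440 940.340")
pg2_run1 = parse_timer_line("a:CAM_run1 1350 2.010150e+06 7.752476e+05 593.253 552.322")
np4_loop = parse_timer_line("CPL:RUN_LOOP 1350 2.008800e+06 2.241619e+06 1660.485 1660.455")
pg2_loop = parse_timer_line("CPL:RUN_LOOP 1350 2.008800e+06 1.571943e+06 1164.625 1164.173")

run1_sum_speedup = np4_run1.total / pg2_run1.total
run_loop_max_speedup = np4_loop.wall_max / pg2_loop.wall_max

if __name__ == "__main__":
    print(f"run1 timer sum speedup:     {run1_sum_speedup:.4f}")
    print(f"RUN_LOOP timer max speedup: {run_loop_max_speedup:.4f}")
```

The RUN_LOOP ratio here uses the unrounded max values, so it differs slightly from the figure computed with 1660.5 and 1165. The parser applies only to this six-column layout; the -pecount S table has extra columns and would need a different split.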