...
Short summary of what was done and what the result was.
Performance Test 1
Performance Test 1: short-description-of-testing-here
Date last modified:
Contributors: (add your name to this list if it does not appear)
Provenance: (run provenance link, code tag, etc.)
Results: (link to results, data and plots)
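To see how close physgrid gets to the ideal speedup of 9/4 in the physics computations, I did a 1-month run with -pecount S on cori-knl, which gives a reasonable number of columns per core. The ideal is 9/4 because an np4 element has on average (np-1)^2 = 9 unique GLL columns, while a pg2 element has 2 x 2 = 4 physics columns. Relevant high-level timers are as follows: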
ne30np4:
```
"CPL:RUN_LOOP" 693 693 1.031184e+06 2.330065e+06 3365.724 ( 296 0) 3361.610 ( 544 0)
"CPL:OCNT_RUN" 693 693 1.030491e+06 6.395224e+02 2.122 ( 0 0) 0.901 ( 657 0)
"CPL:ICE_RUN" 693 693 1.031184e+06 5.078033e+03 9.215 ( 319 0) 5.647 ( 666 0)
"CPL:LND_RUN" 693 693 1.031184e+06 2.091669e+04 34.030 ( 282 0) 27.238 ( 669 0)
"CPL:ATM_RUN" 693 693 1.031184e+06 1.965188e+06 3226.064 ( 296 0) 2668.105 ( 420 0)
"a:CAM_run1" 693 693 1.031877e+06 1.307641e+06 2267.392 ( 296 0) 1713.941 ( 377 0)
"a:CAM_run2" 693 693 1.031877e+06 2.497877e+05 382.013 ( 296 0) 349.990 ( 151 0)
"a:CAM_run3" 693 693 1.031877e+06 3.823804e+05 567.488 ( 372 0) 526.157 ( 182 0)
"a:CAM_run4" 693 693 1.031877e+06 2.388912e+04 36.367 ( 0 0) 34.452 ( 401 0)
"a:UniquePoints" 693 693 1.031877e+06 2.399564e+03 4.179 ( 296 0) 2.732 ( 562 0)
"a:putUniquePoints" 693 693 1.031877e+06 5.007935e+03 8.052 ( 296 0) 6.124 ( 562 0)
```
ne30pg2:
```
"CPL:RUN_LOOP" 693 693 1.031184e+06 1.197055e+06 1727.589 ( 145 0) 1727.089 ( 532 0)
"CPL:OCNT_RUN" 693 693 1.030491e+06 6.345620e+02 2.469 ( 0 0) 0.880 ( 518 0)
"CPL:ICE_RUN" 693 693 1.031184e+06 4.448382e+03 7.500 ( 523 0) 4.345 ( 648 0)
"CPL:LND_RUN" 693 693 1.031184e+06 1.419479e+04 23.461 ( 0 0) 18.509 ( 585 0)
"CPL:ATM_RUN" 693 693 1.031184e+06 1.119652e+06 1649.384 ( 26 0) 1541.688 ( 448 0)
"a:CAM_run1" 693 693 1.031877e+06 5.988779e+05 901.520 ( 692 0) 787.083 ( 370 0)
"a:CAM_run2" 693 693 1.031877e+06 1.315582e+05 193.712 ( 396 0) 185.811 ( 676 0)
"a:CAM_run3" 693 693 1.031877e+06 3.717056e+05 543.277 ( 369 0) 528.668 ( 545 0)
"a:CAM_run4" 693 693 1.031877e+06 1.642631e+04 25.760 ( 0 0) 23.682 ( 644 0)
"a:dyn_to_fv_phys" 693 693 1.031877e+06 8.753453e+03 12.869 ( 396 0) 12.479 ( 654 0)
"a:fv_phys_to_dyn" 693 693 1.031877e+06 2.451420e+04 40.740 ( 73 0) 32.149 ( 640 0)
```
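In each row, the fourth number (after the process count, thread count, and call count) is the timer sum, i.e., wallclock seconds summed over all tasks. The last two numbers are the maximum and minimum per-task times, each followed by the (process thread) where it occurred.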
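To pull the timer sums out of rows like these programmatically, a minimal parser might look like the sketch below. It assumes the column order just described (processes, threads, count, timer sum, max with (proc thrd), min with (proc thrd)); the function names are hypothetical and not part of any E3SM or GPTL tool.

```python
import re

# One summary row, e.g.
# "a:CAM_run1" 693 693 1.031877e+06 1.307641e+06 2267.392 ( 296 0) 1713.941 ( 377 0)
# Assumed column order: processes, threads, count, walltotal (timer sum),
# wallmax (proc thrd), wallmin (proc thrd).
ROW = re.compile(
    r'^"(?P<name>[^"]+)"\s+'
    r'(?P<procs>\d+)\s+(?P<threads>\d+)\s+'
    r'(?P<count>\S+)\s+(?P<walltotal>\S+)\s+'
    r'(?P<wallmax>\S+)\s+\(\s*(?P<maxproc>\d+)\s+(?P<maxthrd>\d+)\)\s+'
    r'(?P<wallmin>\S+)\s+\(\s*(?P<minproc>\d+)\s+(?P<minthrd>\d+)\)\s*$'
)


def parse_timer_line(line):
    """Return a dict for one timer row, or None if the line is not a timer row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "name": d["name"],
        "procs": int(d["procs"]),
        "count": float(d["count"]),
        "walltotal": float(d["walltotal"]),  # timer sum over all tasks
        "wallmax": float(d["wallmax"]),
        "maxproc": int(d["maxproc"]),
        "wallmin": float(d["wallmin"]),
        "minproc": int(d["minproc"]),
    }


def parse_timers(text):
    """Map timer name -> parsed row for every timer row in a block of text."""
    rows = (parse_timer_line(line) for line in text.splitlines())
    return {r["name"]: r for r in rows if r is not None}


if __name__ == "__main__":
    row = parse_timer_line(
        '"a:CAM_run1" 693 693 1.031877e+06 1.307641e+06 '
        "2267.392 ( 296 0) 1713.941 ( 377 0)"
    )
    # The mean per-task time should fall between the min and max columns.
    mean = row["walltotal"] / row["procs"]
    print(f'{row["name"]}: mean {mean:.1f} s per task, '
          f'min {row["wallmin"]} s, max {row["wallmax"]} s')
```

The min <= mean <= max check is a cheap way to confirm that the fourth column really is a sum over tasks rather than a per-task value.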
The speedups (np4 timer sum divided by pg2 timer sum) are as follows:
ideal speedup: (/ 9.0 4.0) 2.25
run1, before coupler: (/ 1.307641e+06 5.988779e+05) 2.1834851478072577
run2, after coupler: (/ 2.497877e+05 1.315582e+05) 1.8986859047934677
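Thus run1 (before the coupler) reaches about 97% of the ideal speedup (2.18/2.25), leaving little room for improvement, while run2 (after the coupler) reaches about 84% (1.90/2.25), so there is some room for improvement there.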
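The same arithmetic can be written as a short script. This sketch uses the standard cubed-sphere column counts (6 ne^2 (np-1)^2 + 2 unique GLL columns for np4, 6 ne^2 pg^2 columns for pg2), so its ideal ratio is 48602/21600, slightly above 9/4 because of the 2 extra cube-corner points. The function names are hypothetical, and the timer sums are copied from the tables above.

```python
def unique_columns_np(ne, np_):
    """Unique GLL columns on a cubed-sphere grid: 6*ne^2*(np-1)^2 + 2."""
    return 6 * ne**2 * (np_ - 1) ** 2 + 2


def columns_pg(ne, pg):
    """Finite-volume physics columns on a physgrid: 6*ne^2*pg^2."""
    return 6 * ne**2 * pg**2


def speedup(t_np4, t_pg2):
    """Speedup of pg2 over np4 from timer sums (larger is better)."""
    return t_np4 / t_pg2


# Ideal speedup if physics cost scales with the number of columns.
ideal = unique_columns_np(30, 4) / columns_pg(30, 2)  # 48602 / 21600

# name: (np4 timer sum, pg2 timer sum) in seconds summed over all tasks
timers = {
    "run1": (1.307641e06, 5.988779e05),
    "run2": (2.497877e05, 1.315582e05),
}

for name, (t_np4, t_pg2) in timers.items():
    s = speedup(t_np4, t_pg2)
    print(f"{name}: speedup {s:.3f}, {100 * s / ideal:.1f}% of ideal {ideal:.4f}")
```

Using the exact column counts instead of 9/4 changes the fraction-of-ideal figures only in the fourth significant digit, so either convention supports the same conclusion.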
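Comparing the pg2 remap timers (a:dyn_to_fv_phys and a:fv_phys_to_dyn, about 3.3e4 s summed over tasks) with the np4 a:UniquePoints and a:putUniquePoints timers (about 7.4e3 s) shows the cost of the high-order remap between the dynamics and physics grids.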
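To put the remap cost in scale, a rough sketch can compare the extra remap time against the total atmosphere savings. The choice of CPL:ATM_RUN as the yardstick is an assumption made here for illustration (the remap timers may not be strictly nested inside it); the numbers are the timer sums copied from the tables above, and the variable names are hypothetical.

```python
# Timer sums (seconds summed over all tasks), copied from the tables above.
NP4_POINT_COPY = {"a:UniquePoints": 2.399564e03, "a:putUniquePoints": 5.007935e03}
PG2_REMAP = {"a:dyn_to_fv_phys": 8.753453e03, "a:fv_phys_to_dyn": 2.451420e04}
ATM_RUN = {"np4": 1.965188e06, "pg2": 1.119652e06}


def total(timers):
    """Sum a group of timer sums."""
    return sum(timers.values())


# Extra cost of the high-order remap relative to the np4 point copies.
extra_remap = total(PG2_REMAP) - total(NP4_POINT_COPY)
# Assumed yardstick: overall CPL:ATM_RUN saving from switching to pg2.
atm_saving = ATM_RUN["np4"] - ATM_RUN["pg2"]

print(f"np4 point copies: {total(NP4_POINT_COPY):.0f} s")
print(f"pg2 remap:        {total(PG2_REMAP):.0f} s")
print(f"extra remap cost: {100 * extra_remap / atm_saving:.1f}% of the ATM_RUN saving")
```

On these numbers the extra remap time is a few percent of the atmosphere savings, which is consistent with the remap not being the main limiter on speedup.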
Performance Test 2
Performance Test 2: short-description-of-testing-here
Date last modified:
Contributors: (add your name to this list if it does not appear)
Provenance: (run provenance link, code tag, etc.)
Results: (link to results, data and plots)
How will XXX be tested? That is, how do we know when we have met requirement XXX? Will these unit tests be included in ongoing testing going forward?