Progress on porting the Community Atmosphere Model - Spectral Element (CAM-SE) to the GPU-CPU hybrid architecture.

1.Poster Title	Progress on porting the Community Atmosphere Model - Spectral Element (CAM-SE) to the GPU-CPU hybrid architecture.
2.Authors	Matthew Norman, Irina Demeshko
3.Group	Performance
4.Experiment	Watercycle, BGC
5.Poster Category	Early Result
6.Submission Type	poster
7.Poster Link	GPU_poster_Norman_Demeshko.pdf

Abstract

In this poster we summarize our progress on porting the CAM-SE model to Titan's GPU-CPU hybrid architecture. The existing CUDA port has been improved in several ways. First, thread block sizes were generalized so that they can be any multiple of 16, which improved kernel efficiency and allows any number of vertical levels to be specified. Next, additional variables were placed in local GPU cache to improve the efficiency of the hyperviscosity routines. The packing and unpacking routines, which average element boundaries for the SE method and prepare data for MPI message passing, were entirely rewritten to improve memory efficiency. In all, these changes yielded over a 4x runtime improvement for some individual kernels and roughly 30% runtime savings overall. More importantly, the port now supports more general science cases, including those that require many vertical levels.
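The thread block generalization can be illustrated with a minimal host-side C++ sketch. It is not the CAM-SE code. It assumes that the factor of 16 corresponds to the nodes on one vertical level of an element (a 4 x 4 layout, which is our assumption), and that one block handles a whole number of levels. The names LaunchConfig, make_launch_config, thread_is_active, and kPointsPerLevel are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>

// Assumption: 16 threads cover one vertical level of an element (4 x 4 nodes).
constexpr int kPointsPerLevel = 16;

// Hypothetical launch configuration for the kernels that process one element.
struct LaunchConfig {
    int threads_per_block;   // always a multiple of kPointsPerLevel
    int levels_per_block;    // vertical levels handled by one block
    int blocks_per_element;  // blocks needed to cover all levels
};

// Choose the largest block (in whole levels) that fits the thread limit,
// then use as many blocks as needed to cover nlev levels.
LaunchConfig make_launch_config(int nlev, int max_threads_per_block) {
    if (nlev <= 0 || max_threads_per_block < kPointsPerLevel) {
        throw std::invalid_argument("need nlev > 0 and room for one level per block");
    }
    const int max_levels = max_threads_per_block / kPointsPerLevel;
    const int levels = std::min(nlev, max_levels);
    const int blocks = (nlev + levels - 1) / levels;  // ceiling division
    return LaunchConfig{levels * kPointsPerLevel, levels, blocks};
}

// Guard a kernel would apply so that threads in the last, partially filled
// block do not touch levels beyond nlev.
bool thread_is_active(const LaunchConfig& cfg, int block, int thread, int nlev) {
    const int level = block * cfg.levels_per_block + thread / kPointsPerLevel;
    return level < nlev;
}
```

With this layout the number of vertical levels no longer has to divide evenly into the block size: the last block is simply partially idle, and the guard keeps its spare threads from reading or writing out of range.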
In addition to this optimization work, we continue to port the rest of the CAM-SE code to GPUs, including the advection limiters. The existing CUDA version contained only one simplified limiter for the advection operator; we have extended it by adding a CUDA implementation of the "Optimal limiter". This provides our CUDA implementation with all of the necessary capabilities implemented in CAM-SE.
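To give a sense of what an advection limiter does, here is a minimal host-side C++ sketch of a simple clip-and-redistribute limiter. It is a stand-in for illustration only: it is neither the "Optimal limiter" nor the simplified limiter from the CUDA port. It clips nodal tracer values into given bounds and then restores the element's weighted mass by spreading the clipped amount over nodes that still have room. The function names, the flat array layout, and the weights w are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted sum of nodal values, standing in for the tracer mass of one element.
double weighted_mass(const std::vector<double>& q, const std::vector<double>& w) {
    double m = 0.0;
    for (std::size_t i = 0; i < q.size(); ++i) m += w[i] * q[i];
    return m;
}

// Clip q into [qmin, qmax], then redistribute the clipped mass in proportion
// to each node's remaining room, so the weighted mass is unchanged.
// Assumes positive weights. Returns false if the bounds cannot hold the mass.
bool clip_and_redistribute(std::vector<double>& q, const std::vector<double>& w,
                           double qmin, double qmax) {
    const double target = weighted_mass(q, w);
    for (double& v : q) v = std::min(std::max(v, qmin), qmax);

    // Mass still to add (positive) or remove (negative) after clipping.
    const double excess = target - weighted_mass(q, w);
    if (excess == 0.0) return true;

    // Total weighted room available in the direction we need to move.
    double room = 0.0;
    for (std::size_t i = 0; i < q.size(); ++i) {
        room += w[i] * (excess > 0.0 ? qmax - q[i] : q[i] - qmin);
    }
    if (room <= 0.0 || std::fabs(excess) > room) return false;

    // Each node moves the same fraction of its own room, so none leaves the bounds.
    const double frac = excess / room;
    for (std::size_t i = 0; i < q.size(); ++i) {
        q[i] += frac * (excess > 0.0 ? qmax - q[i] : q[i] - qmin);
    }
    return true;
}
```

For example, applying clip_and_redistribute to the values {-0.2, 0.5, 1.3, 0.4} with unit weights and bounds [0, 1] removes the undershoot and overshoot while keeping the weighted mass at 2.0.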