#O05 Fast CVT grid generation for ocean modeling

Poster Title	Fast CVT grid generation for ocean modeling: a Lloyd-preconditioned L-BFGS method in parallel
Authors	Huanhuan Yang, Max Gunzburger (Unlicensed), Lili Ju (Unlicensed), Zhu Wang (Unlicensed)
Group	Ocean/Ice
Experiment
Poster Category	Future Direction
Submission Type	poster
Poster Link	https://www.dropbox.com/s/gittc6n9pup8mvf/ACME_scvt_v2.pdf?dl=0

Abstract

Centroidal Voronoi tessellation (CVT) has been a desired technique for creating high-quality Voronoi meshes and their dual Delaunay triangulations of given domains or manifolds, especially in ocean modeling applications, thanks to its ability of generating an evenly-spaced distribution of grid points. In recent years, the creation of such meshes in high-resolution, featuring variable resolution with smooth transition, is more demanding on efficiency. However, CVT algorithms in the literature have limitations either in the iterative solvers or in memory distribution during parallelization.

The Limited-memory BFGS (L-BFGS) iteration is a natural quasi-Newton method for speeding up CVT algorithms based on the classical Lloyd's iteration. To speed up further one can use a recently proposed graph Laplacian preconditioner, which can reduce the computational time to 56.9\% on average for certain non-uniform meshes. However, the graph Laplacian system is troublesome for ocean grids not only because of its limited efficiency in high resolution mesh but also its singularity on sphere. On the other hand, quasi-Newton methods as deterministic CVT algorithms call for the determination of multiple Delaunay triangulations, but available Delaunay softwares are mostly serial ones preventing efficient parallelization of CVT mesh generation.

In this work, we propose an efficient Lloyd-preconditioned L-BFGS method for CVT computation on sphere for ocean modeling, while the method is also applicable for general domain. Specifically, the Lloyd's step is iteratively updated and taken as the initial Hessian approximations. For quasi-uniform meshes, this preconditioned L-BFGS scheme performs similarly to the L-BFGS. For non-uniform meshes in our tests, however, the computation can be reduced to 6.1--34.2\% in terms of execution time and number of iterations accordingly. Different from previous works using L-BFGS in serial, we also employ a parallel framework for the Lloyd-preconditioned LBFGS using overlapping domain decomposition. The algorithm has partition deployed by an ultra-coarse CVT and it starts from parallelizing Delaunay triangulation. The parallelization algorithm features well-balanced loading of grid points and performs well in the strong scaling efficiency check, which shows almost linear efficiency when the number of grid points is in million scale.