Title | : | NP-hard Problems Meet Parallelization |
Speaker | : | Rajesh Pandian M (IITM) |
Details | : | Fri, 5 Jul, 2024 11:30 AM @ https://meet.google. |
Abstract: | : | After the advent of multiple processors, multiple cores, accelerators (such as GPU and FPGA), and programming platforms (such as CUDA, OpenMP and OpenACC), parallelization has gained momentum over the past few decades. Artificial intelligence and machine learning-based algorithms run faster largely because of the parallel computing frameworks and computing power of GPUs (graphics processing units). GPU Computing is the unsung hero behind the recent success of AI/ML. Mostly, GPU-based parallelization is used for high-performance computing in data centres and supercomputers. Many practical problems are NP-hard and are unlikely to have polynomial time algorithms. There is plenty of literature on polynomial-time approximation algorithms for such problems. Even some asymptotically efficient polynomial-time algorithms involve a lot of complex data structures, posing severe challenges during their efficient implementation. Therefore, there is a need for efficient sequential and parallel implementations. We have built scalable, efficient, parallel and benchmark implementations on multi-core CPU and GPU for two approximation algorithms of NP-hard problems: the Steiner Tree Problem (STP) and the Capacitated Vehicle Routing Problem (CVRP).
Our participation in the DIMACS and PACE challenges made us aware of the need for good algorithms that run well in practice, and parallelization would greatly benefit. We observe that NP-hard problems can offer unique challenges. In this thesis, we propose various mechanisms for taming these challenges in parallelization that are, in general, applicable to other approximation algorithms or graph problems. Current literature on the parallelization of algorithms for NP-hard problems lacks faster, effective, open-source, reproducible and scalable implementations (on large inputs) with solutions closer to the known optimum. In this regard, we have parallelized a two-approximation algorithm for Steiner Tree on GPU in a novel way. Similarly, we have designed and implemented a parallelization-friendly local-search algorithm for CVRP using randomization of different depth-first traversals of a minimum spanning tree. We have developed novel algorithmic optimizations (running multiple single-source-shortest-paths (SSSP) computations, SSSP halt) and parallelization-specific optimizations. Our implementations also encompass some of the existing optimizations followed in the literature, and we have made the code open-source to encourage reproducibility. On the largest benchmark instances, our code is 20x faster (in terms of running time) over the sequential-CPU Open Graph algorithms and Data structure (OGDF) KMB's two-approximation implementation (for Steiner Tree Problem) and 36x faster than the state-of-the-art GPU codes (for Capacitated Vehicle Routing Problem) producing better solutions than their baselines. Our parallelization techniques developed from this thesis can be, in general, applicable to other hard graph problems and their algorithms as well. Keywords: Parallelization; Approximation Algorithms; Graph Algorithms; Steiner Tree; Vehicle Routing; GPUs; CUDA; OpenMP; |