Title | : | Heterogeneity-aware Vertex-cut Based Distributed Graph Processing |
Speaker | : | Dinesh Kumar (IITM) |
Details | : | Tue, 16 Aug, 2016 2:00 PM @ BSB 361 |
Abstract: | : | With continuously growing data, clusters also need to grow periodically to accommodate the increased demand of data processing. This is usually done by addition of newer hardware, whose configuration might differ from the existing nodes. As a result, clusters are becoming heterogeneous in nature. For many real world machine learning and data mining applications, data is usually represented in the form of graphs. Most of the existing distributed graph processing frameworks such as Pregel and Graphlab assume that the computational nodes are homogeneous. These frameworks split the graph into approximately equal subgraphs, which is appropriate for homogeneous clusters. In heterogeneous clusters, these frameworks perform poorly in most of the scenarios. For example, during the execution of a big job, use of swap space by a node with low memory can drastically increase its execution time since memory is a severe bottleneck in in-memory systems.
To address these issues, GraphIVE (Graph Processing In Varied Environments) has been proposed. It is a capability-aware graph partitioning policy for Graphlab applications. It partitions the graph based on capabilities of nodes. The capabilities of nodes are determined based on the nodes performance in previous jobs and denoted by a weight vector. It continuously tries to reach optimum performance by optimizing the weight vector via hill climbing based algorithm. Moreover, GraphIVE reduces the communication overhead by reducing the replication factor of vertices. Experimental results show that GraphIVE significantly improves the execution time of jobs. In case of PageRank algorithm, a speedup of upto 20x is observed. However, it may not perform well if a new job differs drastically in terms of resource requirements when compared to previous jobs executed on the cluster. To overcome this limitation, a dynamic graph re-partitioning policy, GraphSteal has been proposed. GraphSteal dynamically re-partitions the graph based on the runtime characteristics of the job. To avoid computational skew in the cluster, it migrates edges from slower nodes to faster nodes. To demonstrate our approach, the source code of Graphlab has been modified to incorporate dynamic graph re-partitioning strategy. Experimental results show that GraphSteal significantly improves the performance over Graphlab. In case of PageRank and connected components algorithm, the execution time is reduced by upto 40%. We also compare and contrast GraphIVE and GraphSteal in different scenarios. |