Title: Resource-Aware Scheduling in Data-Intensive Distributed Processing Systems
Speaker: Padala Srikant (IITM)
Details: Tue, 30 May 2017, 2:00 PM @ BSB 361
Abstract: The emergence of data-intensive distributed processing systems such as Hadoop has enabled large-scale data processing with simple programming models. These systems offer linear scalability by processing partitioned data in parallel. This work studies the impact of job and hardware resource characteristics on scheduling performance in such systems. An experimental study on GraphLab and Hadoop shows that page swapping and disk latency can significantly affect job performance, yet existing solutions do not adequately account for these factors. Based on these results, this work proposes a resource-aware multi-job scheduler for GraphLab, and a disk-latency-aware balancer and block placement strategy for Hadoop, to improve job performance. Experiments show that non-preemptive time sharing yields better turnaround times and cumulative job completion time than spatial resource sharing in GraphLab. Furthermore, any GraphLab job (algorithm + dataset) has an optimal resource requirement, and both under-provisioning and over-provisioning are detrimental to the job's performance. This work proposes Octopus, a RAM-aware multi-job scheduler for GraphLab that uses non-preemptive time sharing of jobs to reduce page-swap frequency. For an average user, predicting the resources a job requires is not intuitive, and placing the onus of prediction on the user is undesirable. To lessen this burden, this work adds a prediction technique to Octopus that estimates the optimal resources for an incoming job using regression and hill-climbing techniques.
Further, this work shows that differences in disk read/write latency introduce heterogeneity into a Hadoop cluster. The default balancer distributes blocks evenly across nodes regardless of disk latency, leading to suboptimal job performance. To improve performance in such scenarios, this work proposes Tula, a disk-latency-aware balancer and block placement strategy for Hadoop, which reallocates blocks across the cluster considering both disk read latencies and disk space utilization. Disk latencies are obtained dynamically through the object-oriented message filter framework of the BOSS MOOL (Bharat Operating System Solutions with Minimalistic Object Oriented Linux kernel) operating system. Experimental evaluation in a cluster environment shows that Octopus and Tula provide significant performance improvements over existing solutions, and demonstrates the effectiveness of the BOSS MOOL message filter framework in a Hadoop cluster environment.
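The hill-climbing prediction idea mentioned in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and not from the talk: the cost model `estimated_runtime` is a synthetic stand-in for the regression estimate, and the function names, step size, and starting allocation are illustrative assumptions. The sketch only shows the general technique of locally searching candidate RAM allocations for one that balances under- and over-provisioning costs.

```python
# Illustrative sketch only: hill-climbing search for a job's "optimal"
# RAM allocation. The cost model is synthetic; in a real system the
# estimate would come from a regression model fitted to profiled jobs.

def estimated_runtime(ram_gb: float) -> float:
    """Hypothetical cost model: too little RAM triggers page swapping
    (runtime blows up as ~1/ram), while too much RAM is modeled as a
    mild linear penalty for hoarding memory other jobs could use."""
    swap_penalty = 100.0 / ram_gb   # under-provisioning cost
    overhead = 2.0 * ram_gb         # over-provisioning cost
    return swap_penalty + overhead

def hill_climb(start_gb: float, step_gb: float = 1.0,
               min_gb: float = 1.0) -> float:
    """Repeatedly move to the neighbouring allocation (current ± step)
    with the lower estimated runtime; stop at a local optimum."""
    current = start_gb
    while True:
        neighbours = [max(min_gb, current - step_gb), current + step_gb]
        best = min(neighbours, key=estimated_runtime)
        if estimated_runtime(best) >= estimated_runtime(current):
            return current
        current = best

best_ram = hill_climb(start_gb=2.0)  # converges to 7 GB under this model
```

Under this synthetic model the search climbs from 2 GB to 7 GB, the integer allocation closest to the model's optimum; with a real regression-based estimate the same loop would track whatever cost surface the profiler learns.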