Title | : | Contention Aware Scheduling of Processes and Containers in a Multicore and Cloud Environment |
Speaker | : | Akshay Dhumal (IITM) |
Details | : | Fri, 12 Jul, 2019 11:00 AM @ AM Turing Hall |
Abstract | : | The emergence of complex processor architectures has challenged the assumptions of scheduling algorithms. The Completely Fair Scheduler (CFS) has been the default process scheduler in the Linux kernel since version 2.6.23. A closer look at CFS reveals that its scheduling objectives conflict with resource contention policies. This work studies the scheduling and load balancing of Linux processes and Docker containers in multicore and cloud environments, respectively. Through an experimental study, we observe that shared resource contention is the primary cause of job performance degradation. The contention is often caused by poor scheduling decisions, as the algorithms rely heavily on a few parameters to compute the schedule. With ever-changing system architectures and increased complexity, multiple factors and metrics should be considered before making a placement decision.
GAS (GPU Assisted Scheduling for Multicore System) has been proposed for scheduling processes in a multicore environment. However, several issues, such as the selection of appropriate hardware performance counters, the choice of optimization algorithm, and the limitations of the scheduling algorithm, have not been previously studied. We identify that at higher system utilization, balancing based on run-queue length alone is insufficient; multiple parameters, such as retired instructions, cache misses, cache references, LLC prefetches, and LLC prefetch misses, should be considered based on workload behaviour. We analyzed various optimization algorithms and found that the genetic algorithm best suits our use case. The work also profiles processes using appropriate hardware counters chosen according to workload behaviour. Experimentation with GAS on the BOSS MOOL operating system shows that it improves job performance through increased resource utilization and reduced shared resource contention. With GAS, we observed an average performance improvement of 24% when Spark jobs are co-scheduled with STREAM benchmark programs on a multicore system.

We extend the idea of GAS to a distributed cloud environment where the unit of scheduling is the container. We propose C-Balancer, a scheduling framework for the efficient placement of containers in the cloud. The framework periodically profiles the containers using multiple parameters and decides the optimal container-to-node placement. We describe the process of container migration in detail and propose two novel approaches for synchronizing the file systems of containers. Any error in file system synchronization would sabotage container restoration on the target node, so this step is critical for container migration. The experiments are conducted on an OpenStack cloud platform with 14 nodes, where each node is a virtual machine running the BOSS MOOL operating system.
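As a rough illustration of the genetic-algorithm placement idea (this is a minimal sketch, not the actual GAS implementation), the code below evolves job-to-core assignments that minimize a simple contention score. The job profiles and the cache-miss-based cost are hypothetical stand-ins for the hardware counters named above:

```python
import random

# Hypothetical per-job profiles: (retired_instructions, cache_misses) in
# arbitrary units, standing in for real hardware performance counters.
JOBS = [(10, 8), (12, 2), (9, 7), (11, 1), (10, 5), (8, 4)]
NUM_CORES = 2

def contention_cost(assignment):
    """Sum of squared per-core cache-miss load: penalizes piling
    cache-hungry jobs onto the same core (a crude contention proxy)."""
    load = [0] * NUM_CORES
    for job, core in zip(JOBS, assignment):
        load[core] += job[1]
    return sum(l * l for l in load)

def evolve(pop_size=30, generations=50, mutation_rate=0.1, seed=42):
    rng = random.Random(seed)
    # Each individual maps job index -> core index.
    pop = [[rng.randrange(NUM_CORES) for _ in JOBS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=contention_cost)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(JOBS))     # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):           # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(NUM_CORES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=contention_cost)

best = evolve()
print("placement:", best, "cost:", contention_cost(best))
```

In a real scheduler the fitness function would combine several profiled counters rather than a single metric, and the search could be offloaded to a GPU, but the select/crossover/mutate loop is the same.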
Experimental evaluation in the cloud environment shows that our approach reduces the variance in resource utilization across the cluster by 60% on average over the default Swarm scheduler, for various workload mixes of Stress-NG and iPerf benchmark programs. The improved resource utilization translates to an improvement in job completion time of up to 58% for the workload mix. |
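To make the reported metric concrete, the snippet below computes the variance reduction in per-node utilization before and after rebalancing. The utilization figures are invented for illustration; only the formula reflects how such a cluster-balance metric is typically computed:

```python
import statistics

# Hypothetical per-node CPU utilization (%) across a 14-node cluster,
# before and after container rebalancing.
before = [95, 20, 80, 15, 90, 25, 85, 30, 70, 40, 60, 35, 75, 45]
after  = [55, 50, 60, 48, 58, 52, 57, 53, 56, 54, 55, 53, 56, 52]

var_before = statistics.pvariance(before)
var_after = statistics.pvariance(after)
reduction = 100 * (1 - var_after / var_before)
print(f"utilization variance reduced by {reduction:.0f}%")
```

A lower variance means the load is spread more evenly across nodes, which is what allows the co-located jobs to finish sooner.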