Title | : | Scalable Data Analytics: The Role of Stratified Data Sharding |
Speaker | : | Srinivasan Parthasarathy (Ohio State University) |
Details | : | Tue, 14 Aug, 2018 4:00 PM @ A M Turing Hall (CSB |
Abstract | : | With the increasing popularity of structured data stores, social networks and Web 2.0 and 3.0 applications, complex data formats, such as trees and graphs, are becoming ubiquitous. Managing and processing such large and complex data stores on modern computational ecosystems, so as to realize actionable information efficiently, is daunting. In this talk, I will begin by discussing some of these challenges. Subsequently, I will discuss a critical element at the heart of this challenge: the sharding, placement, storage and access of such tera- and peta-scale data. In this work, we develop a novel distributed framework to ease the burden on the programmer and propose an agile and intelligent placement service layer as a flexible yet unified means to address this challenge. Central to our framework is the notion of stratification, which seeks to initially group structurally (or semantically) similar entities into strata. Subsequently, strata are partitioned within this ecosystem according to the needs of the application to maximize locality, balance load, minimize data skew or even take into account energy consumption. Results on several real-world applications validate the efficacy and efficiency of our approach. |
Notes | : | Joint work with Y. Wang (Airbnb) and A. Chakrabarti (MSR). |
Bio | : | Srinivasan Parthasarathy is a Professor of Computer Science and Engineering and the director of the data mining research laboratory at Ohio State. His research interests span data analytics, databases and high-performance computing. He is among a handful of researchers nationwide to have won both the Department of Energy and National Science Foundation Career awards. He and his students have won multiple best paper awards or best-of nominations from leading forums in the field, including SIAM Data Mining, ACM SIGKDD, VLDB, ISMB, WWW, ICDM, and ACM Bioinformatics. He chairs the SIAM data mining conference steering committee (elected) and serves on the action board of ACM TKDD and ACM DMKD, leading journals in the field. Since 2012 he has also helped lead the creation of OSU's first-of-its-kind nationwide (US) undergraduate major in data analytics and serves as one of its founding directors. This is the 5th talk of the TCS-IIT Madras Computer Science and Engineering Colloquium Series. |