Title | : | Usability and Developability Challenges in Advanced Analytics |
Speaker | : | Arun Kumar (University of Wisconsin-Madison, USA) |
Details | : | Mon, 1 Sep, 2014 3:00 PM @ CS25 |
Abstract | : | Advanced analytics is a booming area in the data management
industry. As enterprise applications of statistical and machine
learning techniques grow, many companies are racing to build analytics
systems that combine such techniques with data management. From
speaking to practitioners in diverse settings, we learned that two key
challenges to widespread adoption of advanced analytics are the ease
of use and ease of development of analytics systems. In this talk, I
will present an overview of my research that aims to tackle some
usability and developability challenges in advanced analytics, while
also ensuring scalability and high performance. I will talk about
three systems that I have built as part of my research: (1) Staccato,
which aims to make it easier for database administrators to query
probabilistic OCR data, (2) Bismarck, which provides a unified
abstraction that aims to make it easier for software engineers to
integrate machine learning into an RDBMS, and (3) Columbus, which
provides a declarative language that aims to make the task of feature
selection easier for data analysts. Finally, I will talk briefly about
some recent results and open research problems in the larger space of
feature engineering, which is one of the most challenging bottlenecks
in applying machine learning to real-world data. |
Bio | : | Arun Kumar is a PhD candidate in Computer Sciences at the University of Wisconsin-Madison. He received his B.Tech. in CSE from IIT Madras in 2009. His primary research interests are in data management, especially its intersection with machine learning. He is co-advised by Professors Jeffrey Naughton and Jignesh Patel. Previously, he worked with Professor Christopher Re. Systems and ideas from his research have been released as part of the MADlib open-source library and have been shipped with products from Oracle, IBM, EMC, and Cloudera. A paper on feature selection co-authored by him received the Best Paper Award at SIGMOD 2014. |
Host | : | Prof. Krishna Sivalingam, CSE, skrishnam@iitm.ac.in |