Title | : | Popularity probing and a model for massive datasets |
Speaker | : | Aiyer Anand Ravi (IITM) |
Details | : | Tue, 30 Jun, 2015 3:00 PM @ BSB 361 |
Abstract: | : | We study several problems that we collectively call 'popularity probing problems'. Our goal is to design algorithmic techniques for finding a highly ranked element (say, a
celebrity) in a dynamically evolving ordered set (of, say, popularity indices of people). These problems appear quite naturally in several data analytics problems where we are
interested in computing extent measures over dynamically evolving data.
We are given a set P = {p_1 , p_2 , . . . , p_n } of n points in Z that can change dynamically over discrete time steps. (For simplicity in notation, we simply use p_i to denote the position of the i th point at the “current” time step.) At each discrete time step, we typically allow one point, say p_i to make a move either to p_i + 1 or p_i − 1. We consider the case where this dynamism could either be adversarially controlled or stochastically controlled. Furthermore, the current position of the points is not readily available. They can only be accessed via probes typically of the form 'what is the current position of p_i ?' Our goal is to design algorithms that can return a highly ranked point among the dynamic set P using very few probes per time step. We present efficient algorithms for both forms of dynamism. Our algorithms run ad infinitum and outputs an index i_t at each time t. At any time t, the point p_i_t is guaranteed with constant probability to adhere to a suitably defined notion of a highly ranked point. Our first main result deals with the case where the dynamism is controlled by an oblivious adversary that is aware of our algorithm, but unaware of the random bits used by the algorithm. Under such an adversary, we present the Candidate Refresh Algorithm (CRA) that is guaranteed to output a point that is at least p ̄ − O(n^1/3 ), where p ̄ is the current position (in Z) of the point ranked n^1/3 in P. Our second algorithm, called the Single Candidate Algorithm (SCA), is designed for the case where at each point in time, a point p_i is chosen uniformly at random and it moves to a position that reports an element among the top Θ( log n / log log n ) elements with high probability in steady state for the random walk paradigm. The algorithm has been implemented for 1-D datasets. We believe our methods can be generalized for data in higher dimensions. |