The International Conference for High Performance Computing, Networking, Storage and Analysis
Highly Scalable Linear Time Estimation of Spectrograms - A Tool for Very Large Scale Data Analysis.
Authors: Onkar Bhardwaj (Rensselaer Polytechnic Institute), Yves Ineichen (IBM Research - Zurich), Costas Bekas (IBM Research - Zurich), Alessandro Curioni (IBM Research - Zurich)
Abstract: In many situations, data analysis transforms into an eigenvalue problem. However, in the era of big data, dataset sizes render this problem practically intractable. The cubic complexity of dense methods, and the inability of iterative techniques to look deep into the interior of the spectrum at an acceptable cost, call for a new approach.
We present a method with close to linear cost for estimating the spectrogram of a matrix, that is, the density of eigenvalues within a given interval of the spectrum. The spectrogram provides a compact graphical illustration of data matrices that can foster easier interpretation. The estimate is obtained by approximating the cumulative distribution function (CDF) of the eigenvalues and subsequently estimating the resulting trace with the help of a stochastic diagonal estimator.
We have designed a highly scalable implementation of our method that takes advantage of nested levels of parallelism, which ultimately allow us to scale to massively parallel machines.
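
The estimation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a symmetric matrix whose spectrum has already been scaled into [-1, 1], approximates the eigenvalue CDF at a point t as the trace of a step function of the matrix, expands that step function in Chebyshev polynomials with Jackson damping (an assumed choice of approximation), and replaces the stochastic diagonal estimator with a Hutchinson-type trace estimate using random +/-1 probe vectors. The function names chebyshev_moments, step_coeffs and estimate_spectrogram are hypothetical.

```python
import numpy as np


def chebyshev_moments(A, degree, n_probes=30, seed=0):
    """Estimate mu_k = trace(T_k(A)) for k = 0..degree with random probes.

    Assumes A is symmetric with eigenvalues in [-1, 1] and degree >= 1.
    Only matrix-vector products with A are needed.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Rademacher probes: entries are +1 or -1 with equal probability.
    V = rng.choice([-1.0, 1.0], size=(n, n_probes))
    mu = np.empty(degree + 1)
    W_prev, W_curr = V, A @ V  # T_0(A) V and T_1(A) V
    mu[0] = np.sum(V * W_prev) / n_probes
    mu[1] = np.sum(V * W_curr) / n_probes
    for k in range(2, degree + 1):
        # Three-term recurrence: T_k(A) V = 2 A T_{k-1}(A) V - T_{k-2}(A) V
        W_prev, W_curr = W_curr, 2.0 * (A @ W_curr) - W_prev
        mu[k] = np.sum(V * W_curr) / n_probes
    return mu


def step_coeffs(t, degree):
    """Damped Chebyshev coefficients of h(x) = 1 if x <= t else 0 on [-1, 1]."""
    theta = np.arccos(t)
    k = np.arange(1, degree + 1)
    c = np.empty(degree + 1)
    c[0] = 1.0 - theta / np.pi
    c[1:] = -2.0 * np.sin(k * theta) / (k * np.pi)
    # Jackson damping factors reduce Gibbs oscillations near the jump at t.
    m = degree + 1
    j = np.arange(m)
    a = np.pi / (m + 1)
    damping = ((m - j + 1) * np.cos(a * j) + np.sin(a * j) / np.tan(a)) / (m + 1)
    return c * damping


def estimate_spectrogram(A, edges, degree=80, n_probes=30, seed=0):
    """Estimate the number of eigenvalues of A in each bin defined by edges."""
    mu = chebyshev_moments(A, degree, n_probes, seed)
    # Estimated eigenvalue count below each bin edge (an unnormalized CDF).
    cdf = np.array([step_coeffs(t, degree) @ mu for t in edges])
    return np.diff(cdf)


# Usage on a small synthetic matrix with a known spectrum.
demo_rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(demo_rng.standard_normal((200, 200)))
A_demo = (Q * np.linspace(-0.9, 0.9, 200)) @ Q.T
print(estimate_spectrogram(A_demo, np.linspace(-1.0, 1.0, 5)))
```

The moments are computed once and reused for every bin edge, so the cost is dominated by degree times n_probes matrix-vector products. In this sketch the probe vectors are independent of one another, as are the bin edges, which is one plausible source of the nested parallelism mentioned above; how the authors actually distribute the work is not stated in the abstract.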