The International Conference for High Performance Computing, Networking, Storage and Analysis
Improving the I/O Throughput for Data-Intensive Scientific Applications with Efficient Compression Mechanisms.
Authors: Dongfang Zhao (Illinois Institute of Technology), Jian Yin (Pacific Northwest National Laboratory), Ioan Raicu (Illinois Institute of Technology)
Abstract: Today's science is generating significantly larger volumes of data than before, making many scientific applications bound by I/O rather than computation. To reduce I/O time, data compression is becoming more attractive and practical. Most existing compression techniques are general purpose and not specifically crafted for scientific applications. In this context, we devise a new compression algorithm by carefully exploiting a characteristic of scientific data, namely the incremental changes to the data. Our proposed algorithm segments the data with different starting points and then stores only the increments, achieving both computational and space efficiency. We design and implement a real system with the proposed algorithm at the filesystem level, so that compression and decompression are transparent to end users and require no modification to the applications. The system is evaluated on a 1024-core IBM BlueGene/P supercomputer with a 128-node GPFS file system.
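
The idea of segmenting data and storing only increments can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the fixed segment length, the choice of each segment's first value as its starting point, and the names `compress` and `decompress` are all assumptions made for illustration.

```python
from typing import List, Tuple

# A segment is (starting point, increments relative to that starting point).
# This representation is an assumption; the abstract does not specify one.
Segment = Tuple[int, List[int]]


def compress(values: List[int], segment_len: int) -> List[Segment]:
    """Split values into fixed-length segments and keep only the increments."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    segments: List[Segment] = []
    for i in range(0, len(values), segment_len):
        chunk = values[i:i + segment_len]
        start = chunk[0]  # hypothetical choice: first value of the segment
        deltas = [v - start for v in chunk[1:]]
        segments.append((start, deltas))
    return segments


def decompress(segments: List[Segment]) -> List[int]:
    """Rebuild the original values from starting points and increments."""
    values: List[int] = []
    for start, deltas in segments:
        values.append(start)
        values.extend(start + d for d in deltas)
    return values


data = [1000, 1001, 1003, 1002, 5000, 5002, 5001]
packed = compress(data, segment_len=4)
restored = decompress(packed)
```

When neighbouring values change slowly, the increments are small numbers that can be encoded in fewer bits than the raw values, and starting a new segment keeps the increments small after a large jump in the data.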