The International Conference for High Performance Computing, Networking, Storage and Analysis
Scibox: Online Sharing of Scientific Data via the Cloud.
Authors: Jian Huang (Georgia Institute of Technology), Xuechen Zhang (Georgia Institute of Technology), Greg Eisenhauer (Georgia Institute of Technology), Karsten Schwan (Georgia Institute of Technology), Matthew Wolf (Georgia Institute of Technology), Stephane Ethier (Princeton Plasma Physics Laboratory), Scott Klasky (Oak Ridge National Laboratory)
Abstract: Collaborative science demands global sharing of scientific data, but it cannot rely on universally accessible cloud-based infrastructures like Dropbox, which offer limited interfaces and inadequate access bandwidth. We present Scibox, a cloud facility for online sharing of scientific data. It uses standard cloud storage solutions but offers a usage model in which high-end codes access the cloud via the same ADIOS APIs they already use for their I/O actions, thereby naturally coupling data generation with subsequent data analytics. Scibox extends current ADIOS I/O methods: data upload/download volumes are controlled via D(ata)R(eduction)-functions stated by end users and applied at the data source, before data is moved. Further gains in efficiency are obtained by combining DR-functions to move exactly what is needed by current data consumers. Scibox is evaluated with science applications, e.g., GTS, demonstrating the potential for ubiquitous data access with substantial reductions in network traffic.
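The idea of combining DR-functions at the data source can be illustrated with a minimal Python sketch. This is not the Scibox implementation and does not use the ADIOS API. The request format (a field name mapped to a half-open index range), the function names `combine`, `reduce_at_source`, and `element_count`, and the field names `density` and `potential` are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: hypothetical request format, not Scibox or ADIOS code.

def combine(requests):
    """Merge consumer requests into one: per field, the union of requested indices.

    Each request maps a field name to a half-open (start, stop) index range.
    Ranges are assumed to lie within the bounds of the source arrays.
    """
    needed = {}
    for request in requests:
        for field, (start, stop) in request.items():
            needed.setdefault(field, set()).update(range(start, stop))
    return {field: sorted(indices) for field, indices in needed.items()}


def reduce_at_source(data, combined):
    """Apply the combined request before any data is moved.

    Returns, per field, only the requested elements keyed by original index.
    """
    return {
        field: {i: data[field][i] for i in indices}
        for field, indices in combined.items()
    }


def element_count(reduced):
    """Number of elements that would be uploaded."""
    return sum(len(values) for values in reduced.values())


# Hypothetical simulation output: three fields of 100 elements each.
data = {
    "density": list(range(100)),
    "potential": list(range(100, 200)),
    "flux": list(range(200, 300)),
}

# Two hypothetical consumers with overlapping needs.
consumer_a = {"density": (0, 10)}
consumer_b = {"density": (5, 20), "potential": (0, 5)}

combined = combine([consumer_a, consumer_b])
reduced = reduce_at_source(data, combined)
print(element_count(reduced), "of", sum(len(v) for v in data.values()), "elements moved")
```

In this toy setup the overlapping `density` ranges are uploaded once rather than once per consumer, and the unrequested `flux` field never leaves the source.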