SC13 Presentation

SCHEDULE: NOV 16-22, 2013

SIDR: Structure-Aware Intelligent Data Routing in Hadoop

SESSION: Optimizing Data Movement

EVENT TYPE: Papers

TIME: 10:30AM - 11:00AM

SESSION CHAIR: Dhabaleswar K. (DK) Panda

AUTHOR(S): Joe Buck, Noah Watkins, Greg Levin, Adam Crume, Kleoni Ioannidou, Scott Brandt, Carlos Maltzahn, Neoklis Polyzotis, Aaron Torres

ROOM: 205/207

ABSTRACT:
The MapReduce framework is being extended to domains quite different from the web applications for which it was designed, including the processing of big structured data, e.g., scientific and financial data. Previous work using MapReduce to process scientific data ignores existing structure when assigning intermediate data and scheduling tasks. In this paper, we present a method for incorporating knowledge of the structure of scientific data and of the executing query into the MapReduce communication model. Built in SciHadoop, a version of the Hadoop MapReduce framework for scientific data, SIDR intelligently partitions and routes intermediate data, allowing it to: remove Hadoop's global barrier and execute Reduce tasks before all Map tasks have completed; minimize intermediate key skew; and produce early, correct results. SIDR executes queries up to 2.5 times faster than Hadoop and 37% faster than SciHadoop; produces initial results with only 6% of the query completed; and produces dense, contiguous output.
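
The idea of routing intermediate data by structure so that a Reduce task can start early can be illustrated with a small sketch. This is not SIDR's implementation. It is a toy model under stated assumptions: keys are linearized array indices, each Map task emits keys equal to the indices of its input range, and the names `range_partition`, `reducer_dependencies`, and `ready_reducers` are hypothetical. Under those assumptions, a contiguous partitioning lets each reducer know in advance which Map tasks it depends on.

```python
from typing import Dict, List, Set, Tuple


def range_partition(key: int, total_keys: int, num_reducers: int) -> int:
    """Assign a linearized array index to a reducer by contiguous block.

    Hypothetical stand-in for a structure-aware partitioner: neighbouring
    keys land on the same reducer, unlike a hash partitioner.
    """
    block = -(-total_keys // num_reducers)  # ceiling division
    return key // block


def reducer_dependencies(
    map_inputs: List[Tuple[int, int]], total_keys: int, num_reducers: int
) -> Dict[int, Set[int]]:
    """For each reducer, collect the Map tasks that can send it data.

    map_inputs[i] is the half-open key range [start, stop) read by Map
    task i. Assumption: a Map task emits only keys from its own range.
    """
    deps: Dict[int, Set[int]] = {r: set() for r in range(num_reducers)}
    for map_id, (start, stop) in enumerate(map_inputs):
        for key in range(start, stop):
            deps[range_partition(key, total_keys, num_reducers)].add(map_id)
    return deps


def ready_reducers(deps: Dict[int, Set[int]], finished_maps: Set[int]) -> List[int]:
    """Reducers whose contributing Map tasks have all finished."""
    return sorted(r for r, needed in deps.items() if needed <= finished_maps)


# Four Map tasks over 12 keys, two reducers.
deps = reducer_dependencies([(0, 3), (3, 6), (6, 9), (9, 12)], 12, 2)
early = ready_reducers(deps, {0, 1})
```

In this toy setup, reducer 0 depends only on Map tasks 0 and 1, so `early` is `[0]` once those two tasks finish, even though Map tasks 2 and 3 are still running. With a hash partitioner, every reducer would typically depend on every Map task, which is what forces the global barrier.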

Chair/Author Details:

Dhabaleswar K. (DK) Panda (Chair) - Ohio State University

Joe Buck - University of California, Santa Cruz

Noah Watkins - University of California, Santa Cruz

Greg Levin - University of California, Santa Cruz

Adam Crume - University of California, Santa Cruz

Kleoni Ioannidou - University of California, Santa Cruz

Scott Brandt - University of California, Santa Cruz

Carlos Maltzahn - University of California, Santa Cruz

Neoklis Polyzotis - University of California, Santa Cruz

Aaron Torres - Los Alamos National Laboratory

The full paper can be found in the ACM Digital Library