SCHEDULE: NOV 16-22, 2013
Enabling Efficient Intra-Warp Communication for Fourier Transforms in a Many-Core Architecture
SESSION: ACM Student Research Competition Poster Reception
EVENT TYPE: ACM Student Research Competition Posters, ACM Student Research Competition
TIME: 5:15PM - 7:00PM
AUTHOR(S): Carlo del Mundo
ROOM: Mile High Pre-Function
Shuffle, a new mechanism in NVIDIA GPUs that allows direct register-to-register data exchange within a warp, aims to reduce the shared memory footprint of data communication. Despite vendor claims about its efficacy, the mechanism is poorly understood, and few works demonstrate a performance improvement. We therefore seek to characterize the behavior of shuffle and provide insight into optimizing applications that rely on intra-warp communication. We evaluated the shuffle mechanism in the context of matrix transpose, which forms part of the communication stage in a 1D FFT code. Our study indicates that refactoring algorithms to fit the shuffle paradigm requires careful co-design between software and hardware. In particular, algorithmic decisions should avoid allocating and using CUDA local memory at all costs. Overall, our optimized shuffle version accelerates matrix transpose by up to 44%, yielding an overall application speedup of 1.17-fold for a 256-point FFT.
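To make the register-to-register exchange concrete, the following is a minimal sketch of a warp-level transpose that uses shuffle instead of shared memory. It is not the poster's implementation: the 4 x 8 tile shape, the names warpTranspose and transposeSourceLane, and the one-element-per-lane layout are assumptions chosen for illustration. It uses __shfl_sync, the form required by current CUDA toolkits; hardware and toolkits contemporary with this work exposed the same operation as __shfl. Holding a single scalar per thread also sidesteps dynamically indexed per-thread arrays, which are a common way for data to end up in CUDA local memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative tile shape (an assumption): ROWS * COLS == 32, one element per lane.
constexpr int ROWS = 4;
constexpr int COLS = 8;

// Hypothetical helper: in the transposed COLS x ROWS row-major layout,
// lane j holds element (r, c) = (j % ROWS, j / ROWS) of the original tile,
// which lives in lane r * COLS + c of the original layout.
__host__ __device__ inline int transposeSourceLane(int lane) {
    return (lane % ROWS) * COLS + (lane / ROWS);
}

__global__ void warpTranspose(const float* in, float* out) {
    int lane = threadIdx.x;           // kernel is launched with exactly one warp
    float v = in[lane];               // lane r * COLS + c holds element (r, c)
    // Each lane reads the register of the lane that owns its transposed element.
    float t = __shfl_sync(0xffffffffu, v, transposeSourceLane(lane));
    out[lane] = t;
}

int main() {
    float h_in[32], h_out[32];
    for (int i = 0; i < 32; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(h_out));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warpTranspose<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Host-side check: out(c, r) must equal in(r, c).
    int mismatches = 0;
    for (int r = 0; r < ROWS; ++r)
        for (int c = 0; c < COLS; ++c)
            if (h_out[c * ROWS + r] != h_in[r * COLS + c]) ++mismatches;
    std::printf("mismatches: %d\n", mismatches);

    cudaFree(d_in);
    cudaFree(d_out);
    return mismatches;
}
```

The sketch compiles with nvcc for devices of compute capability 3.0 or higher. An FFT transpose stage typically keeps several elements per thread, and that is where the choice of register indexing scheme begins to matter for avoiding local memory.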
Carlo del Mundo - Virginia Tech