Understanding CUDA Performance

Michael Kenzel

German Research Center for Artificial Intelligence

Saturday morning

In this tutorial, we will look at how the GPU compute-mode programming model maps to the underlying hardware, using CUDA on NVIDIA GPU architectures as an example. After a quick introduction and a recap of some CUDA basics, we will dive into the low-level details of how code actually executes on GPUs, with particular focus on branching and the memory hierarchy. Finally, we will apply this new-found understanding to take an initially naïve implementation of a parallel reduction, step by step, all the way to a highly optimized one through multiple iterations of performance analysis and optimization.

What to prepare: bring a CUDA-capable device, install the CUDA Toolkit, and work through a basic CUDA tutorial or video guide beforehand.

Michael is a researcher at the German Research Center for Artificial Intelligence. His research focuses on GPU programming models, high-performance computing, and real-time graphics, with numerous publications at reputable venues including Eurographics, SIGGRAPH, and SIGGRAPH Asia. He has been involved in teaching courses on GPU programming and computer graphics for many years at two different universities.

 
