2026 Apr 27 :: Unique Worker Model for CUDA

Speaker: Yuvraj Talukdar

Talk

Topic: Unique Worker Model for CUDA
Description: GPU programming portability remains hindered by a subtle but pervasive issue: CUDA kernels are routinely written with implicit assumptions about the number of threads per block (NTPB), a hardware-dependent parameter that varies across GPU generations, vendors, and even runtime resource availability. We present the Unique Worker (UW) Model for CUDA, a compiler-driven framework that decouples program logic from hardware-specific thread counts and guarantees semantic portability, that is, correct execution on any GPU regardless of the NTPB the hardware can feasibly provide at runtime. Semantic portability is a property complementary to, and independent of, performance portability. Our approach is realized as an AST-to-AST transformation in LLVM 20 and is, to our knowledge, the first compiler framework to correctly handle __syncthreads nested inside arbitrary if/while constructs in the context of CUDA thread coarsening. The model extends prior UW work on OpenMP to CUDA and to NVIDIA, AMD, and Apple GPU backends, enabling cross-vendor porting; we demonstrate this by automatically translating programs validated on Apple Metal to run on NVIDIA CUDA. We evaluate the implementation (roughly 12,000 lines of C++/LLVM code, released as open source) on eleven benchmarks across two NVIDIA platforms (Tesla V100 and RTX 3050 Mobile). The transformed programs execute correctly across all tested NTPB configurations, successfully run four benchmarks that previously crashed due to resource constraints, and achieve a geometric mean speedup of 95.8%–96.8% over a naive unoptimized translation.
More details :: https://docs.google.com/document/d/1ObZZaZQiqY9Kz3bASkO9WTWyqLAKUH-D28k4Y2YtT-I/edit?tab=t.0
Time: 5 pm to 6 pm (IST)
Presenter: Yuvraj Talukdar
About the Presenter: https://www.linkedin.com/in/pranav-ramesh-6ab8a7318/
URL: https://meet.google.com/bje-dtya-dyn
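
Illustration (not from the talk): the description mentions thread coarsening in the presence of __syncthreads. The sketch below is a host-side C++ emulation of that general idea, written only to make the problem concrete; it is not the UW Model's transformation. The function name coarsened_block_sum and its parameters are hypothetical. It assumes a block-level tree reduction written for a "logical" NTPB equal to the input size (a power of two), executed by an arbitrary number of "physical" workers. The kernel body is split into regions at each barrier, and every worker loops over the logical thread IDs assigned to it within a region.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: emulates one thread block on the host.
// `in.size()` plays the role of the NTPB the kernel was written for
// (assumed to be a power of two); `physical` is the number of workers
// actually available (assumed >= 1). Each worker w handles logical
// thread IDs w, w + physical, w + 2 * physical, ...
int coarsened_block_sum(const std::vector<int>& in, std::size_t physical) {
    const std::size_t logical = in.size();
    if (logical == 0 || physical == 0) return 0;

    std::vector<int> shared(logical);  // stands in for __shared__ memory

    // Region 1: everything before the first barrier.
    for (std::size_t w = 0; w < physical; ++w)
        for (std::size_t t = w; t < logical; t += physical)
            shared[t] = in[t];
    // Barrier: all logical threads have finished region 1 here.

    // The barrier sits inside a loop, so the loop itself stays outside
    // the per-thread iteration and each loop body becomes one region.
    for (std::size_t stride = logical / 2; stride > 0; stride /= 2) {
        for (std::size_t w = 0; w < physical; ++w)
            for (std::size_t t = w; t < logical; t += physical)
                if (t < stride)
                    shared[t] += shared[t + stride];
        // Barrier: all logical threads have finished this stride.
    }
    return shared[0];
}
```

Calling coarsened_block_sum with the same input and different values of physical should give the same sum, which is the kind of NTPB independence the description calls semantic portability. The loop here is uniform across threads; barriers nested inside divergent if/while constructs, which the talk addresses, are harder and are not covered by this sketch.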