GPU Programming Workshop

Overview

In this four-day online workshop you will learn how to accelerate your applications with OpenACC, CUDA C++ and CUDA Python on NVIDIA GPUs.

The workshop combines lectures on the Fundamentals of Accelerated Computing with OpenACC, CUDA C++ and CUDA Python on a single GPU with a lecture on Accelerating CUDA C++ Applications with Multiple GPUs.

The lectures are interleaved with many hands-on sessions using Jupyter Notebooks. The exercises will be done on a fully configured GPU-accelerated workstation in the cloud.

The workshop is co-organised by Leibniz Supercomputing Centre (LRZ), Erlangen National High Performance Computing Center (NHR@FAU) and NVIDIA Deep Learning Institute (DLI). NVIDIA DLI offers hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning.

All instructors are NVIDIA certified University Ambassadors.

Agenda

1st day: Fundamentals of Accelerated Computing with OpenACC

Learn the basics of OpenACC, a high-level, directive-based programming model for GPU computing. This lecture is for anyone with some C/C++ or Fortran experience who is interested in accelerating the performance of their applications beyond the limits of CPU-only programming. In this lecture, you’ll learn:

  • How to profile and optimise your CPU-only applications to identify hot spots for acceleration
  • How to use OpenACC directives to GPU accelerate your codebase
  • How to optimise data movement between the CPU and GPU accelerator

Upon completion, you'll be ready to use OpenACC to GPU accelerate CPU-only applications.
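
To give a flavour of the directive-based approach taught on this day, here is a minimal, illustrative OpenACC sketch (an assumption on our part, not official course material): a SAXPY loop offloaded to the GPU with a single directive, with its data movement stated explicitly. The file name and compiler invocation are assumptions.

    /* saxpy.c -- illustrative sketch; compile e.g. with the NVIDIA HPC compiler:
       nvc -acc -Minfo=accel saxpy.c -o saxpy */
    #include <stdio.h>

    void saxpy(int n, float a, const float *x, float *y)
    {
        /* Offload the loop to the GPU; the copyin/copy clauses make the
           CPU-GPU data movement explicit. */
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        enum { N = 1 << 20 };
        static float x[N], y[N];
        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy(N, 3.0f, x, y);

        printf("y[0] = %f\n", y[0]);  /* expected: 5.000000 */
        return 0;
    }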

 

2nd day: Fundamentals of Accelerated Computing with Modern CUDA C++

This lecture provides a comprehensive introduction to general-purpose GPU programming with CUDA. You'll learn how to write, compile, and run GPU-accelerated code, leverage CUDA core libraries to harness the massive parallelism of modern GPU accelerators, optimise memory migration between the CPU and the GPU, and implement your own algorithms. At the end of the lecture, you'll have access to additional resources to create your own GPU-accelerated applications.
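
As a taste of what this day covers, the following is a minimal, illustrative CUDA C++ sketch (an assumption on our part, not the official course material): a grid-stride kernel launched on Unified Memory, which lets the CUDA runtime migrate data between CPU and GPU on demand.

    // add.cu -- illustrative sketch; compile with: nvcc add.cu -o add
    #include <cstdio>

    // Element-wise addition using a grid-stride loop, so any grid size
    // covers the whole array.
    __global__ void add(int n, const float *x, float *y)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            y[i] = x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        float *x, *y;

        // Unified Memory is accessible from host and device; the runtime
        // migrates pages between CPU and GPU as needed.
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        add<<<256, 256>>>(n, x, y);   // 256 blocks of 256 threads
        cudaDeviceSynchronize();      // wait for the kernel to finish

        printf("y[0] = %f\n", y[0]);  // expected: 3.000000
        cudaFree(x);
        cudaFree(y);
        return 0;
    }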

 

3rd day: Fundamentals of Accelerated Computing with CUDA Python

This lecture explores how to use Numba — the just-in-time, type-specialising Python function compiler — to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to:

  • Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs)
  • Use Numba to create and launch custom CUDA kernels
  • Apply key GPU memory management techniques

Upon completion, you’ll be able to use Numba to compile and launch CUDA kernels to accelerate your Python applications on NVIDIA GPUs.
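
For illustration only (an assumed example, not course material), here is a short Numba sketch of the techniques listed above: a GPU ufunc, a custom CUDA kernel, and explicit device memory management.

    # numba_sketch.py -- illustrative sketch; requires numpy, numba and a CUDA-capable GPU
    import numpy as np
    from numba import cuda, vectorize

    # A ufunc compiled for the GPU: Numba builds a CUDA kernel from the
    # scalar expression and broadcasts it over whole arrays.
    @vectorize(['float32(float32, float32)'], target='cuda')
    def gpu_add(a, b):
        return a + b

    # A custom CUDA kernel: each thread handles one element.
    @cuda.jit
    def saxpy(a, x, y, out):
        i = cuda.grid(1)
        if i < out.size:
            out[i] = a * x[i] + y[i]

    n = 1 << 20
    x = np.ones(n, dtype=np.float32)
    y = np.full(n, 2.0, dtype=np.float32)

    print(gpu_add(x, y)[:3])            # ufunc path: [3. 3. 3.]

    # Explicit memory management: copy inputs once, allocate the output on the device.
    d_x = cuda.to_device(x)
    d_y = cuda.to_device(y)
    d_out = cuda.device_array_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads
    saxpy[blocks, threads](np.float32(3.0), d_x, d_y, d_out)

    print(d_out.copy_to_host()[:3])     # [5. 5. 5.]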

 

4th day: Accelerating CUDA C++ Applications with Multiple GPUs

Computationally intensive CUDA C++ applications in high-performance computing, data science, bioinformatics, and deep learning can be accelerated by using multiple GPUs, which can increase throughput and/or decrease your total runtime. When combined with the concurrent overlap of computation and memory transfers, computation can be scaled across multiple GPUs without increasing the cost of memory transfers. For organisations with multi-GPU servers, whether in the cloud or on NVIDIA DGX systems, these techniques enable you to achieve peak performance from GPU-accelerated applications. And it’s important to implement these single-node, multi-GPU techniques before scaling your applications across multiple nodes. 

This lecture covers how to write CUDA C++ applications that efficiently and correctly utilise all available GPUs in a single node, dramatically improving the performance of your applications and making the most cost-effective use of systems with multiple GPUs.
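
To make the pattern concrete, here is a hedged, illustrative multi-GPU sketch (an assumption, not the official course material): the data set is partitioned across all visible devices, and each device gets its own stream so that copies and kernels issued to different GPUs can proceed concurrently. For full copy/compute overlap, pinned host memory would also be needed.

    // multi_gpu.cu -- illustrative sketch; compile with: nvcc multi_gpu.cu -o multi_gpu
    #include <cstdio>
    #include <vector>

    // Scale each element by a factor, using a grid-stride loop.
    __global__ void scale(float *data, int n, float factor)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            data[i] *= factor;
    }

    int main()
    {
        const int n = 1 << 24;
        std::vector<float> h(n, 1.0f);

        int num_gpus = 0;
        cudaGetDeviceCount(&num_gpus);
        if (num_gpus == 0) { printf("no CUDA devices found\n"); return 1; }

        const int chunk = n / num_gpus;  // assume n divides evenly, for brevity
        std::vector<float*> d(num_gpus);
        std::vector<cudaStream_t> stream(num_gpus);

        // One chunk, one device, one stream: work issued to different devices
        // runs concurrently. (Pinned host memory, e.g. via cudaMallocHost,
        // would be needed for the async copies to overlap fully.)
        for (int g = 0; g < num_gpus; ++g) {
            cudaSetDevice(g);
            cudaStreamCreate(&stream[g]);
            cudaMalloc(&d[g], chunk * sizeof(float));
            cudaMemcpyAsync(d[g], h.data() + (size_t)g * chunk, chunk * sizeof(float),
                            cudaMemcpyHostToDevice, stream[g]);
            scale<<<256, 256, 0, stream[g]>>>(d[g], chunk, 2.0f);
            cudaMemcpyAsync(h.data() + (size_t)g * chunk, d[g], chunk * sizeof(float),
                            cudaMemcpyDeviceToHost, stream[g]);
        }

        // Wait for every device to finish, then clean up.
        for (int g = 0; g < num_gpus; ++g) {
            cudaSetDevice(g);
            cudaStreamSynchronize(stream[g]);
            cudaStreamDestroy(stream[g]);
            cudaFree(d[g]);
        }

        printf("h[0] = %f\n", h[0]);  // expected: 2.000000
        return 0;
    }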

Important information

After you are accepted, please create an account at https://learn.nvidia.com/join

Check that your laptop or PC will run the training environment smoothly by going to http://websocketstest.com/
Make sure that WebSockets work for you: under Environment, "WebSockets supported" should show Yes, and under WebSockets (Port 80), the Data Receive, Send and Echo tests should all show Yes.
If there are issues with WebSockets, try updating your browser.

NVIDIA Deep Learning Institute

The NVIDIA Deep Learning Institute delivers hands-on training for developers, data scientists, and engineers. The program is designed to help you get started with training, optimising, and deploying neural networks to solve real-world problems across diverse industries such as self-driving cars, healthcare, online services, and robotics.

Prerequisites

Day 1:

  • Basic C/C++ or Fortran competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations.
  • No previous knowledge of GPU programming is assumed.

Day 2: 

  • Basic C++ competency, including familiarity with lambda expressions, loops, conditional statements, functions, standard algorithms and containers.
  • No previous knowledge of CUDA programming is assumed.

Day 3:

  • Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations.
  • NumPy competency, including the use of ndarrays and ufuncs.
  • No previous knowledge of CUDA programming is required.

Day 4:

  • Experience programming CUDA C/C++ applications, including the use of the nvcc compiler, kernel launches, grid-stride loops, host-to-device and device-to-host memory transfers, and CUDA error handling (covered on the 2nd day).
  • Familiarity with the Linux command line.
  • Experience using makefiles to compile C/C++ code.

Hands-On

The lectures are interleaved with many hands-on sessions using Jupyter Notebooks. The exercises will be done on a fully configured GPU-accelerated workstation in the cloud.

Language

English

Lecturers

Dr. Momme Allalen (LRZ), Dr. Sebastian Kuckuk (NHR@FAU), Dr. Volker Weinberg (LRZ)

All instructors are NVIDIA certified University Ambassadors.

Prices and Eligibility

The course is open and free of charge for academic participants from the Member States of the European Union (EU) and countries associated with the Horizon 2020 programme.

Registration

Please register with your official e-mail address to prove your affiliation.

Withdrawal Policy

See Withdrawal

Legal Notices

For registration for LRZ courses and workshops, we use the service edoobox from Etzensperger Informatik AG (www.edoobox.com). Etzensperger Informatik AG acts as a processor, and we have concluded a Data Processing Agreement with them.

See Legal Notices

   

 

Online Course: GPU Programming Workshop
Number: hdli1w25
Available places: 49
Date: 27.10.2025 – 30.10.2025
Price: EUR 0.00
Location: ONLINE
Room:
Registration deadline: 20.10.2025 23:59
E-mail: [email protected]

No.  Date        Time           Trainer           Location  Room  Description
1    27.10.2025  09:00 – 16:00  Volker Weinberg   ONLINE          Lecture
2    28.10.2025  09:00 – 17:00  Momme Allalen     ONLINE          Lecture
3    29.10.2025  09:00 – 16:00  Sebastian Kuckuk  ONLINE          Lecture
4    30.10.2025  09:00 – 16:00  Momme Allalen     ONLINE          Lecture