CGO 2012 Tutorial

SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters


April 1, 2012

San Jose, California, USA


Open Computing Language (OpenCL) is a programming model for heterogeneous parallel computing systems. OpenCL provides a common abstraction layer across different multicore architectures, such as CPUs, GPUs, DSPs, and Cell BE processors. Programmers can write an OpenCL application once and run it on any OpenCL-compliant system. However, the current OpenCL standard is restricted to a single heterogeneous system. To target heterogeneous CPU/GPU clusters, programmers must combine OpenCL with a communication library such as MPI. The same is true for CUDA.

This tutorial will cover the usage and internals of an OpenCL framework called SnuCL, which naturally extends the original OpenCL semantics to the heterogeneous cluster environment. The target cluster contains multiple CPUs and GPUs in each node, and the nodes are connected by an interconnection network such as Gigabit Ethernet or InfiniBand. For such clusters, SnuCL provides the programmer with the illusion of a single heterogeneous system: a GPU or a set of CPU cores becomes an OpenCL compute device, and SnuCL allows the application to use the compute devices in a compute node as if they were in the host node. With SnuCL, OpenCL applications written for a single heterogeneous system with multiple OpenCL compute devices can run on the cluster without any modification. SnuCL achieves both high performance and ease of programming.

In addition, we characterize the performance of the SNU NPB suite, an OpenCL implementation of the NAS Parallel Benchmark suite (NPB), on the target heterogeneous parallel platform. We believe that understanding the performance characteristics of conventional workloads, such as the NPB, under an emerging programming model (i.e., OpenCL) is important for developers and researchers who want to adopt that programming model.

The source code of SnuCL and the SNU NPB suite is available at the URL

Target audience:

This tutorial is targeted at graduate students, researchers, and practitioners who are interested in heterogeneous parallel computing. It is designed both for attendees with prior OpenCL or CUDA programming experience and for those who are new to OpenCL.


  1. Jaejin Lee, Center for Manycore Programming, Seoul National University,

Outline of the contents:

The first part of the tutorial consists of an introduction to OpenCL and addresses the limitations of the current OpenCL programming model. Topics include:

  1. Heterogeneous computing and OpenCL

  2. Introduction to OpenCL

  3. How to write an OpenCL program?

  4. Limitations of the current OpenCL programming model

The second part of the tutorial covers the SnuCL framework and its usage. Topics include:

  1. Achieving a single system image for the CPU/GPU cluster

  2. Buffer and consistency management

  3. SnuCL collective communication extensions to OpenCL

  4. Source-to-source kernel restructuring techniques

  5. How to write a SnuCL program?

The third part of the tutorial covers the performance evaluation of SnuCL and its benchmark suite. Topics include:

  1. SnuCL benchmark applications (including the SNU NPB suite)

  2. Performance evaluation

  3. Scalability

  4. Future directions

© 2013 Center for Manycore Programming

Room 520, Building 301, Seoul National University, Seoul 151-744, Korea