Research Interests

Research Activities






  1. Jungho Park, Wookeun Jung, Gangwon Jo, Ilkoo Lee, and Jaejin Lee. PIPSEA: A Practical IPsec Gateway on Embedded APUs, In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS), 2016.
  2. Jaejin Lee, Gangwon Jo, Wookeun Jung, Hongjune Kim, Junghyun Kim, Yong-Jun Lee, and Jungho Park. SnuCL: A unified OpenCL framework for heterogeneous clusters, Advances in GPU Research and Practice, pp. 23 - 56, Morgan Kaufmann, 2016.
  3. Junghyun Kim, Gangwon Jo, Jaehoon Jung, Jungwon Kim, and Jaejin Lee. A Distributed OpenCL Framework using Redundant Computation and Data Replication, In Proceedings of the 37th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2016.
  4. Gangwon Jo, Jeongho Nah, Jun Lee, Jungwon Kim, and Jaejin Lee. Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes, IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 7, pp. 1814 - 1825, 2015.
  5. Gangwon Jo, Won Jong Jeon, Wookeun Jung, Gordon Taft, and Jaejin Lee. OpenCL Framework for ARM Processors with NEON Support, In Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing (WPMVP), 2014.
  6. Sangmin Seo, Jun Lee, Gangwon Jo, and Jaejin Lee. Automatic OpenCL Work-Group Size Selection for Multicore CPUs, In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT), 2013.
  7. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, and Jaejin Lee. SnuCL: an OpenCL Framework for Heterogeneous CPU/GPU Clusters, In Proceedings of the 26th International Conference on Supercomputing (ICS), 2012.
  8. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, and Jaejin Lee. OpenCL as a Unified Programming Model for Heterogeneous CPU/GPU Clusters, Poster presentation in Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2012.
  9. Sangmin Seo, Gangwon Jo, and Jaejin Lee. Performance Characterization of the NAS Parallel Benchmarks in OpenCL, In Proceedings of the 2011 IEEE International Symposium on Workload Characterization (IISWC), 2011.
  10. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, and Jaejin Lee. OpenCL as a Programming Model for GPU Clusters, In Proceedings of the 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2011.


  1. Wookeun Jung, Jungwook Kim, Thanh Tuan Dao, Jungho Park, Jiyoung Park, Jaeho Shin, Jaehoon Jung, Gangwon Jo, Heehoon Kim, Hyoungwook Nam, and Jaejin Lee. Hardware and Software Support for Deep Learning, Communications of KIISE, vol. 34, no. 9, pp. 10 - 20, 2016.
  2. Jaeho Shin, Gangwon Jo, Ilkoo Lee, and Jaejin Lee. Automatic Optimization Methods for Image Processing Programs Using OpenCL, Korea Computer Congress 2016 (KCC), 2016.
  3. Gangwon Jo, Junghyun Kim, Thanh Tuan Dao, Wookeun Jung, Jungho Park, Yong-Jun Lee, Jaehoon Jung, Jaeho Shin, and Jaejin Lee. HPC Technology Trends of Big Data Analyses with Supercomputers, Communications of KIISE, vol. 34, no. 2, pp. 31 - 42, 2016.
  4. Jaehoon Jung, Gangwon Jo, Junghyun Kim, Wookeun Jung, and Jaejin Lee. LRC: A Lightweight Communication Library for High Performance Computing, Korea Computer Congress 2015 (KCC), 2015.
  5. Junghyun Kim, Jungho Park, Gangwon Jo, Thanh Tuan Dao, Jinyoung Joo, Jaehoon Jung, Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, and Jaejin Lee. SnuCL: OpenCL Programming Environment for Heterogeneous Manycore Clusters, Communications of KIISE, vol. 32, no. 5, pp. 66 - 76, 2014.
  6. Gangwon Jo, Sangmin Seo, Jeongho Nah, Jungwon Kim, Junghyun Kim, Jun Lee, Jungho Park, Yong-Jun Lee, Hongjune Kim, Sooyeon Kang, Jinyoung Joo, Seonmyeong Park, Wookeun Jung, Kihyun Im, and Jaejin Lee. Trends on Heterogeneous Supercomputers and a Case Study on the Development of a Supercomputer Chundoong, Communications of KIISE, vol. 31, no. 4, pp. 34 - 41, 2013.
  7. Jeongho Nah, Gangwon Jo, Sooyeon Kang, Wookeun Jung, and Jaejin Lee. Design and Implementation of Virtual Machines as an Aid in Teaching Computer Concepts, Korea Computer Congress 2012 (KCC), 2012.
  8. Jun Lee, Sangmin Seo, Jungwon Kim, Gangwon Jo, Wan Choi, and Jaejin Lee. Current Status and Development Prospects of High Performance Computing Technology, The Journal of Korean Institute of Next Generation Computing, Vol. 8, No. 2, 2012.
  9. Jeongho Nah, Honggyu Kim, Hongjune Kim, Gangwon Jo, and Jaejin Lee. Implementation of Register Allocator for JavaScript JIT Compiler, 2011 KIISE Fall Conference, 2011.
  10. Hongjune Kim, Joo Hwan Lee, Gangwon Jo, and Jaejin Lee. Measuring JavaScript Performance with a Real World Web Application, 2011 KIISE Fall Conference, 2011.
  11. Gangwon Jo, Hongjune Kim, Joo Hwan Lee, Jeongho Nah, and Jaejin Lee. Alias Analysis for JavaScript Program Optimization, Korea Computer Congress 2011 (KCC), 2011.


Honors and Awards



SnuCL is an open-source OpenCL framework for heterogeneous clusters. It allows OpenCL applications to use all compute devices in a cluster as if they were in a single node. It also integrates multiple OpenCL platforms from different vendor implementations into a single platform (e.g., Intel + NVIDIA or Intel + AMD).


Anyone can build a fast CPU. The trick is to build a fast system. - Seymour Cray


SnuCore is a 16-node experimental CPU/GPU cluster built in November 2011. Each node of SnuCore contains two 12-core AMD Opteron 6172 CPUs and three AMD Radeon HD 6990 graphics cards (i.e., 6 GPUs). We optimized HPL (High Performance Linpack) for multi-GPU nodes and achieved 15.9 TFLOPS (991 GFLOPS per node).


Chundoong (Korean: 천둥; IPA: [cʰən.duŋ]) is a self-made 56-node heterogeneous supercomputer built in October 2012. The word 'chundoong' means thunder in Korean. Each node of Chundoong contains two 8-core Intel Xeon E5-2650 CPUs and four AMD Radeon HD 7970 GPUs, and is equipped with a self-made water cooling system for the CPUs and GPUs. Chundoong achieved 106.8 TFLOPS (1.907 TFLOPS per node) on the LINPACK benchmark. It was ranked #277 in the TOP500 list and #32 in the Green500 list of November 2012.

The design of Chundoong focuses on low cost and low power consumption. Chundoong uses gaming GPUs instead of expensive HPC-dedicated accelerators. (Chundoong is the first supercomputer in the world that contains high-density gaming GPUs and ensures their reliability.) As a result, it cost only US$0.67 million. Its per-node performance is #1 among the 412 clusters in the TOP500 list of November 2012. Chundoong is also cited as the seventh most power-efficient architecture in the TOP500 list of November 2012.
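
The per-node figure quoted for Chundoong can be reproduced from the numbers given above. The following is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the cost-per-TFLOPS value is a derived illustration that assumes the quoted US$0.67 million covers the whole 56-node system; it is not a figure stated on this page.

```python
# Figures quoted in the Chundoong description.
NODES = 56
CPUS_PER_NODE = 2
GPUS_PER_NODE = 4
LINPACK_TFLOPS = 106.8
COST_USD = 0.67e6  # assumption: total system cost


def per_node_tflops(total_tflops: float, nodes: int) -> float:
    """Average LINPACK performance contributed by one node."""
    return total_tflops / nodes


def cost_per_tflops(cost_usd: float, total_tflops: float) -> float:
    """Derived illustration: dollars spent per sustained LINPACK TFLOPS."""
    return cost_usd / total_tflops


total_cpus = NODES * CPUS_PER_NODE
total_gpus = NODES * GPUS_PER_NODE

print(f"CPUs: {total_cpus}, GPUs: {total_gpus}")
print(f"Per-node performance: {per_node_tflops(LINPACK_TFLOPS, NODES):.3f} TFLOPS")
print(f"Cost per TFLOPS: about US${cost_per_tflops(COST_USD, LINPACK_TFLOPS):,.0f}")
```

Dividing 106.8 TFLOPS by 56 nodes gives the 1.907 TFLOPS per node reported above.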