
Automated Tool to Generate Parallel CUDA Code from a Serial C Code

International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 50 - Number 8
Year of Publication: 2012
Akhil Jindal
Nikhil Jindal
Divyashikha Sethia

Akhil Jindal, Nikhil Jindal and Divyashikha Sethia. Article: Automated Tool to Generate Parallel CUDA Code from a Serial C Code. International Journal of Computer Applications 50(8):15-21, July 2012. Full text available.

@article{jindal2012automated,
	author = {Akhil Jindal and Nikhil Jindal and Divyashikha Sethia},
	title = {Article: Automated Tool to Generate Parallel CUDA Code from a Serial C Code},
	journal = {International Journal of Computer Applications},
	year = {2012},
	volume = {50},
	number = {8},
	pages = {15-21},
	month = {July},
	note = {Full text available}
}


With the introduction of GPGPUs, parallel programming has become simple and affordable. APIs such as NVIDIA's CUDA have attracted many programmers to port their applications to GPGPUs. But writing CUDA code remains a challenging task. Moreover, the vast repositories of legacy serial C code still in wide use in industry are unable to take advantage of this extra computing power. Many attempts have thus been made to develop auto-parallelization techniques that convert a serial C code to a corresponding parallel CUDA code. Some parallelizers allow programmers to add "hints" to their serial programs, while another approach has been to build an interactive system between programmers and parallelizing tools/compilers. But none of these are truly automatic techniques, since the programmer remains fully involved in the process. In this paper, we present an automatic parallelization tool that completely relieves the programmer of any involvement in the parallelization process. Preliminary results on a basic set of common C codes show that the tool provides a significant speedup of ~10 times.
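As an illustration of the kind of transformation such a tool automates (this is a hand-written sketch, not the paper's actual generated output; the kernel and variable names are hypothetical), a serial C vector-addition loop maps to a CUDA kernel in which each thread computes one iteration:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Serial form the parallelizer would recognize:
//   for (int i = 0; i < n; i++) c[i] = a[i] + b[i];

// CUDA equivalent: one thread per loop iteration.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // guard: the grid may have excess threads
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 2.0f * i; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[10] = %f\n", c[10]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The mechanical parts of this mapping (index computation from the loop counter, the bounds guard, host-side allocation, data transfer, and launch configuration) are exactly what an automatic tool must emit around each parallelizable loop.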

