Call for Paper - January 2023 Edition
IJCA solicits original research papers for the January 2023 Edition. Last date of manuscript submission is December 20, 2022. Read More

Performance Analysis of Parallel Sorting Algorithms using GPU Computing

IJCA Proceedings on Recent Innovations in Computer Science and Information Technology
© 2016 by IJCA Journal
RICSIT 2016 - Number 2
Year of Publication: 2016
Neetu Faujdar
S. P. Ghrera

Neetu Faujdar and S P Ghrera. Article: Performance Analysis of Parallel Sorting Algorithms using GPU Computing. IJCA Proceedings on Recent Innovations in Computer Science and Information Technology RICSIT 2016(2):5-11, September 2016. Full text available. BibTeX

	author = {Neetu Faujdar and S. P. Ghrera},
	title = {Article: Performance Analysis of Parallel Sorting Algorithms using GPU Computing},
	journal = {IJCA Proceedings on Recent Innovations in Computer Science and Information Technology},
	year = {2016},
	volume = {RICSIT 2016},
	number = {2},
	pages = {5-11},
	month = {September},
	note = {Full text available}


Sorting is a well interrogating issue in computer science. Many authors have invented numerous sorting algorithms on CPU (Central Processing Unit). In today's life sorting on the CPU is not so efficient. To get the efficient sorting parallelization should be done. There are many ways of parallelization of sorting but at the present time GPU (Graphics Processing Unit) computing is the most preferable way to parallelize the sorting algorithms. Many authors have implemented the some sorting algorithms using GPU computing with CUDA. This paper mentioned the roadmap of research direction of a GPU based sorting algorithms and the various research aspects to work on GPU based sorting algorithms. These research directions include the various sorting algorithms which are parallel (Merge, Quick, Bitonic, Odd-Even, Count, Radix etc. ) sort algorithms using GPU computing with CUDA (Compute Unified Device Architecture). In this paper, we have tested and compared the parallel and sequential (Merge, Quick, Count and Odd-Even sort) using dataset. The testing of parallel algorithms is done using GPU computing with CUDA. The speedup is also measured of various parallel sorting algorithms. The results have depicted that, the count sort is the most efficient sort due to based on the key value. Future research will refine the performance of sorting algorithms in GPU architecture.


  • Greb, Alexander, and Gabriel Zachmann, 2006. GPU-ABiSort: Optimal parallel sorting on stream architectures. IEEE Parallel and Distributed Processing Symposium, IPDPS, 20th International
  • Inoue, Hiroshi, et al, 2007. AA-sort: A new parallel sorting algorithm for multi-core SIMD processors. Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques. IEEE Computer Society.
  • Sintorn, Erik, and Ulf Assarsson. 2008. Fast parallel GPU-sorting using a hybrid algorithm. Journal of Parallel and Distributed Computing, vol. 68, No. 10, pp. 1381-1388.
  • Cederman, Daniel, and Philippas Tsigas. 2008. A practical quicksort algorithm for graphics processors. Algorithms-ESA, Springer Berlin Heidelberg, pp. 246-258.
  • Ro?en, T. , Krzysztof Boryczko, and Witold Alda. 2008. GPU bucket sort algorithm with applications to nearest-neighbour search.
  • Baraglia, Ranieri, et al. 2009. Sorting using bitonic network with CUDA. the 7th Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS-IR), Boston, USA.
  • Leischner, Nikolaj, Vitaly Osipov, and Peter Sanders. 2010. GPU sample sort. Parallel & Distributed Processing (IPDPS), International Symposium on IEEE.
  • Kukunas, Jim, and James Devine. 2009. GPGPU Parallel Merge Sort Algorithm. NVIDIA Technical Report NVR-
  • Oat, Christopher, Joshua Barczak, and Jeremy Shopf. 2010. Efficient spatial binning on the GPU. SIGGRAPH Asia
  • Huang, Bonan, Jinlan Gao, and Xiaoming Li. 2009. An empirically optimized radix sort for gpu. Parallel and Distributed Processing with Applications, IEEE International Symposium on.
  • Ye, Xiaochun, et al. 2010. High performance comparison-based sorting algorithm on many-core GPUs. Parallel & Distributed Processing (IPDPS), IEEE International Symposium on.
  • Peters, Hagen, Ole Schulz-Hildebrandt, and Norbert Luttenberger. 2010. Fast in-place sorting with cuda based on bitonic sort. Parallel Processing and Applied Mathematics. Springer Berlin Heidelberg, pp. 403-410.
  • Peters, Hagen, Ole Schulz-Hildebrandt, and Norbert Luttenberger. 2010. Parallel external sorting for CUDA-enabled GPUs with load balancing and low transfer overhead. Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), IEEE International Symposium on.
  • Satish, Nadathur, et al. 2010. Fast sort on cpus, gpus and intel mic architectures. Intel Labs, pp. 77-80.
  • Helluy, Philippe. 2011. A portable implementation of the radix sort algorithm in OpenCL.
  • Krueger, Jens, et al. 2011. Applicability of GPU Computing for Efficient Merge in In-Memory Databases. ADMS@ VLDB.
  • Miši?, Marko J. , and Milo V. Tomaševi?. 2011. Data sorting using graphics processing units. Telecommunications Forum (TELFOR), 19th. IEEE.
  • Peters, Hagen, Ole Schulz-Hildebrandt, and Norbert Luttenberger. 2012. A novel sorting algorithm for many-core architectures based on adaptive bitonic sort. Parallel & Distributed Processing Symposium (IPDPS), IEEE 26th International.
  • Jan, Bilal, et al. 2012. Fast parallel sorting algorithms on GPUs. " International Journal of Distributed and Parallel Systems, vol. 3, pp. 107-118.
  • Munavalli, Seema M. 2012. Efficient Algorithms for Sorting on GPUs.
  • Thouti, Krishnahari, and S. R. Sathe. 2012. An OpenCL Method of Parallel Sorting Algorithms for GPU Architecture.
  • ?urek, Dominik, et al. 2013. The comparison of parallel sorting algorithms implemented on different hardware platforms. " Computer Science, vol. 14, No. 4, pp. 679-691.
  • Panwar, Mukul, Monu Kumar, and Sanjay Bhargava. 2014. GPU Matrix Sort (An Efficient Implementation of Merge Sort). " International Journal of Computer Applications, vol. 89, No. 18, pp. 9-11.
  • Arturo Garcia, Jose Omar Alvizo Flores, Ulises Olivares Pinto, Felix Ramos. 2014. Fast Data Parallel Radix Sort Implementation in DirectX 11 Compute Shader to Accelerate Ray Tracing Algorithms. EURASIA GRAPHICS: International Conference on Computer Graphics, Animation and Gaming Technologies, pp. 27-36.
  • Gluck, Joshua. 2014. Fast GPGPU Based Quadtree Construction.
  • Ye, Yin, et al. 2014. GPUMemSort: A High Performance Graphics Co-processors Sorting Algorithm for Large Scale In-Memory Data. Journal on Computing (JoC), vol. 1, No. 2.
  • Polok, Lukas, Viorela Ila, and Pavel Smrz. 2014. Fast radix sort for sparse linear algebra on GPU. Proceedings of the High Performance Computing Symposium. Society for Computer Simulation International.
  • Mu, Qi, Liqing Cui, and Yufei Song. 2015. The implementation and optimization of Bitonic sort algorithm based on CUDA. arXiv preprint arXiv: 1506. 01446.
  • Xiao, Jun, Hao Chen, and Jianhua Sun. 2015. High Performance Approximate Sort Algorithm Using GPUs.
  • Ajdari, Jaumin, et al. 2015. A Version of Parallel Odd-Even Sorting Algorithm Implemented in CUDA Paradigm. " International Journal of Computer Science Issues (IJCSI), vol. 12, No. 3.
  • Neetu Faujdar and Satya Prakash Ghrera. 2015. Performance Evaluation of Merge and Quick Sort using GPU Computing with CUDA. International Journal of Applied Engineering Research, vol. 10, No. 18, pp. 39315-39319.
  • Frequent Itemset Mining Implementations Repository,http://fimi. cs. helsinki. fi accessed on 10/11/2015
  • Zubair Khan, Neetu Faujdar, et al. 2013. Modified BitApriori Algorithm: An Intelligent Approach for Mining Frequent Item-Set. Proc. Of Int. Conf. on Advance in Signal Processing and Communication, pp. 813-819.