Call for Paper - May 2023 Edition
IJCA solicits original research papers for the May 2023 Edition. Last date of manuscript submission is April 20, 2023. Read More

Parallelizing RSA Algorithm on Multicore CPU and GPU

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 87 - Number 6
Year of Publication: 2014
Heba Mohammed Fadhil
Mohammed Issam Younis

Heba Mohammed Fadhil and Mohammed Issam Younis. Article: Parallelizing RSA Algorithm on Multicore CPU and GPU. International Journal of Computer Applications 87(6):15-22, February 2014. Full text available. BibTeX

	author = {Heba Mohammed Fadhil and Mohammed Issam Younis},
	title = {Article: Parallelizing RSA Algorithm on Multicore CPU and GPU},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {87},
	number = {6},
	pages = {15-22},
	month = {February},
	note = {Full text available}


Public key algorithms are extensively known to be slower than symmetric key alternatives in the area of cryptographic algorithms for the reason of their basis in modular arithmetic. The most public key algorithm widely used is the RSA. Therefore, how to enhance the speed of RSA algorithm has been the research significant topic in the computer security as well as in computing fields. With remarkable increase in the computing capability of the modern Graphics Processing Unit's (GPUs) as a co-processor of the CPU, one can significantly benefit from the Single Instruction Multiple Thread (SIMT) style of computing. This paper proposes a hybrid system to parallelize the RSA for multicore CPU and many cores GPUs with variable key size. In doing so, three variants implementation for the RSA algorithm are done to facilitate the performance comparison against Crypto++ library and sequential counterpart. The GPU implementation gained approximately 23 speed up factor over the sequential CPU implementation; while the multithread CPU implementation gained only 6 speed up factor over the sequential CPU implementation as far as the latency is concerned. Furthermore, additional speedup could be gained as far as the throughput is concerned; the throughput gained for 1024 bits is ~1800 msg/sec; as for 2048 bits is ~250 msg/sec. Due to overlapping of multithread operation whenever free resources are available. The experiments are conducted on a laptop with Intel Core I7-2670QM, 2. 20 GHz CPU and Nvidia GeForce GT630M GPU. Results reveal that the GPU is appropriate to speed up the RSA algorithm.


  • Damrudi, M. and Ithnin, N. "State of the Art Practical Parallel Cryptographic Approaches", Australian Journal of Basic and Applied Sciences, Vol. 5, No. 7, pp. 660-677, 2011.
  • Rasool, S. ; Qyser, A. ; Rizwanullah, M. and Ghori, M. "Secure Data Transmission over Networks", Asian Journal of Computer Science and Information Technology, Vol. 2, No. 8, pp. 257– 261, 2012.
  • Huang, Z. and Li, S "Design and Implementation of a Low Power RSA Process for Smartcard", International Journal of Modern Education and Computer Science, Vol. 3, No. 3, pp. 8-14, 2011.
  • Sepahvandi, S. ; Hosseinzadeh, M. ; Navi, K. and Jalali, A. "An Improved Exponentiation Algorithm for RSA Cryptosystem", International Conference on Research Challenges in Computer Science, ICRCCS '09, pp. 128-132, 2009.
  • Lara, P. ; Borges, F. ; Portugal, R. and Nedjah, N. "Parallel modular exponentiation using load balancing without pre computation", Journal of Computer and System Sciences, Vol. 78, No. 2, pp. 575–582, 2012.
  • Owens, J. ; Houston, M. ; Luebke, D. ; Green, S. ; Stone, J. and Phillips, J. "GPU Computing", Proceedings of the IEEE, Journals & Magazines, Vol. 96, No. 5, pp. 879-899, 2008.
  • Gupta,S. and Babu, M. "Performance Analysis of GPU compared to Single core and Multi-core CPU for Natural Language Applications", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 2, No. 5, pp. 50-53, 2011.
  • Su, C. ; Lan, C. ; Huang, L. and Wu, K. "Overview and Comparison of OpenCL and CUDA Technology for GPGPU", IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pp. 448 – 451, 2012.
  • Thouti, K. and Sathe, S. " Comparison of Open MP and Open CL Parallel Processing Technologies", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 3, No. 4, pp. 56-61, 2012.
  • Rao, R. ; Lakshmi, P. ; Shankar, N. "A Novel Modular Multiplication Algorithm and its Application to RSA Decryption", International Journal of Computer Science Issues (IJCSI), Vol. 9, Issue 6, No 3, pp. 303-309, November, 2012.
  • Hwu, W. ; Keutzer, K. and Mattson, T. G. "The Concurrency Challenge", IEEE Design & Test of Computers, Vol. 25, No. 4, pp. 312-320, 2008.
  • Moss, A. ; Page, D. and Smart, N. "Toward Acceleration of RSA Using 3D Graphics Hardware", Proceedings of the 11th IMA International Conference on Cryptography and Coding, pp. 364-383, 2007.
  • Szerwinski, R. and Güneysu, T. "Exploiting the Power of GPUs for Asymmetric Cryptography", 10th International Workshop on Cryptographic Hardware and Embedded Systems – CHES 08 , Washington, D. C. , USA, Volume 5154, pp. 79-99, 2008.
  • Harrison, O. and Waldron, J. "Efficient Acceleration of Asymmetric Cryptography on Graphics Hardware", The 2nd International Conference on Cryptology in Africa, Progress in Cryptology (AFRICACRYPT 09), Lecture Notes in Computer Science ,Volume 5580, PP. 350-367, 2009.
  • Fleissner, S. "GPU-Accelerated Montgomery Exponentiation", 7th International Conference Computational Science – ICCS 07, Beijing, China, Lecture Notes in Computer Science, Vol. 4487,Springer ,pp. 213-220, 2007.
  • Neves, S. and Araujo, F. "On the Performance of GPU Public-Key Cryptography", IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 133-140, 2011.
  • Li, T. ; Li, H. and Xiang, J. "A GPU-based Fine-grained Parallel Montgomery Multiplication Algorithm" , Recent Advances in Computer Science and Information Engineering, Vol. 126, pp. 135-143 , 2012.
  • Rivest, R. ; Shamir, A. and Adleman, L. "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", Published in Magazine Communications of the ACM, New York, NY, USA. Volume 21, Issue 2, pp. 120-126, February 1978.
  • Alijani, G. ; Christy, J. ; Craft, H. ; Mok, P. and Welsh, J. "Design and Implementation of an Information Security Model for E-Business", Information Systems Education Journal, Vol. 4, No. 4, pp. 1-13, 2006.
  • Zhao, L. ; Iyer, R. ; Makineni, S. and Bhuyan, L. "Anatomy and Performance of SSL Processing ", IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 05, pp. 197-206, 2005.
  • Ramachandra Rao, G. A. V. ; Lakshmi, P. V. ; Ravi Shankar, N. " RSA Public Key Cryptosystem using Modular Multiplication " International Journal of Computer Applications, Vol. 80, No5, pp. 38-42, October 2013.
  • Selçuk, B. and Erkay, S. "Highly-Parallel Montgomery Multiplication for Multi-core General-Purpose Microprocessors", Computer and Information Sciences III, 27th International Symposium on Computer and Information Sciences, pp. 467-476, 2013.
  • Montgomery, P. "Modular Multiplication without Trial Division", Mathematics of Computation, Vol. 44, No. 170, pp. 519-521, 1985.
  • Koc, C. ; Acar, T. and Kaliski, B. "Analyzing and Comparing Montgomery Multiplication Algorithms", IEEE Micro, Vol. 16, No. 3, pp. 26-33, 1996.
  • Crypto++ web site, available at: http://www. cryptopp. com, last accessed on 6 January 2014.
  • Hwu, W. ; Rodrigues, C. ; Ryoo, S. and Stratton, J. " Compute Unified Device Architecture Application Suitability", IEEE Computing in Science & Engineering, Vol. 11, No. 3, pp. 16 - 26 2009.
  • Roosta, S. " Parallel Processing and Parallel Algorithms: Theory and Computation ", ISBN: 0-387-98716-9, Springer Verlag, 2000.
  • Kermarrec, A. ; Bougé, L. and Priol, T. "Euro-Par 2007 Parallel Processing ", 13th International Euro-Par Conference: Lecture Notes in Computer Science, ISBN 978-3-540-74465-8, Vol. 4641, 2007.