Generating Multi-million Data Set using GPGPU Accelerated Models

IJCA Proceedings on National Conference on Contemporary Computing
© 2017 by IJCA Journal
NCCC 2016 - Number 2
Year of Publication: 2017
Ghanshyam Verma
Priyanka Tripathi

Ghanshyam Verma and Priyanka Tripathi. Article: Generating Multi-million Data Set using GPGPU Accelerated Models. IJCA Proceedings on National Conference on Contemporary Computing NCCC 2016(2):4-9, April 2017. Full text available. BibTeX

	author = {Ghanshyam Verma and Priyanka Tripathi},
	title = {Article: Generating Multi-million Data Set using GPGPU Accelerated Models},
	journal = {IJCA Proceedings on National Conference on Contemporary Computing},
	year = {2017},
	volume = {NCCC 2016},
	number = {2},
	pages = {4-9},
	month = {April},
	note = {Full text available}


Generating a synthetic data set that is both realistic and sufficiently large has long been a cumbersome task for researchers. Several models have been proposed previously, all adopting heterogeneous approaches; this work instead emphasizes reducing the compute time needed to generate the data set distribution. Uniform, Poisson and Zipf distributions are studied, and approaches based on a parallel computation model are proposed. The models were verified for speedup using a CUDA-based implementation on an NVIDIA Quadro 2000 GPU. Speedups in the range of 2x to 6x were observed for various ranges of data sets.
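The abstract does not describe the proposed models in detail, so the following is only a minimal sketch of one common way to make such generation parallel, not the authors' implementation. It assumes that each record is computed from its own index alone (a counter-based hash feeding inverse-transform sampling), which is the property that would let one GPU thread produce one record independently. Plain Python stands in for a CUDA kernel here, and all names (`mix64`, `unit_float`, `make_record`, `generate_chunk`) and parameter defaults are hypothetical.

```python
import bisect
import math

MASK64 = (1 << 64) - 1


def mix64(x):
    """SplitMix64-style bit mixer: maps an integer to a scrambled 64-bit value."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def unit_float(seed, index, stream):
    """Uniform value in [0, 1) that depends only on (seed, index, stream)."""
    h = mix64(mix64(mix64(seed) ^ index) ^ stream)
    return (h >> 11) / float(1 << 53)  # keep the top 53 bits


def uniform_value(u, lo, hi):
    """Map u in [0, 1) to an integer in [lo, hi]."""
    return min(lo + int(u * (hi - lo + 1)), hi)


def poisson_value(u, lam):
    """Inverse-transform Poisson sample; intended for small lam only."""
    k = 0
    p = math.exp(-lam)  # P(X = 0)
    cdf = p
    while u > cdf and p > 0.0:  # stop once the tail underflows
        k += 1
        p *= lam / k
        cdf += p
    return k


def zipf_cdf(n, s):
    """Cumulative probabilities for ranks 1..n with exponent s."""
    weights = [1.0 / (k ** s) for k in range(1, n + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w
        cdf.append(acc / total)
    return cdf


def zipf_value(u, cdf):
    """Binary-search the precomputed table; returns a rank in 1..len(cdf)."""
    return min(bisect.bisect_left(cdf, u), len(cdf) - 1) + 1


def make_record(seed, index, cdf, lo=1, hi=100, lam=3.0):
    """Build record `index` without touching any other record (thread-like work)."""
    return (
        uniform_value(unit_float(seed, index, 0), lo, hi),
        poisson_value(unit_float(seed, index, 1), lam),
        zipf_value(unit_float(seed, index, 2), cdf),
    )


def generate_chunk(seed, start, stop, cdf):
    """Generate records start..stop-1; chunks can be produced in any order."""
    return [make_record(seed, i, cdf) for i in range(start, stop)]
```

Because no record depends on its predecessor, calling `generate_chunk(42, 0, 400, cdf)` and `generate_chunk(42, 400, 1000, cdf)` separately yields the same records as a single call over `0..1000`. In a CUDA version under the same assumptions, `index` would typically come from the thread and block indices, and the Zipf table would be computed once and shared by all threads.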

