Research Article

Generating Multi-million Data Set using GPGPU Accelerated Models

Published on April 2017 by Ghanshyam Verma, Priyanka Tripathi
National Conference on Contemporary Computing
Foundation of Computer Science USA
NCCC2016 - Number 2
April 2017
Authors: Ghanshyam Verma, Priyanka Tripathi

Ghanshyam Verma, Priyanka Tripathi. Generating Multi-million Data Set using GPGPU Accelerated Models. National Conference on Contemporary Computing. NCCC2016, 2 (April 2017), 4-9.

author = { Ghanshyam Verma, Priyanka Tripathi },
title = { Generating Multi-million Data Set using GPGPU Accelerated Models },
journal = { National Conference on Contemporary Computing },
issue_date = { April 2017 },
volume = { NCCC2016 },
number = { 2 },
month = { April },
year = { 2017 },
issn = { 0975-8887 },
pages = { 4-9 },
numpages = 6,
url = { /proceedings/nccc2016/number2/27341-6340/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
%0 Proceeding Article
%1 National Conference on Contemporary Computing
%A Ghanshyam Verma
%A Priyanka Tripathi
%T Generating Multi-million Data Set using GPGPU Accelerated Models
%J National Conference on Contemporary Computing
%@ 0975-8887
%V NCCC2016
%N 2
%P 4-9
%D 2017
%I International Journal of Computer Applications

Generating a synthetic data set that is both realistic and sufficiently large has long been a cumbersome task for researchers. Several models have been proposed previously, all adopting heterogeneous approaches. In this work, the emphasis is on reducing the compute time needed to generate the data set distribution. Uniform, Poisson and Zipf distributions are studied, and approaches based on a parallel computation model are proposed. The models were verified for speedup using a CUDA-based implementation on an NVIDIA Quadro 2000 GPU. A speedup in the range of 2x to 6x was observed across various ranges of data sets.
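The data-parallel idea in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' CUDA implementation, which is not reproduced on this page. It is a minimal Python illustration, under stated assumptions, of how Uniform, Poisson and Zipf samples can be produced in independent chunks, so that each chunk could in principle be assigned to its own GPU thread or block. The function names (`zipf_cdf`, `poisson_from_uniform`, `generate_chunk`, `generate_dataset`), the per-chunk seeding scheme, and all parameter defaults are hypothetical choices made for this example.

```python
import bisect
import math
import random
from itertools import accumulate


def zipf_cdf(n_ranks, s=1.0):
    """Cumulative probabilities of a finite Zipf law over ranks 1..n_ranks."""
    weights = [1.0 / (k ** s) for k in range(1, n_ranks + 1)]
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point drift in the last entry
    return cdf


def poisson_from_uniform(u, lam):
    """Invert the Poisson CDF by sequential search; u is uniform in [0, 1)."""
    k = 0
    p = math.exp(-lam)  # P(X = 0)
    cumulative = p
    # Stop once the CDF passes u, or once p underflows to zero.
    while u > cumulative and p > 0.0:
        k += 1
        p *= lam / k    # P(X = k) from P(X = k - 1)
        cumulative += p
    return k


def generate_chunk(chunk_id, chunk_size, dist, base_seed=1234,
                   lam=4.0, cdf=None, low=0, high=100):
    """Generate one chunk with its own random stream (assumed seeding scheme)."""
    rng = random.Random(base_seed + chunk_id)
    out = []
    for _ in range(chunk_size):
        u = rng.random()
        if dist == "uniform":
            # Integer in [low, high); min() guards against rounding up to high.
            out.append(min(low + int(u * (high - low)), high - 1))
        elif dist == "poisson":
            out.append(poisson_from_uniform(u, lam))
        elif dist == "zipf":
            if cdf is None:
                raise ValueError("zipf requires a precomputed cdf")
            # First rank whose cumulative probability reaches u.
            out.append(bisect.bisect_left(cdf, u) + 1)
        else:
            raise ValueError("unknown distribution: " + dist)
    return out


def generate_dataset(total, n_chunks, dist, **params):
    """Split `total` samples over independent chunks and concatenate them."""
    base, extra = divmod(total, n_chunks)
    data = []
    for chunk_id in range(n_chunks):
        size = base + (1 if chunk_id < extra else 0)
        data.extend(generate_chunk(chunk_id, size, dist, **params))
    return data


if __name__ == "__main__":
    ranks = generate_dataset(1_000_000, 64, "zipf", cdf=zipf_cdf(1000, s=1.1))
    print(len(ranks), "Zipf-distributed ranks generated")
```

Because every chunk depends only on its own `chunk_id` and seed, the loop in `generate_dataset` carries no dependency between iterations. That independence is the property a GPU version would rely on. The sequential loop here only stands in for a parallel launch and says nothing about the speedups reported in the abstract.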

Index Terms

Computer Science
Information Sciences


Data Set Generation, Synthetic Dataset, Zipf, Poisson, Uniform Distribution, GPU, CUDA