Research Article

Design and Analysis of Large Data Processing Techniques

by Madhavi Vaidya, Shrinivas Deshpande, Vilas Thakare
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 100 - Number 8
Year of Publication: 2014
DOI: 10.5120/17546-8139

Madhavi Vaidya, Shrinivas Deshpande, Vilas Thakare. Design and Analysis of Large Data Processing Techniques. International Journal of Computer Applications. 100, 8 (August 2014), 24-28. DOI=10.5120/17546-8139

@article{ 10.5120/17546-8139,
author = { Madhavi Vaidya and Shrinivas Deshpande and Vilas Thakare },
title = { Design and Analysis of Large Data Processing Techniques },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 100 },
number = { 8 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 24-28 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume100/number8/17546-8139/ },
doi = { 10.5120/17546-8139 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:29:26.782371+05:30
%A Madhavi Vaidya
%A Shrinivas Deshpande
%A Vilas Thakare
%T Design and Analysis of Large Data Processing Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 100
%N 8
%P 24-28
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

As massive data acquisition and storage become increasingly affordable, a large number of enterprises are employing statisticians to perform sophisticated data analysis. Information extraction is needed particularly when the data is unstructured or semi-structured. Both academia and industry are making emerging efforts to push information extraction inside parallel DBMSs. This raises a significant question: which approach is the better choice for large-scale data processing and analytics? To address this question, this paper compares and analyzes three techniques: parallel DBMSs, MapReduce, and Bulk Synchronous Parallel (BSP) processing.
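The abstract names MapReduce and BSP without showing how their execution models differ. The following single-process Python sketch contrasts them under stated assumptions: it is not the authors' implementation, it simulates everything on one machine rather than a cluster, and every function name (`map_phase`, `shuffle`, `reduce_phase`, `word_count`, `bsp_max`) is hypothetical. MapReduce is illustrated with the classic word count (map, group by key, reduce); BSP is illustrated as a sequence of supersteps in which each simulated processor computes locally, sends messages to its neighbours, and waits at a barrier before the next superstep begins.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


# --- MapReduce sketch: map -> shuffle (group by key) -> reduce ---
def map_phase(doc: str) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, lower-cased.
    return [(w.lower(), 1) for w in doc.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as a MapReduce framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values that share a key into one result.
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    intermediate = [pair for doc in docs for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


# --- BSP sketch: supersteps of local compute + messaging + barrier ---
def bsp_max(values: List[int], neighbours: Dict[int, List[int]]) -> List[int]:
    # Hypothetical example: each simulated processor i holds one value and
    # adopts the largest value it receives; stop when a superstep changes nothing.
    state = list(values)
    while True:
        inbox: Dict[int, List[int]] = defaultdict(list)
        # Communication phase: every processor sends its value to its neighbours.
        for i, v in enumerate(state):
            for j in neighbours.get(i, []):
                inbox[j].append(v)
        # new_state is computed only after all messages of this superstep are
        # delivered, which stands in for the BSP barrier synchronisation.
        new_state = [max([state[i]] + inbox[i]) for i in range(len(state))]
        if new_state == state:
            return state
        state = new_state
```

For example, `word_count(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`, and on the chain graph `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}` the call `bsp_max([3, 1, 4, 2], ...)` converges to `[4, 4, 4, 4]`. The contrast the sketch is meant to show: a MapReduce job is a single map/shuffle/reduce pass, so iterative work needs repeated jobs, whereas BSP keeps state on each processor across supersteps.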

Index Terms

Computer Science
Information Sciences

Keywords

Parallel, MapReduce, Hadoop, BSP, Distributed