Research Article

Big Data Analysis with Dataset Scaling in Yet Another Resource Negotiator (YARN)

by Gurpreet Singh Bedi, Ashima Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 92 - Number 5
Year of Publication: 2014
DOI: 10.5120/16009-5051

Gurpreet Singh Bedi, Ashima Singh. Big Data Analysis with Dataset Scaling in Yet Another Resource Negotiator (YARN). International Journal of Computer Applications. 92, 5 (April 2014), 46-50. DOI=10.5120/16009-5051

@article{ 10.5120/16009-5051,
author = { Gurpreet Singh Bedi, Ashima Singh },
title = { Big Data Analysis with Dataset Scaling in Yet Another Resource Negotiator (YARN) },
journal = { International Journal of Computer Applications },
issue_date = { April 2014 },
volume = { 92 },
number = { 5 },
month = { April },
year = { 2014 },
issn = { 0975-8887 },
pages = { 46-50 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume92/number5/16009-5051/ },
doi = { 10.5120/16009-5051 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Data volumes are growing day by day, and many organizations need to analyze and process very large datasets. This is the big data problem these organizations face: a single machine cannot handle data at that scale. We therefore use the Apache Hadoop Distributed File System (HDFS) for storage and analysis. This paper presents experimental work with a MapReduce application on a health-sector dataset. The results show how the MapReduce framework behaves when mapping and reducing a large volume of data. The main goal is to examine the behavior of MapReduce applications as the dataset size increases, in order to understand Apache MapReduce application performance. We expected execution time to increase linearly with dataset size, but our analysis shows that execution time sometimes varies non-linearly as the dataset grows. The experimental results show that execution time differs as the datasets are scaled.
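The linearity question raised in the abstract can be illustrated with a small check. The following is only a sketch, not the authors' method: it fits a least-squares line to (dataset size, execution time) pairs and uses the coefficient of determination R^2 to flag runs that depart from linear scaling. The function names, the 0.99 threshold, and all measurement values are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def linear_fit(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float, float]:
    """Fit times ~ slope * sizes + intercept by least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(sizes)
    if n != len(times) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("dataset sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give R^2.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(sizes, times))
    ss_tot = sum((y - mean_y) ** 2 for y in times)
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def looks_linear(sizes: Sequence[float], times: Sequence[float], threshold: float = 0.99) -> bool:
    """Return True when a straight line explains the timings well.

    The threshold is an arbitrary illustrative choice, not a value from the paper.
    """
    _, _, r_squared = linear_fit(sizes, times)
    return r_squared >= threshold


# Hypothetical measurements (dataset size in GB, execution time in seconds).
sizes_gb = [1, 2, 4, 8]
linear_times = [10.0, 20.0, 40.0, 80.0]
nonlinear_times = [10.0, 15.0, 60.0, 70.0]

if __name__ == "__main__":
    print(looks_linear(sizes_gb, linear_times))
    print(looks_linear(sizes_gb, nonlinear_times))
```

In practice the timings would come from repeated MapReduce job runs at each dataset size; a low R^2 only signals departure from a straight line and does not by itself explain the cause.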

Index Terms

Computer Science
Information Sciences

Keywords

Big Data, Hadoop, MapReduce, YARN, Single Node, Multi Node, Dataset Scaling