Survey on Data Processing and Scheduling in Hadoop

Somya Singh; Neetu Narayan; Gaurav Raj

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 119 - Number 22

Year of Publication: 2015

Authors: Somya Singh, Neetu Narayan, Gaurav Raj

DOI: 10.5120/21370-4411

Somya Singh, Neetu Narayan, Gaurav Raj. Survey on Data Processing and Scheduling in Hadoop. International Journal of Computer Applications. 119, 22 (June 2015), 27-30. DOI=10.5120/21370-4411

@article{ 10.5120/21370-4411,

author = { Somya Singh and Neetu Narayan and Gaurav Raj },

title = { Survey on Data Processing and Scheduling in Hadoop },

journal = { International Journal of Computer Applications },

issue_date = { June 2015 },

volume = { 119 },

number = { 22 },

month = { June },

year = { 2015 },

issn = { 0975-8887 },

pages = { 27-30 },

numpages = {4},

url = { https://ijcaonline.org/archives/volume119/number22/21370-4411/ },

doi = { 10.5120/21370-4411 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T23:04:45.983124+05:30

%A Somya Singh

%A Neetu Narayan

%A Gaurav Raj

%T Survey on Data Processing and Scheduling in Hadoop

%J International Journal of Computer Applications

%@ 0975-8887

%V 119

%N 22

%P 27-30

%D 2015

%I Foundation of Computer Science (FCS), NY, USA

Abstract

The volume of data in the world is growing rapidly. It originates from individuals, social media, organizations, and other sources, and it may be structured, semi-structured, or unstructured. Extracting knowledge from this data and using it for competitive advantage has become a primary focus for organizations. In the last few years, Big Data has found its way into almost every field, from government to the private sector and from industry to academia. The major challenges associated with Big Data are data organization, modeling, analysis, and retrieval. Hadoop is a widely used software framework for the large-scale management and analysis of data. Its main components, HDFS and MapReduce, enable the distributed storage and processing of data across a large number of commodity servers. This paper provides an overview of MapReduce and its capabilities and discusses related issues.
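
To make the MapReduce model mentioned in the abstract concrete, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to word counting. It is an illustration only and does not use Hadoop's API: the function names map_phase, shuffle_phase, reduce_phase, and word_count, as well as the sample input, are assumptions introduced here. In a real cluster the input splits would live in HDFS and the stages would run in parallel on many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of input splits."""
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for one split of a larger file.
splits = ["big data big clusters", "data on commodity clusters"]
counts = word_count(splits)
print(counts)
```

Because each call to map_phase depends only on its own split, and each call to reduce_phase depends only on the values for its own key, both stages can in principle be distributed across machines; only the shuffle requires moving data between them.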

Index Terms

Computer Science

Information Sciences

Keywords

MapReduce Scheduling