An Improved Approach for Analysis of Hadoop Data for All Files

Heena Jain; Ajay Goyal

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 157 - Number 4

Year of Publication: 2017

Authors: Heena Jain, Ajay Goyal

10.5120/ijca2017912663

Heena Jain, Ajay Goyal. An Improved Approach for Analysis of Hadoop Data for All Files. International Journal of Computer Applications. 157, 4 (Jan 2017), 15-20. DOI=10.5120/ijca2017912663

@article{10.5120/ijca2017912663,
  author = {Heena Jain and Ajay Goyal},
  title = {An Improved Approach for Analysis of Hadoop Data for All Files},
  journal = {International Journal of Computer Applications},
  issue_date = {Jan 2017},
  volume = {157},
  number = {4},
  month = {Jan},
  year = {2017},
  issn = {0975-8887},
  pages = {15-20},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume157/number4/26818-2017912663/},
  doi = {10.5120/ijca2017912663},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:03:01.156340+05:30
%A Heena Jain
%A Ajay Goyal
%T An Improved Approach for Analysis of Hadoop Data for All Files
%J International Journal of Computer Applications
%@ 0975-8887
%V 157
%N 4
%P 15-20
%D 2017
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper implements an efficient framework on the Hadoop platform for almost all types of files. The proposed methodology is based on several algorithms implemented on Hadoop, such as Scan, Read, and Sort. Workloads of both small and large size, such as Facebook, Co-author, and Twitter, are used to analyze the algorithms. The experimental results show the performance of the proposed methodology, which achieves better running time, NameNode memory usage, and throughput than the existing methodology.
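The abstract does not say how its three metrics are measured. As a rough illustration only, the Python sketch below shows one common way to reason about two of them. It assumes the widely quoted rule of thumb of about 150 bytes of NameNode heap per file or block object, and a 128 MiB HDFS block size. Neither figure comes from the paper, and the function names are hypothetical.

```python
import math

# Assumptions, not taken from the paper:
# - each file object and each block object costs ~150 bytes of NameNode heap
# - HDFS block size is 128 MiB
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_memory_bytes(file_sizes, block_size=BLOCK_SIZE,
                          bytes_per_object=BYTES_PER_OBJECT):
    """Estimate NameNode heap: one object per file plus one per block."""
    objects = 0
    for size in file_sizes:
        blocks = math.ceil(size / block_size)  # an empty file has no blocks
        objects += 1 + blocks
    return objects * bytes_per_object


def throughput_mb_per_s(total_bytes, running_time_s):
    """Throughput as MiB processed per second of running time."""
    return total_bytes / (1024 * 1024) / running_time_s


# The same 1000 MiB of data stored as many small files or as one merged file.
small_files = [1024 * 1024] * 1000
merged_file = [sum(small_files)]

print(namenode_memory_bytes(small_files))   # estimate for 1000 small files
print(namenode_memory_bytes(merged_file))   # estimate for one merged file
print(throughput_mb_per_s(sum(small_files), 50))  # hypothetical 50 s job
```

Under these assumptions, the many-small-files layout needs far more NameNode memory than the merged layout for the same data. This is why small-file workloads are usually judged by NameNode memory alongside running time and throughput.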

Index Terms

Computer Science

Information Sciences

Keywords

Hadoop, HDFS, NameNode, SFReduce, MapReduce, Facebook, Twitter