Research Article

A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size

by Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 111 - Number 3
Year of Publication: 2015
10.5120/19515-1135

Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin. A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size. International Journal of Computer Applications. 111, 3 (February 2015), 1-4. DOI=10.5120/19515-1135

@article{ 10.5120/19515-1135,
author = { Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin },
title = { A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size },
journal = { International Journal of Computer Applications },
issue_date = { February 2015 },
volume = { 111 },
number = { 3 },
month = { February },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = { 4 },
url = { https://ijcaonline.org/archives/volume111/number3/19515-1135/ },
doi = { 10.5120/19515-1135 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Mohammad Badrul Alam Miah
%A Mehedi Hasan
%A Md. Kamal Uddin
%T A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size
%J International Journal of Computer Applications
%@ 0975-8887
%V 111
%N 3
%P 1-4
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

MapReduce is a powerful distributed processing model for large datasets. Hadoop is an open-source framework that implements MapReduce, and the Hadoop Distributed File System (HDFS) has become very popular for building large-scale, high-performance distributed data processing systems. HDFS is designed mainly to handle large files, so processing massive numbers of small files is a challenge in native HDFS. This paper introduces an approach to optimize the performance of processing massive numbers of small files on HDFS. We design a new HDFS structure model whose main idea is to merge small files at the source, writing them directly into a single merged file. Experimental results show that the proposed scheme effectively improves the storage and access efficiency of massive numbers of small files on HDFS.
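The merging idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain local files in place of HDFS, and the function names (`merge_small_files`, `read_small_file`) are hypothetical. The general technique is to append many small files into one large file while recording each file's byte offset and length in an index, so individual files can still be retrieved by random access:

```python
import os

def merge_small_files(file_paths, merged_path):
    """Append each small file into merged_path.

    Returns an index mapping file name -> (offset, length), which
    plays the role of the metadata a merged-file scheme keeps so
    that small files remain individually addressable.
    """
    index = {}
    with open(merged_path, "wb") as out:
        for path in file_paths:
            with open(path, "rb") as f:
                data = f.read()
            # Record where this file's bytes start inside the merged file.
            index[os.path.basename(path)] = (out.tell(), len(data))
            out.write(data)
    return index

def read_small_file(merged_path, index, name):
    """Retrieve one original small file via seek() into the merged file."""
    offset, length = index[name]
    with open(merged_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The benefit on HDFS comes from the fact that one merged file occupies far fewer blocks, and burdens the NameNode with far fewer metadata entries, than thousands of tiny files would; the index provides the lookup that the file system no longer performs per small file.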

References
  1. Apache Hadoop. http://hadoop.apache.org. [Last accessed: 20th December 2014]
  2. Hadoop Wiki. https://wiki.apache.org/hadoop/PoweredBy#M. [Last accessed: 20th December 2014]
  3. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008.
  4. Cloudera. http://blog.cloudera.com/blog/2009/02/the-small-files-problem. [Last accessed: 20th December 2014]
  5. Chuck Lam, "Hadoop in Action," Manning Publications, 2011.
  6. Apache Hadoop. http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/SingleNodeSetup.html. [Last accessed: 10th April 2014]
  7. "Running Hadoop On Ubuntu Linux (Single-Node Cluster)," Michael G. Noll. [Online]. Available: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/. [Last accessed: 20th May 2014]
  8. N. Mirajkar, S. Bhujbal, and A. Deshmukh, "Perform wordcount Map-Reduce Job in Single Node Apache Hadoop cluster and compress data using Lempel-Ziv-Oberhumer (LZO) algorithm," arXiv:1307.1517 [cs], July 2013.
  9. B. Dong, Q. Zheng, F. Tian, K.-M. Chao, R. Ma, and R. Anane, "An optimized approach for storing and accessing small files on cloud storage," Journal of Network and Computer Applications, vol. 35, no. 6, pp. 1847–1862, Nov. 2012.
  10. Y. Zhang and D. Liu, "Improving the Efficiency of Storing for Small Files in HDFS," in 2012 International Conference on Computer Science Service System (CSSS), 2012, pp. 2239–2242.
  11. "Welcome to Apache™ Hadoop®!" http://hadoop.apache.org/docs/r2.4.0/. [Last accessed: 5th July 2014]
Index Terms

Computer Science
Information Sciences

Keywords

Hadoop, MapReduce, HDFS, Big data, Cluster