Research Article

A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size

by Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 111 - Number 3
Year of Publication: 2015
10.5120/19515-1135

Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin. A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size. International Journal of Computer Applications. 111, 3 (February 2015), 1-4. DOI=10.5120/19515-1135

@article{ 10.5120/19515-1135,
author = { Mohammad Badrul Alam Miah, Mehedi Hasan, Md. Kamal Uddin },
title = { A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size },
journal = { International Journal of Computer Applications },
issue_date = { February 2015 },
volume = { 111 },
number = { 3 },
month = { February },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = { 4 },
url = { https://ijcaonline.org/archives/volume111/number3/19515-1135/ },
doi = { 10.5120/19515-1135 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Mohammad Badrul Alam Miah
%A Mehedi Hasan
%A Md. Kamal Uddin
%T A New HDFS Structure Model to Evaluate The Performance of Word Count Application on Different File Size
%J International Journal of Computer Applications
%@ 0975-8887
%V 111
%N 3
%P 1-4
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

MapReduce is a powerful distributed processing model for large datasets. Hadoop is an open-source framework that implements MapReduce, and the Hadoop Distributed File System (HDFS) has become very popular for building large-scale, high-performance distributed data processing systems. HDFS is designed mainly to handle large files, so processing massive numbers of small files is a challenge in native HDFS. This paper introduces an approach to optimize the performance of processing massive numbers of small files on HDFS. We design a new HDFS structure model whose main idea is to merge small files at the source, writing them directly into a single merged file. Experimental results show that the proposed scheme effectively improves the storage and access efficiency of massive numbers of small files on HDFS.
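The merging idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain local files in place of HDFS, and the function names (`merge_small_files`, `read_small_file`) are hypothetical. The general technique is to append many small files into one large file while recording each file's byte offset and length in an index, so individual files can still be retrieved by random access:

```python
import os

def merge_small_files(file_paths, merged_path):
    """Append each small file into merged_path.

    Returns an index mapping file name -> (offset, length), which
    plays the role of the metadata a merged-file scheme keeps so
    that small files remain individually addressable.
    """
    index = {}
    with open(merged_path, "wb") as out:
        for path in file_paths:
            with open(path, "rb") as f:
                data = f.read()
            # Record where this file's bytes start inside the merged file.
            index[os.path.basename(path)] = (out.tell(), len(data))
            out.write(data)
    return index

def read_small_file(merged_path, index, name):
    """Retrieve one original small file via seek() into the merged file."""
    offset, length = index[name]
    with open(merged_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

The benefit on HDFS comes from the fact that one merged file occupies far fewer blocks, and burdens the NameNode with far fewer metadata entries, than thousands of tiny files would; the index provides the lookup that the file system no longer performs per small file.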

References
  1. Apache Hadoop. http://hadoop.apache.org. [Last accessed: 20th December 2014]
  2. Hadoop Wiki. https://wiki.apache.org/hadoop/PoweredBy#M. [Last accessed: 20th December 2014]
  3. J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008.
  4. Cloudera. http://blog.cloudera.com/blog/2009/02/the-small-files-problem. [Last accessed: 20th December 2014]
  5. Chuck Lam, "Hadoop in Action," Manning Publications, 2011.
  6. Apache Hadoop. http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-common/SingleNodeSetup.html. [Last accessed: 10th April 2014]
  7. "Running Hadoop On Ubuntu Linux (Single-Node Cluster)," Michael G. Noll. [Online]. Available: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/. [Last accessed: 20th May 2014]
  8. N. Mirajkar, S. Bhujbal, and A. Deshmukh, "Perform wordcount Map-Reduce Job in Single Node Apache Hadoop cluster and compress data using Lempel-Ziv-Oberhumer (LZO) algorithm," arXiv:1307.1517 [cs], July 2013.
  9. B. Dong, Q. Zheng, F. Tian, K.-M. Chao, R. Ma, and R. Anane, "An optimized approach for storing and accessing small files on cloud storage," Journal of Network and Computer Applications, vol. 35, no. 6, pp. 1847–1862, Nov. 2012.
  10. Y. Zhang and D. Liu, "Improving the Efficiency of Storing for Small Files in HDFS," in 2012 International Conference on Computer Science Service System (CSSS), 2012, pp. 2239–2242.
  11. "Welcome to Apache™ Hadoop®!" http://hadoop.apache.org/docs/r2.4.0/. [Last accessed: 5th July 2014]
Index Terms

Computer Science
Information Sciences

Keywords

Hadoop, MapReduce, HDFS, Big data, Cluster