Research Article

An Improved Approach for Analysis of Hadoop Data for All Files

by Heena Jain, Ajay Goyal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 157 - Number 4
Year of Publication: 2017

Heena Jain, Ajay Goyal. An Improved Approach for Analysis of Hadoop Data for All Files. International Journal of Computer Applications. 157, 4 (Jan 2017), 15-20. DOI=10.5120/ijca2017912663

@article{ 10.5120/ijca2017912663,
author = { Heena Jain and Ajay Goyal },
title = { An Improved Approach for Analysis of Hadoop Data for All Files },
journal = { International Journal of Computer Applications },
issue_date = { Jan 2017 },
volume = { 157 },
number = { 4 },
month = { Jan },
year = { 2017 },
issn = { 0975-8887 },
pages = { 15-20 },
numpages = {9},
url = { },
doi = { 10.5120/ijca2017912663 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:03:01.156340+05:30
%A Heena Jain
%A Ajay Goyal
%T An Improved Approach for Analysis of Hadoop Data for All Files
%J International Journal of Computer Applications
%@ 0975-8887
%V 157
%N 4
%P 15-20
%D 2017
%I Foundation of Computer Science (FCS), NY, USA

This paper implements an efficient framework on the Hadoop platform for almost all types of files. The proposed methodology is based on several algorithms implemented on the Hadoop platform, such as Scan, Read, and Sort. The algorithms are analyzed using workloads of both small and large size, such as Facebook, Co-author, and Twitter. Experimental results show that the proposed methodology achieves better running time, NameNode memory usage, and throughput than the existing methodology.
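The abstract lists NameNode memory and throughput among the evaluation metrics. As a rough illustration of why the number of files, and not only the total data volume, drives NameNode memory, the Python sketch below estimates NameNode heap from a list of file sizes. It is not the authors' framework. The figure of about 150 bytes per file or block object is a widely quoted rule of thumb, not a measured value. The 128 MiB block size is an assumed default. `pack_sizes` is a hypothetical helper that models concatenating small files into larger container files (the idea behind formats such as SequenceFiles or Hadoop Archives) and ignores per-record overhead.

```python
# Assumption: a widely quoted rule of thumb puts each HDFS namespace object
# (one file entry or one block) at roughly 150 bytes of NameNode heap.
BYTES_PER_OBJECT = 150
# Assumption: 128 MiB block size (the default in Hadoop 2.x and later).
BLOCK_SIZE = 128 * 1024 * 1024


def namenode_bytes(file_sizes, block_size=BLOCK_SIZE):
    """Estimate NameNode heap (bytes) for files with the given sizes (bytes).

    Each file costs one file object plus one object per block it spans;
    an empty file counts as a file object with zero blocks.
    """
    objects = 0
    for size in file_sizes:
        # Integer ceiling division: number of blocks the file occupies.
        objects += 1 + (size + block_size - 1) // block_size
    return objects * BYTES_PER_OBJECT


def pack_sizes(file_sizes, target):
    """Hypothetical helper: greedily concatenate files into containers of at
    most `target` bytes (a file larger than `target` gets its own container).
    Returns the container sizes; per-record and index overhead is ignored.
    """
    containers, current = [], 0
    for size in file_sizes:
        if current and current + size > target:
            containers.append(current)
            current = 0
        current += size
    if current:
        containers.append(current)
    return containers


def throughput_mib_per_s(total_bytes, seconds):
    """Throughput as MiB of input processed per second of running time."""
    return total_bytes / (1024 * 1024) / seconds


if __name__ == "__main__":
    # Hypothetical workload: one million files of 100 KiB each.
    small = [100 * 1024] * 1_000_000
    packed = pack_sizes(small, 1024 ** 3)  # containers of at most 1 GiB
    print("small files:", len(small), "NameNode bytes:", namenode_bytes(small))
    print("containers:", len(packed), "NameNode bytes:", namenode_bytes(packed))
```

Under these assumptions, one million 100 KiB files need about 300 MB of NameNode heap, while the same data packed into 96 containers of roughly 1 GiB needs about 129 KB. The total data volume is the same in both cases; only the object count changes.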

Index Terms

Computer Science
Information Sciences


Keywords

Hadoop, HDFS, NameNode, SFReduce, MapReduce, Facebook, Twitter