Research Article

An overview on Big Data and Hadoop

by Shaikh Abdul Hannan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 154 - Number 10
Year of Publication: 2016
Authors: Shaikh Abdul Hannan
10.5120/ijca2016912241

Shaikh Abdul Hannan. An overview on Big Data and Hadoop. International Journal of Computer Applications. 154, 10 (Nov 2016), 29-35. DOI=10.5120/ijca2016912241

@article{ 10.5120/ijca2016912241,
author = { Shaikh Abdul Hannan },
title = { An overview on Big Data and Hadoop },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2016 },
volume = { 154 },
number = { 10 },
month = { Nov },
year = { 2016 },
issn = { 0975-8887 },
pages = { 29-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume154/number10/26529-2016912241/ },
doi = { 10.5120/ijca2016912241 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:59:55.296976+05:30
%A Shaikh Abdul Hannan
%T An overview on Big Data and Hadoop
%J International Journal of Computer Applications
%@ 0975-8887
%V 154
%N 10
%P 29-35
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Everyone is talking about big data, but what does the term actually mean? How is it changing the way researchers in the sciences, companies, non-profits, governments, institutions, and other organizations learn about the world around them? Where does this data come from, how is it processed, and how are the results stored and used in future work? And why is open source so important to answering these questions? This paper discusses these points to clarify what big data means and how it figures in day-to-day life. Today, social media is among the most important areas: it is used to search for and share information, and it generates a huge amount of data every day. Big data matters all the more because millions and billions of people use these media to share and store information. Many current projects involve social media, sensor data, stock exchange data, transport data, and scientific fields in which storing and retrieving data is the most important factor. Handling this volume of data is not feasible with an RDBMS, so new technology is needed, namely big data and Hadoop. Big data has four basic characteristics: volume, variety, veracity, and velocity. Big data technology manages, analyzes, stores, and processes large amounts of data quickly, within the required time span. This paper discusses these characteristics, the types of data used in big data, and the various sources of big data in day-to-day life; it introduces big data and Hadoop; and it explains in detail the structure of the Hadoop core components, the roles of the Namenode and Datanode, the functions of the job tracker and task tracker, and the Hadoop ecosystem.

Index Terms

Computer Science
Information Sciences

Keywords

Big Data, Hadoop, HDFS, MapReduce, Hadoop Ecosystem, Namenode, Datanode