Research Article

Heterogeneous Data Processing using Hadoop and Java Map/Reduce

by Jasmeet Singh Puaar, Ramanjeet Kaur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 146 - Number 9
Year of Publication: 2016
10.5120/ijca2016910846

Jasmeet Singh Puaar, Ramanjeet Kaur. Heterogeneous Data Processing using Hadoop and Java Map/Reduce. International Journal of Computer Applications 146, 9 (Jul 2016), 13-16. DOI=10.5120/ijca2016910846

@article{10.5120/ijca2016910846,
  author     = {Jasmeet Singh Puaar and Ramanjeet Kaur},
  title      = {Heterogeneous Data Processing using Hadoop and Java Map/Reduce},
  journal    = {International Journal of Computer Applications},
  issue_date = {Jul 2016},
  volume     = {146},
  number     = {9},
  month      = {Jul},
  year       = {2016},
  issn       = {0975-8887},
  pages      = {13-16},
  numpages   = {4},
  url        = {https://ijcaonline.org/archives/volume146/number9/25425-2016910846/},
  doi        = {10.5120/ijca2016910846},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A Jasmeet Singh Puaar
%A Ramanjeet Kaur
%T Heterogeneous Data Processing using Hadoop and Java Map/Reduce
%J International Journal of Computer Applications
%@ 0975-8887
%V 146
%N 9
%P 13-16
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The objective of this paper is to analyse heterogeneous sample data from the New York Stock Exchange (NYSE) using Java MapReduce on the Hadoop platform. The Java programming language and the Java MapReduce API are used to process a very large volume of data, i.e. big data. The source data is heterogeneous: the input files differ in format and structure, so handling the data and routing it to the mappers to produce a single reduced output file was a challenge. The NYSE data is analysed to find the maximum and minimum price of each stock for every year, and to calculate the average price of a stock for a particular year using the dividend records in the sample data. Two different input files are used: dividends.csv and sample_prices.csv. The output of the program is written to HDFS; it can then be exported with Sqoop or copied manually from HDFS to the local (NTFS) file system for further processing.
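The paper's source code is not reproduced on this page. The following is a minimal sketch of the approach the abstract describes: Hadoop's MultipleInputs attaches a separate mapper to each heterogeneous file, and a single reducer aggregates records per stock symbol and year. The CSV column positions (symbol, date, value), the "P:"/"D:" value-tagging scheme, and the class names are assumptions made for illustration, not the authors' actual implementation.

// Illustrative sketch only: the column layouts of sample_prices.csv and
// dividends.csv (symbol,date,value) are assumptions, not taken from the paper.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NyseStockAnalysis {

  // Mapper for sample_prices.csv; assumed layout: symbol,date,closing_price
  public static class PriceMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] f = value.toString().split(",");
      if (f.length < 3 || f[0].equalsIgnoreCase("symbol")) return;   // skip header/bad rows
      // Key: "SYMBOL-YEAR"; value tagged "P:" so the reducer knows it is a price.
      ctx.write(new Text(f[0] + "-" + f[1].substring(0, 4)), new Text("P:" + f[2]));
    }
  }

  // Mapper for dividends.csv; assumed layout: symbol,date,dividend
  public static class DividendMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] f = value.toString().split(",");
      if (f.length < 3 || f[0].equalsIgnoreCase("symbol")) return;
      ctx.write(new Text(f[0] + "-" + f[1].substring(0, 4)), new Text("D:" + f[2]));
    }
  }

  // Reducer: for each symbol-year, compute max/min/average price and total dividend.
  public static class StatsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      double max = Double.NEGATIVE_INFINITY, min = Double.POSITIVE_INFINITY;
      double priceSum = 0, dividendSum = 0;
      long priceCount = 0;
      for (Text v : values) {
        String s = v.toString();
        double d = Double.parseDouble(s.substring(2));
        if (s.charAt(0) == 'P') {
          max = Math.max(max, d);
          min = Math.min(min, d);
          priceSum += d;
          priceCount++;
        } else {
          dividendSum += d;
        }
      }
      if (priceCount == 0) return;                                   // no price records for this key
      ctx.write(key, new Text("max=" + max + " min=" + min
          + " avg=" + (priceSum / priceCount) + " dividends=" + dividendSum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "NYSE heterogeneous data analysis");
    job.setJarByClass(NyseStockAnalysis.class);
    // Two heterogeneous inputs, each with its own mapper, joined in one reducer.
    MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, PriceMapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, DividendMapper.class);
    job.setReducerClass(StatsReducer.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));          // results land in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Running the job with three arguments (the prices path, the dividends path, and an output directory) leaves the results in HDFS, from where they can be copied to the local file system with hdfs dfs -get or exported with Sqoop, as the abstract notes.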

Index Terms

Computer Science
Information Sciences

Keywords

Heterogeneous data processing, MapReduce, Big data, Data analysis, HDFS, Multiple inputs, NYSE data