Big Data Analytic of Nigeria Population Census Data using MapReduce and K-Means Algorithm

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Akande Oyebola, Osofisan Adenike
10.5120/ijca2018917999

Akande Oyebola and Osofisan Adenike. Big Data Analytic of Nigeria Population Census Data using MapReduce and K-Means Algorithm. International Journal of Computer Applications 181(23):14-21, October 2018.

@article{10.5120/ijca2018917999,
	author = {Akande Oyebola and Osofisan Adenike},
	title = {Big Data Analytic of Nigeria Population Census Data using MapReduce and K-Means Algorithm},
	journal = {International Journal of Computer Applications},
	issue_date = {October 2018},
	volume = {181},
	number = {23},
	month = {Oct},
	year = {2018},
	issn = {0975-8887},
	pages = {14-21},
	numpages = {8},
	url = {http://www.ijcaonline.org/archives/volume181/number23/30025-2018917999},
	doi = {10.5120/ijca2018917999},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Mining big data brings out hidden knowledge that medium-sized and sample data cannot reveal. This research analyzed Nigeria Population Census data in order to bring forth knowledge that can aid Government in socio-economic decision-making. To that end, the k-means algorithm, an unsupervised learning technique, was implemented on MapReduce with the aim of discovering knowledge from Priority Table IX of the Nigeria Census Data of 2005. MapReduce was used to handle the computational challenges of k-means effectively, such as Euclidean distance computation, minimum sum of squared error (MSSE) computation, and global objective computation. The big data analytics revealed local government areas that need Government intervention in the form of low-cost housing, and those that need urban restructuring for a better distribution of population. Further work can apply the approach to other data, such as malaria data of children, to reveal hidden patterns and knowledge.
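
As a rough illustration of how k-means can be split into map and reduce steps, the sketch below runs the whole cycle in a single Python process. It is not the authors' implementation and does not use a MapReduce framework: the function names (map_assign, reduce_mean, kmeans_iteration, kmeans), the tolerance, and the toy points are all assumptions made for this example, and the points are not census data. The map step assigns each point to its nearest centroid by Euclidean distance, the reduce step averages the points in each group, and the sum of squared errors is accumulated as the objective.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available in Python 3.8+


def map_assign(point, centroids):
    """Map step (hypothetical helper): emit (nearest centroid index, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step (hypothetical helper): mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce round.

    Returns the updated centroids and the sum of squared errors (SSE)
    measured against the centroids used for the assignment.
    """
    groups = defaultdict(list)  # plays the role of the shuffle phase
    sse = 0.0
    for p in points:
        k, _ = map_assign(p, centroids)
        groups[k].append(p)
        sse += dist(p, centroids[k]) ** 2
    new_centroids = [
        reduce_mean(groups[i]) if i in groups else centroids[i]  # keep empty clusters in place
        for i in range(len(centroids))
    ]
    return new_centroids, sse


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat rounds until no centroid moves by more than tol (assumed stopping rule)."""
    sse = None
    for _ in range(max_iter):
        new_centroids, sse = kmeans_iteration(points, centroids)
        moved = max(dist(a, b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if moved <= tol:
            break
    return centroids, sse


# Toy two-dimensional points, invented for illustration only.
toy_points = [(1.0, 1.0), (1.0, 2.0), (10.0, 10.0), (10.0, 11.0)]
final_centroids, final_sse = kmeans(toy_points, [(0.0, 0.0), (12.0, 12.0)])
```

In a real MapReduce deployment the map and reduce functions would run on separate workers over partitions of the data, with the framework handling the grouping by cluster index; only the per-round logic is shown here.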

Keywords

k-means, MapReduce, Euclidean distance, MSSE, Global Objective Function.