Research Article

Exploratory Implementation of Stream Clustering Algorithm using MongoDB

by Jyotsna Talreja Wassan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 121 - Number 10
Year of Publication: 2015
Authors: Jyotsna Talreja Wassan
DOI: 10.5120/21577-4636

Jyotsna Talreja Wassan. Exploratory Implementation of Stream Clustering Algorithm using MongoDB. International Journal of Computer Applications. 121, 10 (July 2015), 21-29. DOI=10.5120/21577-4636

@article{ 10.5120/21577-4636,
author = { Jyotsna Talreja Wassan },
title = { Exploratory Implementation of Stream Clustering Algorithm using MongoDB },
journal = { International Journal of Computer Applications },
issue_date = { July 2015 },
volume = { 121 },
number = { 10 },
month = { July },
year = { 2015 },
issn = { 0975-8887 },
pages = { 21-29 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume121/number10/21577-4636/ },
doi = { 10.5120/21577-4636 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Jyotsna Talreja Wassan
%T Exploratory Implementation of Stream Clustering Algorithm using MongoDB
%J International Journal of Computer Applications
%@ 0975-8887
%V 121
%N 10
%P 21-29
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In recent years, Big Data has become ubiquitous, and Big Data tools are widely used to accelerate computing and analytics in many fields. Many algorithms in computer science operate on large, heterogeneous data sets and could therefore be explored on Big Data platforms. One such class is stream clustering algorithms, which process incrementally arriving data at large scale. Running these algorithms with Big Data tools may make them more effective. Hadoop, the most popular open-source implementation of MapReduce, has been used and adapted for numerous clustering problems. However, various scientific and computing fields also use MongoDB, a document-oriented NoSQL store that supports MapReduce. The main purpose of this paper is to assess MongoDB as a Big Data platform for implementing a stream clustering algorithm with the MapReduce programming model, and to study the factors that relate MapReduce and MongoDB.
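
As a rough illustration of how a stream clustering step can be phrased in MapReduce terms, the sketch below counts incoming points per grid cell (map), sums the counts (reduce), and merges each batch into a running summary. It is not the algorithm implemented in the paper: the grid-based approach, the function names, and the parameters `cell_width` and `min_points` are all assumptions chosen for illustration, and the code is plain Python rather than MongoDB's own MapReduce facility.

```python
from collections import defaultdict


def map_point(point, cell_width):
    """Map step: emit (grid cell, 1) for one incoming point."""
    # Floor division assigns each coordinate to a cell index (hypothetical grid).
    cell = tuple(int(x // cell_width) for x in point)
    return cell, 1


def reduce_counts(values):
    """Reduce step: sum the counts emitted for one grid cell."""
    return sum(values)


def grid_density(points, cell_width):
    """Run map, group by key, and reduce over a single batch of points."""
    grouped = defaultdict(list)
    for point in points:
        cell, count = map_point(point, cell_width)
        grouped[cell].append(count)
    return {cell: reduce_counts(values) for cell, values in grouped.items()}


def merge_batch(summary, batch_counts):
    """Fold one batch's cell counts into the running stream summary."""
    merged = dict(summary)
    for cell, count in batch_counts.items():
        merged[cell] = merged.get(cell, 0) + count
    return merged


def dense_cells(summary, min_points):
    """Cells holding at least min_points points (assumed density threshold)."""
    return {cell for cell, count in summary.items() if count >= min_points}


# Two small batches standing in for an incremental stream (made-up data).
batch1 = [(0.1, 0.2), (0.3, 0.4), (1.5, 1.5)]
batch2 = [(0.2, 0.1), (1.6, 1.7), (2.5, 0.5)]

summary = {}
for batch in (batch1, batch2):
    summary = merge_batch(summary, grid_density(batch, cell_width=1.0))

print(dense_cells(summary, min_points=2))
```

Keeping only per-cell counts means the summary can be updated batch by batch without revisiting earlier points, which is the property that makes the incremental setting a natural fit for a map and reduce formulation.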

Index Terms

Computer Science
Information Sciences

Keywords

Big Data, MapReduce, Sharding, Clustering, Grid