Exploratory Implementation of Stream Clustering Algorithm using MongoDB

Jyotsna Talreja Wassan

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 121 - Number 10

Year of Publication: 2015

Authors: Jyotsna Talreja Wassan

10.5120/21577-4636

Jyotsna Talreja Wassan. Exploratory Implementation of Stream Clustering Algorithm using MongoDB. International Journal of Computer Applications. 121, 10 (July 2015), 21-29. DOI=10.5120/21577-4636

@article{10.5120/21577-4636,
  author = {Jyotsna Talreja Wassan},
  title = {Exploratory Implementation of Stream Clustering Algorithm using MongoDB},
  journal = {International Journal of Computer Applications},
  issue_date = {July 2015},
  volume = {121},
  number = {10},
  month = {July},
  year = {2015},
  issn = {0975-8887},
  pages = {21-29},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume121/number10/21577-4636/},
  doi = {10.5120/21577-4636},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:08:05.793790+05:30
%A Jyotsna Talreja Wassan
%T Exploratory Implementation of Stream Clustering Algorithm using MongoDB
%J International Journal of Computer Applications
%@ 0975-8887
%V 121
%N 10
%P 21-29
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In recent years, Big Data has become ubiquitous, and Big Data tools are widely used to accelerate computing and analytics in many fields. Many algorithms in computer science operate on large, heterogeneous data sets and can therefore be explored on Big Data platforms. One such class is stream clustering algorithms, which process incrementally arriving data at large scale. Running these algorithms on Big Data tools may improve their efficacy. Hadoop, the most popular open source implementation of MapReduce, has been used and adapted for numerous clustering problems. Various scientific and computing fields, however, also use MongoDB, a document-oriented NoSQL store that supports MapReduce. This paper explores and evaluates MongoDB as a Big Data platform for implementing a stream clustering algorithm with the MapReduce programming model, in order to study the factors that relate MapReduce and MongoDB.
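The abstract does not spell out the algorithm itself, so the following is only a rough illustration of how a grid-based stream clustering step might be phrased in MapReduce terms. It is a plain-Python sketch, not the paper's implementation and not MongoDB's MapReduce API. The names `CELL_WIDTH`, `map_point`, `reduce_counts`, `process_batch`, and `dense_cells`, as well as the choice of per-cell point counts as the synopsis, are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]

CELL_WIDTH = 1.0  # hypothetical grid resolution, chosen only for illustration


def map_point(point: Point) -> Tuple[Cell, int]:
    """Map step: emit (grid cell, 1) for one incoming point."""
    cell = tuple(int(x // CELL_WIDTH) for x in point)
    return cell, 1


def reduce_counts(values: Iterable[int]) -> int:
    """Reduce step: combine all values emitted for the same cell."""
    return sum(values)


def process_batch(synopsis: Dict[Cell, int], batch: List[Point]) -> None:
    """Fold one batch of stream points into the running per-cell counts."""
    grouped: Dict[Cell, List[int]] = defaultdict(list)
    for point in batch:
        cell, one = map_point(point)
        grouped[cell].append(one)
    # Merge the reduced batch counts into the existing synopsis, so earlier
    # batches never need to be reprocessed (the incremental part).
    for cell, values in grouped.items():
        synopsis[cell] = synopsis.get(cell, 0) + reduce_counts(values)


def dense_cells(synopsis: Dict[Cell, int], min_points: int) -> List[Cell]:
    """Return the cells holding at least min_points points, in sorted order."""
    return sorted(cell for cell, n in synopsis.items() if n >= min_points)


if __name__ == "__main__":
    synopsis: Dict[Cell, int] = {}
    process_batch(synopsis, [(0.2, 0.3), (0.7, 0.9), (5.1, 5.5)])
    process_batch(synopsis, [(0.5, 0.5)])
    print(dense_cells(synopsis, min_points=2))
```

In a document store, the synopsis dictionary would presumably live in a collection keyed by cell, with each new batch merged into it. That mapping is likewise an assumption here rather than something the abstract states.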

Index Terms

Computer Science

Information Sciences

Keywords

Big Data MapReduce Sharding Clustering Grid