
SSM-DENCLUE : Enhanced Approach for Clustering of Sequential Data: Experiments and Test Cases

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 96 - Number 6
Year of Publication: 2014
Authors:
K. Santhi Sree
10.5120/16796-6506

Santhi K Sree. Article: SSM-DENCLUE : Enhanced Approach for Clustering of Sequential Data: Experiments and Test Cases. International Journal of Computer Applications 96(6):7-13, June 2014. Full text available.

BibTeX

@article{key:article,
	author = {K. Santhi Sree},
	title = {Article: SSM-DENCLUE : Enhanced Approach for Clustering of Sequential Data: Experiments and Test Cases},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {96},
	number = {6},
	pages = {7-13},
	month = {June},
	note = {Full text available}
}

Abstract

Clustering web usage data is useful for discovering interesting patterns in user traversals, behavior, and characteristics, which helps improve search engines and Web personalization. Clustering web sessions means grouping them by similarity, which consists of maximizing intra-cluster similarity and minimizing inter-cluster similarity. A further issue is how to measure similarity between web sessions. Many similarity measures have been proposed in the past, such as Euclidean, Jaccard, and Cosine. Most of these measures deal only with the content of the sequence data, not with the order in which the data occur. A novel similarity measure named SSM (Sequence Similarity Measure) is developed to show the impact on the clustering process when both sequence and content information are incorporated while computing similarity between sequences. SSM captures both the order of occurrence of page visits and the page information, and its results are compared with those of the Euclidean, Jaccard, and Cosine similarity measures. By incorporating the new similarity measure, the existing density-based clustering technique DENCLUE is enhanced, and the resulting algorithm is named SSM-DENCLUE for Web personalization. The inter-cluster and intra-cluster distances are computed using Average Levenshtein Distance (ALD) to demonstrate the usefulness of the proposed approach in the context of web usage mining. The new similarity measure gives significant results when comparing similarities between web sessions against the previous measures, and the newly developed SSM-DENCLUE algorithm has good time requirements. Experiments are performed on data from the MSNBC.COM website (a free online news channel), in the context of density-based clustering in the domain of web usage mining.
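The abstract does not give the SSM formula, so the following is only a minimal Python sketch of two ideas it mentions: a set-based measure such as Jaccard ignores the order of page visits, while an edit distance does not, and intra-cluster and inter-cluster quality can be summarized by averaging Levenshtein distances over session pairs. The function names, the page labels, and the pairwise-averaging scheme are assumptions made for illustration, not the paper's definitions.

```python
from itertools import combinations, product


def levenshtein(a, b):
    """Edit distance between two sequences of page identifiers."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Set-based similarity: ignores visit order and repeats."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def avg_intra_distance(cluster):
    """Mean Levenshtein distance over all session pairs in one cluster
    (assumed averaging scheme)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


def avg_inter_distance(cluster_1, cluster_2):
    """Mean Levenshtein distance over all cross-cluster session pairs
    (assumed averaging scheme)."""
    pairs = list(product(cluster_1, cluster_2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sessions: same pages, opposite visit order.
s1 = ["news", "sports", "weather"]
s2 = ["weather", "sports", "news"]
print(jaccard(s1, s2))      # order-blind: the sessions look identical
print(levenshtein(s1, s2))  # order-aware: the sessions differ

# Hypothetical clusters of sessions.
cluster_a = [["news", "sports", "weather"], ["news", "sports", "tech"]]
cluster_b = [["travel", "health", "opinion"], ["travel", "health", "local"]]
print(avg_intra_distance(cluster_a))
print(avg_inter_distance(cluster_a, cluster_b))
```

Under these assumptions, a good clustering shows a small average intra-cluster distance and a large average inter-cluster distance.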
