Research Article

An Efficient Technique to Improve Snippet Clustering and Labeling using Modified FPF Algorithm

Published in August 2012 by M. Hanumanthappa, B R Prakash
National Conference on Advanced Computing and Communications 2012
Foundation of Computer Science USA
NCACC - Number 1

M. Hanumanthappa, B R Prakash. An Efficient Technique to Improve Snippet Clustering and Labeling using Modified FPF Algorithm. National Conference on Advanced Computing and Communications 2012. NCACC, 1 (August 2012), 38-42.

@article{
author = { M. Hanumanthappa, B R Prakash },
title = { An Efficient Technique to Improve Snippet Clustering and Labeling using Modified FPF Algorithm },
journal = { National Conference on Advanced Computing and Communications 2012 },
issue_date = { August 2012 },
volume = { NCACC },
number = { 1 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 38-42 },
numpages = 5,
url = { /proceedings/ncacc/number1/7995-1011/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Advanced Computing and Communications 2012
%A M. Hanumanthappa
%A B R Prakash
%T An Efficient Technique to Improve Snippet Clustering and Labeling using Modified FPF Algorithm
%J National Conference on Advanced Computing and Communications 2012
%@ 0975-8887
%V NCACC
%N 1
%P 38-42
%D 2012
%I International Journal of Computer Applications
Abstract

Document clustering is an effective tool for managing information overload. Grouping similar documents together enables a human observer to browse large document collections quickly and to grasp their distinct topics and subtopics easily. In this paper we survey the most important problems and techniques related to text information retrieval: document pre-processing and filtering, and word sense disambiguation. We then present text clustering using a Modified FPF algorithm and compare our clustering algorithm against FPF, which is the most widely used algorithm in the text clustering context. Finally, we introduce the problem of cluster labeling: labeling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure.
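
For orientation, the baseline FPF (furthest-point-first) heuristic that the abstract compares against can be sketched as follows. This is a minimal illustration of the standard heuristic only; the abstract does not describe the authors' modification, so none is shown. The function name `fpf_centers`, the choice of the first point as the initial center, and the Euclidean default distance are assumptions made for the example.

```python
import math


def fpf_centers(points, k, dist=math.dist):
    """Pick k centers with the furthest-point-first heuristic.

    Returns (centers, assign): the indices of the chosen centers and,
    for each point, the position in `centers` of its closest center.
    Assumes 1 <= k <= len(points).
    """
    # Assumption: start from the first point; any starting point works.
    centers = [0]
    # Distance from every point to its closest center chosen so far.
    closest = [dist(points[0], p) for p in points]
    assign = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=lambda i: closest[i])
        centers.append(nxt)
        # Reassign points that are strictly closer to the new center.
        for i, p in enumerate(points):
            d = dist(points[nxt], p)
            if d < closest[i]:
                closest[i] = d
                assign[i] = len(centers) - 1
    return centers, assign
```

Each round costs one pass over the points, so selecting k centers takes O(nk) distance computations. For snippet clustering, `points` would be term vectors and `dist` a text-oriented distance such as one minus cosine similarity, rather than the Euclidean default used here.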

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Document Clustering, Cluster Labeling, Information Retrieval