Research Article

A Topic-driven Summarization using K-mean Clustering and Tf-Isf Sentence Ranking

by Rajesh Wadhvani, R. K. Pateriya, Devshri Roy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 79 - Number 8
Year of Publication: 2013
Authors: Rajesh Wadhvani, R. K. Pateriya, Devshri Roy
10.5120/13764-1608

Rajesh Wadhvani, R. K. Pateriya, Devshri Roy. A Topic-driven Summarization using K-mean Clustering and Tf-Isf Sentence Ranking. International Journal of Computer Applications. 79, 8 (October 2013), 39-45. DOI=10.5120/13764-1608

@article{ 10.5120/13764-1608,
author = { Rajesh Wadhvani and R. K. Pateriya and Devshri Roy },
title = { A Topic-driven Summarization using K-mean Clustering and Tf-Isf Sentence Ranking },
journal = { International Journal of Computer Applications },
issue_date = { October 2013 },
volume = { 79 },
number = { 8 },
month = { October },
year = { 2013 },
issn = { 0975-8887 },
pages = { 39-45 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume79/number8/13764-1608/ },
doi = { 10.5120/13764-1608 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:52:30.981261+05:30
%A Rajesh Wadhvani
%A R. K. Pateriya
%A Devshri Roy
%T A Topic-driven Summarization using K-mean Clustering and Tf-Isf Sentence Ranking
%J International Journal of Computer Applications
%@ 0975-8887
%V 79
%N 8
%P 39-45
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The World Wide Web has made an enormous amount of information available online, creating a need for efficient and accurate summarization systems that extract the significant information. A text summarization system automatically generates a summary of a given document and helps people make effective decisions in less time. This paper proposes two methods for query-focused multi-document summarization that use k-means clustering and a term-frequency and inverse-sentence-frequency (tf-isf) method for sentence weighting to rank the sentences of the documents with respect to a given query. The proposed methods find the proximity between each document and the query, and later use this proximity to rank the sentences of each document. It is assumed that a document nearer to the query is likely to contain more meaningful sentences with respect to the information need expressed by the user's query. Further, a sentence that contains a rare query term is assumed to be more informative than sentences that contain frequent query terms. Both methods first assign weights to documents according to their proximity and then use these document weights to rank each of their sentences with the tf-idf ranking function. A comparative study of the proposed methods has been carried out, and the experimental results show that the two methods are comparable, with only a slight difference in performance. The DUC 2007 test dataset and the ROUGE-1.5.5 summarization evaluation package are used for evaluation.
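
The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes cosine similarity over raw term counts as the document-query proximity, assumes log(N / sf) as the inverse sentence frequency, and omits the k-means clustering step entirely. The function names tokenize, cosine and rank_sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(documents, query):
    """Rank sentences by document-query proximity times a tf-isf query score.

    documents: list of documents, each a list of sentence strings.
    Returns (score, doc_index, sentence) tuples, best first.
    """
    query_vec = Counter(tokenize(query))
    query_terms = set(query_vec)

    # Tokenize every sentence once, remembering which document it came from.
    sentences = [(d, s, Counter(tokenize(s)))
                 for d, doc in enumerate(documents) for s in doc]
    n = len(sentences)

    # Sentence frequency: number of sentences that contain each term.
    sf = Counter()
    for _, _, counts in sentences:
        sf.update(counts.keys())

    # Assumed document weight: cosine proximity of the document to the query.
    doc_weight = [cosine(Counter(tokenize(" ".join(doc))), query_vec)
                  for doc in documents]

    ranked = []
    for d, s, counts in sentences:
        # tf-isf summed over query terms; rarer terms get a larger isf.
        tf_isf = sum(counts[t] * math.log(n / sf[t])
                     for t in query_terms if t in counts)
        ranked.append((doc_weight[d] * tf_isf, d, s))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

Calling rank_sentences with a list of documents (each a list of sentence strings) and a query string returns the sentences ordered by score. With this choice of isf, a query term that occurs in every sentence contributes nothing, which matches the stated assumption that rare query terms are the more informative ones.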

Index Terms

Computer Science
Information Sciences

Keywords

Sentence Extraction, Document Clustering, F-score