Research Article

Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization

Published on November 2011 by Ansamma John, Dr M Wilscy
International Conference on Web Services Computing
Foundation of Computer Science USA
ICWSC - Number 1
November 2011
Authors: Ansamma John, Dr M Wilscy

Ansamma John, Dr M Wilscy. Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization. International Conference on Web Services Computing. ICWSC, 1 (November 2011), 30-38.

@article{
author = { Ansamma John, Dr M Wilscy },
title = { Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization },
journal = { International Conference on Web Services Computing },
issue_date = { November 2011 },
volume = { ICWSC },
number = { 1 },
month = { November },
year = { 2011 },
issn = { 0975-8887 },
pages = { 30-38 },
numpages = 9,
url = { /proceedings/icwsc/number1/3974-wsc007/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Web Services Computing
%A Ansamma John
%A Dr M Wilscy
%T Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization
%J International Conference on Web Services Computing
%@ 0975-8887
%V ICWSC
%N 1
%P 30-38
%D 2011
%I International Journal of Computer Applications
Abstract

Because the information a user needs is often spread across multiple documents on the web, summarizing those documents and ordering the sentences of the summary efficiently has become a relevant task in data mining. We present a novel sentence ordering method based on the maximum cost spanning tree algorithm to improve the readability and cohesion of a summary obtained by extraction from multiple related documents. Candidate sentences for the summary are extracted from the documents by ranking sentences with the cosine similarity measure, and redundancy in the summary is reduced with the Maximal Marginal Relevance (MMR) technique. The sentences in the summary are then organized by constructing a graph in which each sentence is a node and the edge between every pair of nodes is weighted by the similarity between the two sentences. The most important task in our work is to find the first sentence of the ordered summary, which is identified as the sentence that has minimum similarity with the other sentences in the extracted summary. The remaining sentences are then placed one by one using Prim’s maximum cost spanning tree algorithm. The proposed algorithm was tested on the DUC 2002 data set, and the summary generated after ordering was found to have better readability and cohesion than the summary generated without ordering. The improvement becomes more pronounced as the summary size increases.
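The ordering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: sentences are compared as plain bag-of-words term-frequency vectors, "minimum similarity" is read as the smallest summed similarity to all other sentences, and the final order is the order in which Prim's algorithm adds nodes to the maximum cost spanning tree. The function names `cosine_similarity`, `order_from_similarity`, and `order_sentences` are hypothetical, and the extraction and MMR steps are assumed to have already produced the input sentences.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two sentences as bag-of-words vectors (assumed representation)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def order_from_similarity(sim):
    """Return node indices in the order Prim's maximum cost spanning tree visits them.

    sim is a symmetric matrix of pairwise similarities with a zero diagonal.
    """
    n = len(sim)
    if n == 0:
        return []
    # Assumption: the first sentence is the one with the smallest total
    # similarity to all other sentences in the summary.
    first = min(range(n), key=lambda i: sum(sim[i]))
    order = [first]
    in_tree = {first}
    # best[j] holds the heaviest edge from node j to any node already in the tree.
    best = list(sim[first])
    while len(order) < n:
        # Prim's step with max instead of min: attach the outside node
        # connected to the tree by the heaviest edge.
        nxt = max((j for j in range(n) if j not in in_tree), key=lambda j: best[j])
        order.append(nxt)
        in_tree.add(nxt)
        for j in range(n):
            if j not in in_tree and sim[nxt][j] > best[j]:
                best[j] = sim[nxt][j]
    return order


def order_sentences(sentences):
    """Order already-extracted summary sentences using the sketch above."""
    n = len(sentences)
    sim = [
        [cosine_similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    return [sentences[i] for i in order_from_similarity(sim)]
```

Calling `order_sentences` on a list of extracted sentences returns the same sentences in the proposed order. Separating `order_from_similarity` from the similarity computation makes it possible to swap in a different sentence representation (for example, stemmed or TF-IDF weighted terms) without touching the spanning tree logic.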

Index Terms

Computer Science
Information Sciences

Keywords

Cosine similarity, Maximal Marginal Relevance, Spanning Tree, Prim’s algorithm