Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization

Research Article

Published on November 2011 by Ansamma John, Dr M Wilscy

International Conference on Web Services Computing

Foundation of Computer Science USA

ICWSC - Number 1

November 2011

Authors: Ansamma John, Dr M Wilscy

Ansamma John, Dr M Wilscy. Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization. International Conference on Web Services Computing. ICWSC, 1 (November 2011), 30-38.

@article{
author = { Ansamma John and Dr M Wilscy },
title = { Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization },
journal = { International Conference on Web Services Computing },
issue_date = { November 2011 },
volume = { ICWSC },
number = { 1 },
month = { November },
year = { 2011 },
issn = { 0975-8887 },
pages = { 30-38 },
numpages = { 9 },
url = { /proceedings/icwsc/number1/3974-wsc007/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article
%1 International Conference on Web Services Computing
%A Ansamma John
%A Dr M Wilscy
%T Content based Sentence Ordering using Spanning Tree Algorithm for Improved Multi Document Summarization
%J International Conference on Web Services Computing
%@ 0975-8887
%V ICWSC
%N 1
%P 30-38
%D 2011
%I International Journal of Computer Applications

Abstract

Because information on the web is often spread across multiple documents, summarizing those documents and ordering the sentences of the summary efficiently has become a relevant task in data mining. We present a novel sentence ordering method, based on a maximum cost spanning tree algorithm, that improves the readability and cohesion of extractive summaries built from multiple related documents. Candidate sentences are extracted from the documents by ranking them with the cosine similarity measure, and redundancy in the summary is reduced with the Maximal Marginal Relevance (MMR) technique. The summary sentences are then organized as a graph in which each sentence is a node and every pair of nodes is joined by an edge weighted by the similarity between the two sentences. The most important step is finding the first sentence of the ordered summary, which is the sentence with minimum similarity to the other sentences in the extracted summary. The remaining sentences are placed one by one using Prim's maximum cost spanning tree algorithm. The proposed algorithm was tested on the DUC 2002 data set; the ordered summaries showed better readability and cohesion than the unordered ones, and the improvement was more pronounced as summary size increased.
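
The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the whitespace tokenizer, the raw term-frequency vectors, the precomputed `relevance` scores, the `lam` trade-off weight, and the function names `vectorize`, `mmr_select`, and `order_sentences` are all assumptions made for the example. The first sentence is taken to be the one with the lowest total similarity to the others, which is one plausible reading of "minimum similarity", and the output order is taken to be the order in which Prim's algorithm adds nodes to the maximum cost spanning tree.

```python
import math
from collections import Counter


def vectorize(sentence):
    """Bag-of-words term-frequency vector (assumed: lowercase, whitespace split)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(vecs, relevance, k, lam=0.7):
    """Greedy MMR: pick k indices, trading relevance against redundancy.

    relevance[i] is a hypothetical precomputed ranking score for sentence i.
    """
    selected = []
    candidates = list(range(len(vecs)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


def order_sentences(sentences):
    """Return sentence indices in the order Prim's algorithm visits them
    while growing a maximum cost spanning tree over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = [vectorize(s) for s in sentences]
    # Complete graph: sim[i][j] is the edge weight between sentences i and j.
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Assumed rule: start from the sentence least similar to the rest.
    first = min(range(n), key=lambda i: sum(sim[i]))
    order = [first]
    in_tree = [False] * n
    in_tree[first] = True
    best = sim[first][:]  # best[j]: heaviest edge from the tree to node j
    while len(order) < n:
        nxt = max((j for j in range(n) if not in_tree[j]),
                  key=lambda j: best[j])
        in_tree[nxt] = True
        order.append(nxt)
        for j in range(n):
            if not in_tree[j] and sim[nxt][j] > best[j]:
                best[j] = sim[nxt][j]
    return order


summary = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock markets fell sharply today",
    "markets fell as investors sold stock",
]
ordered = [summary[i] for i in order_sentences(summary)]
```

In this toy input the two sentences about markets end up adjacent, as do the two about the cat, because each newly placed sentence is the one most strongly connected to the sentences already placed.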

Index Terms

Computer Science

Information Sciences

Keywords

Cosine similarity, Maximal Marginal Relevance, Spanning Tree, Prim's algorithm