A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm

Naresh Kumar Nagwani; Shrish Verma

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 17 - Number 2

Year of Publication: 2011

Authors: Naresh Kumar Nagwani, Shrish Verma

DOI: 10.5120/2190-2778

Naresh Kumar Nagwani, Shrish Verma. A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm. International Journal of Computer Applications. 17, 2 (March 2011), 36-40. DOI=10.5120/2190-2778

@article{10.5120/2190-2778,
  author = {Naresh Kumar Nagwani and Shrish Verma},
  title = {A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm},
  journal = {International Journal of Computer Applications},
  issue_date = {March 2011},
  volume = {17},
  number = {2},
  month = {March},
  year = {2011},
  issn = {0975-8887},
  pages = {36-40},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume17/number2/2190-2778/},
  doi = {10.5120/2190-2778},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:04:36.827537+05:30
%A Naresh Kumar Nagwani
%A Shrish Verma
%T A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 17
%N 2
%P 36-40
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text summarization is an important activity in the analysis of high-volume text documents. It has a number of applications, and many recent applications use it to improve text analysis and knowledge representation. In this paper, a frequent-term-based text summarization algorithm is designed and implemented in Java. The algorithm works in three steps. In the first step, the document to be summarized is preprocessed by eliminating stop words and applying a stemmer. In the second step, term-frequency data is calculated from the document and the frequent terms are selected; for these selected terms, semantically equivalent terms are also generated. In the third step, all sentences in the document that contain the frequent terms or their semantic equivalents are selected for the summary. The algorithm is implemented using open-source technologies such as Java, DISCO, and Porter's stemmer, and is verified on a standard text mining corpus.
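The three steps described in the abstract can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name FrequentTermSummarizer, the tiny stop-word list, the crude suffix-stripping stem method (a stand-in for Porter's stemmer), the minFreq threshold, and the related lookup map (a stand-in for the semantically similar terms that DISCO would supply) are all hypothetical.

```java
import java.util.*;

/** Illustrative sketch of the three-step pipeline; all names are assumptions. */
class FrequentTermSummarizer {
    // Assumption: a tiny stop-word list for illustration only.
    static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "the", "is", "are", "was", "of", "in", "on", "and", "to", "for", "by");

    // Assumption: naive suffix stripping stands in for Porter's stemmer.
    static String stem(String word) {
        for (String suffix : List.of("ing", "ed", "es", "s")) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Step 1: lower-case, tokenize on non-letters, drop stop words, stem.
    static List<String> preprocess(String sentence) {
        List<String> terms = new ArrayList<>();
        for (String token : sentence.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(stem(token));
            }
        }
        return terms;
    }

    // Step 2: count stemmed terms over the document, keep those occurring at
    // least minFreq times, and add their related terms from the lookup map.
    static Set<String> selectTerms(List<String> sentences, int minFreq,
                                   Map<String, Set<String>> related) {
        Map<String, Integer> freq = new HashMap<>();
        for (String sentence : sentences) {
            for (String term : preprocess(sentence)) {
                freq.merge(term, 1, Integer::sum);
            }
        }
        Set<String> selected = new HashSet<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() >= minFreq) {
                selected.add(e.getKey());
                for (String r : related.getOrDefault(e.getKey(), Set.of())) {
                    selected.add(stem(r.toLowerCase()));
                }
            }
        }
        return selected;
    }

    // Step 3: keep, in original order, every sentence that contains at least
    // one selected (frequent or related) term.
    static List<String> summarize(List<String> sentences, int minFreq,
                                  Map<String, Set<String>> related) {
        Set<String> selected = selectTerms(sentences, minFreq, related);
        List<String> summary = new ArrayList<>();
        for (String sentence : sentences) {
            if (!Collections.disjoint(preprocess(sentence), selected)) {
                summary.add(sentence);
            }
        }
        return summary;
    }
}
```

Calling FrequentTermSummarizer.summarize(sentences, 2, related) on a list of sentences returns the subset that mentions a term occurring at least twice or one of its related terms. In a fuller implementation, the related map would presumably be filled from a word-similarity resource such as DISCO, and stem would be replaced by a real Porter stemmer.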


Index Terms

Computer Science

Information Sciences

Keywords

Text Summarization, Frequent Words, Semantic Similar, Summarization Algorithm