Research Article

A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm

by Naresh Kumar Nagwani, Shrish Verma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 17 - Number 2
Year of Publication: 2011
10.5120/2190-2778

Naresh Kumar Nagwani, Shrish Verma. A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm. International Journal of Computer Applications. 17, 2 (March 2011), 36-40. DOI=10.5120/2190-2778

@article{ 10.5120/2190-2778,
author = { Naresh Kumar Nagwani and Shrish Verma },
title = { A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { March 2011 },
volume = { 17 },
number = { 2 },
month = { March },
year = { 2011 },
issn = { 0975-8887 },
pages = { 36-40 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume17/number2/2190-2778/ },
doi = { 10.5120/2190-2778 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:04:36.827537+05:30
%A Naresh Kumar Nagwani
%A Shrish Verma
%T A Frequent Term and Semantic Similarity based Single Document Text Summarization Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 17
%N 2
%P 36-40
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text summarization is an important activity in the analysis of high-volume text documents. It has a number of applications, and many recent applications use it to improve text analysis and knowledge representation. In this paper, a frequent-term-based text summarization algorithm is designed and implemented in Java. The algorithm works in three steps. In the first step, the document to be summarized is preprocessed by eliminating stop words and applying a stemmer. In the second step, term-frequency data is calculated from the document and the frequent terms are selected; for these selected terms, semantically equivalent terms are also generated. In the third step, all sentences in the document that contain the frequent terms or their semantic equivalents are selected for the summary. The algorithm is implemented using open-source technologies such as Java, DISCO, and Porter's stemmer, and is verified on a standard text mining corpus.
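
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class and method names, the tiny stop-word list, and the `topK` parameter are all hypothetical. The crude suffix stripping only stands in for Porter's stemmer, and the hand-supplied `related` map only stands in for the semantically similar terms that the paper obtains from DISCO.

```java
import java.util.*;

class FrequentTermSummarizer {
    // Tiny illustrative stop-word list (an assumption; real lists are much longer).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "it", "for"));

    // Crude suffix stripping: a stand-in for Porter's stemmer, NOT the real algorithm.
    static String stem(String word) {
        for (String suffix : Arrays.asList("ing", "ed", "s")) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Step 1: lowercase, tokenize, drop stop words, and stem.
    static List<String> preprocess(String sentence) {
        List<String> terms = new ArrayList<>();
        for (String token : sentence.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(stem(token));
            }
        }
        return terms;
    }

    // Step 2: count term frequencies, keep the topK most frequent terms,
    // then add the related terms supplied by the caller (hypothetical DISCO stand-in).
    static Set<String> keyTerms(List<List<String>> sentences, int topK,
                                Map<String, Set<String>> related) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : sentences) {
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by descending frequency; break ties alphabetically for determinism.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i < Math.min(topK, entries.size()); i++) {
            String term = entries.get(i).getKey();
            result.add(term);
            for (String r : related.getOrDefault(term, Collections.<String>emptySet())) {
                result.add(stem(r.toLowerCase()));
            }
        }
        return result;
    }

    // Step 3: keep every sentence that contains at least one key term.
    static List<String> summarize(String document, int topK,
                                  Map<String, Set<String>> related) {
        String[] sentences = document.split("(?<=[.!?])\\s+");
        List<List<String>> processed = new ArrayList<>();
        for (String s : sentences) {
            processed.add(preprocess(s));
        }
        Set<String> keys = keyTerms(processed, topK, related);
        List<String> summary = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            if (!Collections.disjoint(processed.get(i), keys)) {
                summary.add(sentences[i].trim());
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> related = new HashMap<>();
        related.put("cat", Collections.singleton("kitten")); // hypothetical related term
        String doc = "Cats chase mice. Cats sleep a lot. Dogs bark loudly. A kitten is playful.";
        System.out.println(summarize(doc, 1, related));
    }
}
```

In this toy run, "cat" is the only frequent term selected, and the hypothetical related term "kitten" pulls in the last sentence as well, so the sentence about dogs is the only one filtered out. A faithful implementation would replace `stem` with a real Porter stemmer and build `related` from a distributional similarity tool such as DISCO.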

Index Terms

Computer Science
Information Sciences

Keywords

Text Summarization, Frequent Words, Semantic Similar, Summarization Algorithm