Research Article

A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging

by Suneetha Manne, S. Sameen Fatima
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 47 - Number 23
Year of Publication: 2012
Authors: Suneetha Manne, S. Sameen Fatima
10.5120/7494-0541

Suneetha Manne, S. Sameen Fatima. A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging. International Journal of Computer Applications. 47, 23 (June 2012), 7-14. DOI=10.5120/7494-0541

@article{ 10.5120/7494-0541,
author = { Suneetha Manne and S. Sameen Fatima },
title = { A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging },
journal = { International Journal of Computer Applications },
issue_date = { June 2012 },
volume = { 47 },
number = { 23 },
month = { June },
year = { 2012 },
issn = { 0975-8887 },
pages = { 7-14 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume47/number23/7494-0541/ },
doi = { 10.5120/7494-0541 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:42:36.219906+05:30
%A Suneetha Manne
%A S. Sameen Fatima
%T A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging
%J International Journal of Computer Applications
%@ 0975-8887
%V 47
%N 23
%P 7-14
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text summarization is the process of distilling the most important information from a source to produce an abridged version for a particular user and task. When this is done by a computer, i.e. automatically, it is called Automatic Text Summarization. Summarization approaches fall into two classes: extraction and abstraction. Extraction-based summaries are produced by concatenating several sentences taken exactly as they appear in the texts being summarized. Abstraction-based summaries are written to convey the main information in the input and may reuse phrases or clauses from it. This paper focuses on the extraction approach, whose goal is sentence selection. One way to select sentences is to score each sentence using a set of feature terms, rank the sentences by score, and then select the best ones. The first step in summarization by extraction is therefore the identification of important features. In our approach, 1000 computer science research papers are used as test documents. Each document is preprocessed by sentence segmentation, tokenization, stop word removal, case folding, lemmatization, and stemming. A score is then calculated for each sentence using important features, sentence filtering features, and data compression features. The proposed text summarization uses an HMM tagger to improve the quality of the summary. We compare our results with existing summarizers, including the Copernicus summarizer, the Great summarizer, and the Microsoft Word 2007 summarizer. The proposed system is also tested with four similarity measures: Cosine, Jaccard, Jaro-Winkler, and Sørensen. The results show that the best summary quality was obtained by the feature terms method.
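
As a rough illustration of the extraction pipeline described in the abstract, the following Python sketch segments a text into sentences, applies case folding, tokenization, and stop word removal, scores each sentence by term frequency times inverse sentence frequency, and returns the top-ranked sentences in their original order. It is not the authors' implementation: the stop word list, the scoring formula, and all function names are assumptions made for this example, and lemmatization, stemming, the HMM tagger, and the sentence filtering and data compression features are omitted. Three of the four similarity measures named in the abstract are sketched over token lists; Jaro-Winkler, which operates on character strings, is left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption; the abstract does not give one).
STOP_WORDS = {"a", "an", "and", "are", "by", "from", "in", "is",
              "it", "of", "on", "the", "to", "with"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Case folding, tokenization, and stop word removal."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def score_sentences(sentences):
    """Score each sentence by the mean TF x ISF weight of its terms.

    ISF (inverse sentence frequency) is log(N / number of sentences
    containing the term); this exact formula is an assumption.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    sent_freq = Counter(t for tokens in token_lists for t in set(tokens))
    scores = []
    for tokens in token_lists:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(c * math.log(n / sent_freq[t]) for t, c in tf.items())
        scores.append(total / len(tokens))
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def sorensen_dice(a, b):
    """Sørensen (Dice) similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def cosine(a, b):
    """Cosine similarity between the term-count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0
```

A generated summary could then be compared with a reference summary by tokenizing both and passing the token lists to `jaccard`, `sorensen_dice`, or `cosine`; each returns a value between 0 and 1, with higher values indicating greater overlap.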

Index Terms

Computer Science
Information Sciences

Keywords

Term Frequency, Term Weight, Inverse Sentence Frequency, Noun and Verb Chunking, Sentence Position, Sentence Length, Verb Featured Sentences, Compression Ratio, Retention Ratio, Data Compression Features, HMM Tagger