A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging

Suneetha Manne; S. Sameen Fatima

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 47 - Number 23

Year of Publication: 2012

Authors: Suneetha Manne, S. Sameen Fatima

10.5120/7494-0541

Suneetha Manne, S. Sameen Fatima. A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging. International Journal of Computer Applications. 47, 23 (June 2012), 7-14. DOI=10.5120/7494-0541

@article{ 10.5120/7494-0541,

author = { Suneetha Manne and S. Sameen Fatima },

title = { A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging },

journal = { International Journal of Computer Applications },

issue_date = { June 2012 },

volume = { 47 },

number = { 23 },

month = { June },

year = { 2012 },

issn = { 0975-8887 },

pages = { 7-14 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume47/number23/7494-0541/ },

doi = { 10.5120/7494-0541 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T20:42:36.219906+05:30

%A Suneetha Manne

%A S. Sameen Fatima

%T A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging

%J International Journal of Computer Applications

%@ 0975-8887

%V 47

%N 23

%P 7-14

%D 2012

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text summarization is the process of distilling the most important information from a source to produce an abridged version for a particular user and task. When this is done by a computer, i.e. automatically, it is called Automatic Text Summarization. Summarization approaches fall into two classes: extraction and abstraction. Extraction-based summaries are produced by concatenating several sentences taken exactly as they appear in the texts being summarized. Abstraction-based summaries are written to convey the main information in the input and may reuse phrases or clauses from it. This paper focuses on the extraction approach, whose goal is sentence selection. One way to select sentences is to score each sentence on a set of feature terms, rank the sentences by score, and then pick the best ones for the summary. The first step in summarization by extraction is therefore the identification of important features. In our approach, 1000 computer science research papers are used as test documents. Each document is prepared by preprocessing: sentence segmentation, tokenization, stop word removal, case folding, lemmatization, and stemming. A score is then calculated for each sentence using important features, sentence filtering features, and data compression features. The proposed summarizer uses an HMM tagger to improve the quality of the summary. We compare our results with existing summarizers, including Copernicus summarizer, Great summarizer, and the Microsoft Word 2007 summarizer. The proposed system is also evaluated with four similarity measures: Cosine, Jaccard, Jaro-Winkler, and Sørensen. The results show that the best-quality summaries were obtained by the feature terms method.
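
The extraction pipeline outlined in the abstract (preprocess, score each sentence, keep the top-ranked ones, then compare summaries with a similarity measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a single TF-ISF (term frequency, inverse sentence frequency) feature, a tiny hypothetical stop-word list, and naive regex segmentation, and it omits lemmatization, stemming, the HMM tagger, and the other features the paper uses. All function names are illustrative. Only three of the four similarity measures are shown, computed over token lists.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "for", "it", "by"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    """Case folding, tokenization and stop-word removal (no stemming or lemmatization)."""
    return [t for t in re.findall(r"[a-z0-9]+", sentence.lower()) if t not in STOP_WORDS]


def tf_isf_scores(sentences):
    """Score each sentence by the mean TF-ISF weight of its terms."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Sentence frequency: the number of sentences in which each term occurs.
    sf = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        total = sum(count * math.log(n / sf[term]) for term, count in tf.items())
        scores.append(total / len(tokens))
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def jaccard(a, b):
    """Jaccard similarity of two token collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def sorensen_dice(a, b):
    """Sørensen (Dice) similarity of two token collections, treated as sets."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


def cosine(a, b):
    """Cosine similarity of two token lists, using raw term counts as vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0
```

A call such as `summarize(document_text, k=3)` returns three sentences copied verbatim from the input. A generated summary can then be compared with a reference summary by passing `tokenize(...)` of each to `cosine`, `jaccard`, or `sorensen_dice`.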

Index Terms

Computer Science

Information Sciences

Keywords

Term Frequency; Term Weight; Inverse Sentence Frequency; Noun and Verb Chunking; Sentence Position; Sentence Length; Verb Featured Sentences; Compression Ratio; Retention Ratio; Data Compression Features; HMM Tagger