Research Article

Context Score based Term Weighting Model for Text Summarization

by Pratik Kamble, S. C. Dharamadhikari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 98 - Number 12
Year of Publication: 2014
Authors: Pratik Kamble, S. C. Dharamadhikari
10.5120/17238-7572

Pratik Kamble, S. C. Dharamadhikari. Context Score based Term Weighting Model for Text Summarization. International Journal of Computer Applications. 98, 12 (July 2014), 41-46. DOI=10.5120/17238-7572

@article{ 10.5120/17238-7572,
author = { Pratik Kamble and S. C. Dharamadhikari },
title = { Context Score based Term Weighting Model for Text Summarization },
journal = { International Journal of Computer Applications },
issue_date = { July 2014 },
volume = { 98 },
number = { 12 },
month = { July },
year = { 2014 },
issn = { 0975-8887 },
pages = { 41-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume98/number12/17238-7572/ },
doi = { 10.5120/17238-7572 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:26:03.539326+05:30
%A Pratik Kamble
%A S. C. Dharamadhikari
%T Context Score based Term Weighting Model for Text Summarization
%J International Journal of Computer Applications
%@ 0975-8887
%V 98
%N 12
%P 41-46
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Readers want relevant information in brief form: a small amount of text that still covers the essential content. Summarization serves this need. Current text summarization techniques do not consider the context, i.e. the background situation, of a document. In this paper we present the SentenceRank algorithm, which calculates the weight of each sentence based on a context score. We make use of E-VSM (Enhanced Vector Space Model) to count bigram frequencies over the whole corpus, and for each bigram we calculate a context score based on Bernoulli's model of randomness [1] [2]. The bigram context scores are then used in the SentenceRank algorithm to calculate the context-sensitive indexing weight of each sentence in a document. To reduce redundancy in the summary, a cosine similarity measure is used to remove redundant sentences.
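
The pipeline described in the abstract (corpus-wide bigram counts, a per-sentence weight built from bigram scores, and cosine-similarity filtering of redundant sentences) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `threshold` and `max_sentences` parameters, and the use of raw corpus bigram frequency as the score are assumptions made for illustration. In particular, the raw frequency is only a stand-in for the context score derived from Bernoulli's model of randomness, whose formula is not given here.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def bigrams(tokens):
    """Return the adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def corpus_bigram_counts(documents):
    """Count every bigram over a corpus given as a list of sentence lists."""
    counts = Counter()
    for document in documents:
        for sentence in document:
            counts.update(bigrams(tokenize(sentence)))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summarize(sentences, bigram_score, max_sentences=2, threshold=0.7):
    """Pick the highest-weighted sentences, skipping near-duplicates.

    bigram_score is a hypothetical stand-in for the context score:
    any mapping from a bigram to a non-negative weight.
    """
    weighted = []
    for index, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        weight = sum(bigram_score.get(bg, 0.0) for bg in bigrams(tokens))
        weighted.append((weight, index, Counter(tokens)))

    # Highest weight first; ties are broken by original position.
    weighted.sort(key=lambda item: (-item[0], item[1]))

    selected = []
    for _, index, vector in weighted:
        if len(selected) >= max_sentences:
            break
        # Keep a sentence only if it is not too similar to any kept one.
        if all(cosine_similarity(vector, kept) < threshold
               for _, kept in selected):
            selected.append((index, vector))

    # Return the chosen sentences in their original document order.
    return [sentences[i] for i, _ in sorted(selected, key=lambda s: s[0])]


if __name__ == "__main__":
    doc = [
        "Text summarization selects important sentences.",
        "Text summarization selects the important sentences.",
        "Cosine similarity removes redundant sentences.",
    ]
    scores = corpus_bigram_counts([doc])
    for line in summarize(doc, scores):
        print(line)
```

In this toy document the first two sentences are near-duplicates, so only the higher-weighted of the two survives the similarity filter. Replacing `scores` with real context scores would leave the ranking and filtering steps unchanged.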

References
  1. P. Goyal, L. Behera, and T. M. McGinnity, "A Context-Based Word Indexing Model for Document Summarization," IEEE Trans. Knowl. Data Eng., vol. 25, no. 8, pp. 1693-1705, Aug. 2013.
  2. G. Amati and C. J. Van Rijsbergen, "Probabilistic Models of Information Retrieval Based on Measuring the Divergence from Randomness," ACM Trans. Information Systems, vol. 20, pp. 357-389, http://doi.acm.org/10.1145/582415.582416, Oct. 2002.
  3. H. P. Luhn, "The Automatic Creation of Literature Abstracts," IBM J. Research and Development, vol. 2, pp. 159-165, http://dx.doi.org/10.1147/rd.22.0159, Apr. 1958.
  4. N. Polettini, "The Vector Space Model in Information Retrieval - Term Weighting Problem," Dept. Inf. Comm. Tech., Univ. Trento, Italy, 2004.
  5. A. Nenkova et al., "A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization," Proc. 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 573-580, 2006.
  6. N. Stokes et al., "Broadcast news gisting using lexical cohesion analysis," Conf. Information Retrieval, vol. 2997, pp. 209-222, 2004.
  7. L. Li, "Enhancing diversity, coverage and balance for summarization through structure learning," Proc. 18th International Conference on World Wide Web, pp. 71-80, Apr. 20-24, 2009, Spain.
  8. X. Cai and W. Li, "Mutually Reinforced Manifold-Ranking Based Relevance Propagation Model for Query-Focused Multi-Document Summarization," IEEE Trans. Audio, Speech, and Language Processing, vol. 20, no. 5, July 2012.
  9. X. J. Wan, J. W. Yang, and J. G. Xiao, "Manifold-ranking based topic focused multi-document summarization," Proc. 18th IJCAI Conf., pp. 2903-2908, 2007.
  10. V. Murthy G et al., "A comparative study on term weighting methods for automated Telugu text categorization with effective classifiers," J. Data Mining and Knowledge Management Process, vol. 3, pp. 95-105, 2013.
  11. Y. Ko, J. Park, and J. Seo, "Improving text categorization using the importance of sentences," J. Information Processing and Management, vol. 40, no. 1, pp. 65-79, Jan. 2004, doi: 10.1016/S0306-4573(02)00056-0.
  12. Y. Yang and J. O. Pedersen, "A Comparative Study on Feature Selection in Text Categorization," Proc. Fourteenth International Conference on Machine Learning (ICML 1997), Nashville, Tennessee, USA, July 8-12, 1997, ISBN 1-55860-486-3.
  13. Y. Liu, H. T. Loh, and A. Sun, "Imbalanced text classification: A term weighting approach," J. Expert Systems with Applications, vol. 36, no. 1, pp. 690-701, Jan. 2009.
  14. A. Bhakkad, S. C. Dharamadhikari, and Parag Kulkarni, "Efficient Approach to find Bigram Frequency in Text Document using E-VSM," J. Computer Applications, 68(19):9-11, Apr. 2013, doi: 10.5120/11686-7356.
  15. J. Hintikka, "On Semantic Information," Physics, Logic, and History, pp. 147-172, Springer, 1970.
Index Terms

Computer Science
Information Sciences

Keywords

Context Score, E-VSM, SentenceRank