Research Article

Automated Multiple Related Documents Summarization via Jaccard’s Coefficient

by Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 13 - Number 3
Year of Publication: 2011
DOI: 10.5120/1762-2415

Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin. Automated Multiple Related Documents Summarization via Jaccard’s Coefficient. International Journal of Computer Applications. 13, 3 (January 2011), 12-15. DOI=10.5120/1762-2415

@article{ 10.5120/1762-2415,
author = { Huda Yasin and Mohsin Mohammad Yasin and Farah Mohammad Yasin },
title = { Automated Multiple Related Documents Summarization via Jaccard's Coefficient },
journal = { International Journal of Computer Applications },
issue_date = { January 2011 },
volume = { 13 },
number = { 3 },
month = { January },
year = { 2011 },
issn = { 0975-8887 },
pages = { 12-15 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume13/number3/1762-2415/ },
doi = { 10.5120/1762-2415 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:01:47.250717+05:30
%A Huda Yasin
%A Mohsin Mohammad Yasin
%A Farah Mohammad Yasin
%T Automated Multiple Related Documents Summarization via Jaccard's Coefficient
%J International Journal of Computer Applications
%@ 0975-8887
%V 13
%N 3
%P 12-15
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In an era of rapid technological advancement, sharing and gathering information is essential, and readers benefit from condensed versions of long text documents. In this paper, we present the approach used in our automated text summarization system, MDSS (Multiple Documents Summarization System). We describe a new approach based on statistical rather than semantic factors. Compared with single-document summarization, the issues of compression, speed, redundancy, and passage selection are more critical in multiple-document summarization. For sentence comparison, Jaccard’s coefficient is used to improve the quality of the summary. Our algorithms resemble dynamic time warping. Our experimental results indicate that Jaccard’s coefficient is useful and effective for improving the quality of multiple-document summarization. MDSS is implemented in Java (JDK 1.6).
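
The abstract does not say how MDSS tokenizes sentences or applies the coefficient, so the following is only a minimal sketch of sentence comparison with Jaccard’s coefficient, |A ∩ B| / |A ∪ B|, where A and B are the word sets of two sentences. The class name `JaccardSketch`, the `tokens` helper, the lowercase alphanumeric tokenization, and the choice to return 0.0 for two empty sentences are all assumptions made for illustration; none of them comes from the paper. The code avoids language features newer than Java 6.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not the MDSS implementation.
final class JaccardSketch {

    // Assumed tokenization: lowercase the sentence and split on runs of
    // characters that are not ASCII letters or digits.
    static Set<String> tokens(String sentence) {
        Set<String> result = new HashSet<String>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard's coefficient of the two sentences' word sets:
    // size of the intersection divided by size of the union.
    static double jaccard(String a, String b) {
        Set<String> x = tokens(a);
        Set<String> y = tokens(b);
        if (x.isEmpty() && y.isEmpty()) {
            return 0.0; // convention chosen for this sketch only
        }
        Set<String> intersection = new HashSet<String>(x);
        intersection.retainAll(y);
        int unionSize = x.size() + y.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

For example, `JaccardSketch.jaccard("The cat sat", "the cat ran")` compares {the, cat, sat} with {the, cat, ran}: two shared words out of four distinct words gives 0.5. A summarizer could treat sentence pairs scoring above some threshold as redundant, though whether MDSS uses a threshold, and what value, is not stated in the abstract.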

Index Terms

Computer Science
Information Sciences

Keywords

Multi-document summarization, Jaccard’s coefficient, sentence comparison, text mining