Automated Multiple Related Documents Summarization via Jaccard's Coefficient

Huda Yasin; Mohsin Mohammad Yasin; Farah Mohammad Yasin

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 13 - Number 3

Year of Publication: 2011

Authors: Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin

10.5120/1762-2415

Huda Yasin, Mohsin Mohammad Yasin, Farah Mohammad Yasin. Automated Multiple Related Documents Summarization via Jaccard's Coefficient. International Journal of Computer Applications. 13, 3 (January 2011), 12-15. DOI=10.5120/1762-2415

@article{10.5120/1762-2415,
  author = {Huda Yasin and Mohsin Mohammad Yasin and Farah Mohammad Yasin},
  title = {Automated Multiple Related Documents Summarization via Jaccard's Coefficient},
  journal = {International Journal of Computer Applications},
  issue_date = {January 2011},
  volume = {13},
  number = {3},
  month = {January},
  year = {2011},
  issn = {0975-8887},
  pages = {12-15},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume13/number3/1762-2415/},
  doi = {10.5120/1762-2415},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T20:01:47.250717+05:30

%A Huda Yasin

%A Mohsin Mohammad Yasin

%A Farah Mohammad Yasin

%T Automated Multiple Related Documents Summarization via Jaccard's Coefficient

%J International Journal of Computer Applications

%@ 0975-8887

%V 13

%N 3

%P 12-15

%D 2011

%I Foundation of Computer Science (FCS), NY, USA

Abstract

In today's era of rapid technological advancement, distributing and gathering information are essential. Readers are drawn to condensed versions of long text documents. In this paper, we present the approach used in our automated text summarization system, MDSS (Multiple Documents Summarization System). We describe a new approach based on statistical (rather than semantic) factors. Compared with single-document summarization, the issues of compression, speed, redundancy, and passage selection are more critical in multiple-document summarization. For sentence comparison, Jaccard's coefficient is used to improve the quality of the summary. Our algorithms bear a resemblance to dynamic time warping. Our experimental results indicate that Jaccard's coefficient is useful and effective for enhancing the quality of multiple-document summarization. MDSS is implemented in Java (JDK 1.6).
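As an illustration of the sentence comparison mentioned in the abstract, the following is a minimal sketch of Jaccard's coefficient applied to two sentences treated as sets of words: J(A, B) = |A ∩ B| / |A ∪ B|. It is not the MDSS implementation. The class name `JaccardSketch`, the methods `tokens` and `similarity`, the lowercase-and-split tokenization, and the choice to return 0 for two empty sentences are all assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class; the name and methods are not taken from the paper.
final class JaccardSketch {

    // Assumed tokenization: lowercase the sentence and split on any run of
    // characters that are neither letters nor digits. No stop-word removal
    // or stemming is performed in this sketch.
    static Set<String> tokens(String sentence) {
        Set<String> result = new HashSet<String>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard's coefficient: size of the intersection divided by size of the union.
    static double similarity(String a, String b) {
        Set<String> wordsA = tokens(a);
        Set<String> wordsB = tokens(b);
        if (wordsA.isEmpty() && wordsB.isEmpty()) {
            return 0.0; // assumed convention for two empty sentences
        }
        Set<String> intersection = new HashSet<String>(wordsA);
        intersection.retainAll(wordsB);
        int unionSize = wordsA.size() + wordsB.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

For example, "The cat sat on the mat" and "The cat lay on the rug" share three distinct words (the, cat, on) out of seven in total, so `JaccardSketch.similarity` returns 3/7. A summarizer could use a score like this to detect near-duplicate sentences across related documents; any threshold for doing so would have to be chosen experimentally.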

Index Terms

Computer Science

Information Sciences

Keywords

Multi-document summarization; Jaccard's coefficient; sentence comparison; text mining