Effect of Pronoun Resolution on Document Similarity

Atul Kumar; Sudip Sanyal

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 1 - Number 16

Year of Publication: 2010

Authors: Atul Kumar, Sudip Sanyal

10.5120/341-519

Atul Kumar, Sudip Sanyal. Effect of Pronoun Resolution on Document Similarity. International Journal of Computer Applications. 1, 16 (February 2010), 60-64. DOI=10.5120/341-519

@article{ 10.5120/341-519,

author = { Atul Kumar and Sudip Sanyal },

title = { Effect of Pronoun Resolution on Document Similarity },

journal = { International Journal of Computer Applications },

issue_date = { February 2010 },

volume = { 1 },

number = { 16 },

month = { February },

year = { 2010 },

issn = { 0975-8887 },

pages = { 60-64 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume1/number16/341-519/ },

doi = { 10.5120/341-519 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T19:42:43.577342+05:30

%A Atul Kumar

%A Sudip Sanyal

%T Effect of Pronoun Resolution on Document Similarity

%J International Journal of Computer Applications

%@ 0975-8887

%V 1

%N 16

%P 60-64

%D 2010

%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a novel study of the effect of pronoun resolution on the measurement of document similarity. We examine this effect within the framework of the Vector Space Model (VSM) and Probabilistic Latent Semantic Analysis (PLSA). For this purpose we developed a benchmark corpus of documents whose similarity scores were assigned by human judges. We measured the inter-document similarity on these documents using VSM and PLSA, then performed pronoun resolution on the documents and recalculated the similarity using both methods. Finally, we computed the correlation coefficient between each set of computed scores and the human-generated scores. The correlation coefficients clearly demonstrated substantial and consistent improvements in the similarity scores after pronoun resolution.
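The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes raw term-frequency vectors with cosine similarity for the VSM step (the abstract does not state the weighting scheme), it resolves pronouns by hand in a toy sentence rather than with an automatic resolver, and it omits PLSA entirely. The example documents and the helper names (term_vector, cosine_similarity, pearson) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector over lowercased word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (dev_x * dev_y)


# Toy documents (made up for illustration, not taken from the benchmark corpus).
doc_a = "Alice wrote the report. She sent it to Bob."
doc_b = "Alice sent the report to Bob."

# Pronouns replaced by their antecedents by hand; a real system would use
# an automatic pronoun resolver here.
doc_a_resolved = "Alice wrote the report. Alice sent the report to Bob."

sim_before = cosine_similarity(term_vector(doc_a), term_vector(doc_b))
sim_after = cosine_similarity(term_vector(doc_a_resolved), term_vector(doc_b))

print("similarity before resolution:", round(sim_before, 4))
print("similarity after resolution: ", round(sim_after, 4))
```

In a full evaluation, the similarity scores for every document pair, computed once before and once after resolution, would each be passed to pearson together with the human-assigned scores, and the two resulting coefficients compared.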

Index Terms

Computer Science

Information Sciences

Keywords

Document Similarity, Pronoun Resolution, Information Retrieval, Statistical Algorithm