Research Article

Information Retrieval using Cosine and Jaccard Similarity Measures in Vector Space Model

by Abhishek Jain, Aman Jain, Nihal Chauhan, Vikrant Singh, Narina Thakur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 164 - Number 6
Year of Publication: 2017
DOI: 10.5120/ijca2017913699

Abhishek Jain, Aman Jain, Nihal Chauhan, Vikrant Singh, Narina Thakur. Information Retrieval using Cosine and Jaccard Similarity Measures in Vector Space Model. International Journal of Computer Applications. 164, 6 (Apr 2017), 28-30. DOI=10.5120/ijca2017913699

@article{ 10.5120/ijca2017913699,
author = { Abhishek Jain and Aman Jain and Nihal Chauhan and Vikrant Singh and Narina Thakur },
title = { Information Retrieval using Cosine and Jaccard Similarity Measures in Vector Space Model },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2017 },
volume = { 164 },
number = { 6 },
month = { Apr },
year = { 2017 },
issn = { 0975-8887 },
pages = { 28-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume164/number6/27489-2017913699/ },
doi = { 10.5120/ijca2017913699 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:10:35.733821+05:30
%A Abhishek Jain
%A Aman Jain
%A Nihal Chauhan
%A Vikrant Singh
%A Narina Thakur
%T Information Retrieval using Cosine and Jaccard Similarity Measures in Vector Space Model
%J International Journal of Computer Applications
%@ 0975-8887
%V 164
%N 6
%P 28-30
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the exponential growth in the number of documents available on the web, an effective technique for retrieving the document most relevant to a given search query has become critical. The field of Information Retrieval addresses this problem through document similarity, which is used to retrieve the desired information from a large amount of data. Various models and similarity measures have been proposed to determine the extent of similarity between two objects. This paper summarizes the entire process and examines some of the best-known algorithms and approaches for matching a query text against a set of indexed documents.
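
As a rough illustration of the measures named in the title, the following minimal Python sketch ranks a few documents against a query using TF-IDF weights with cosine similarity, and also reports the Jaccard similarity of the term sets. It is not taken from the paper: the whitespace tokenizer, the smoothed IDF formula, and all function names (`tokenize`, `build_idf`, `weigh`, `cosine`, `jaccard`, `rank`) are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Compute an IDF weight per term from a list of tokenized documents.

    Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1, so that a term
    occurring in every document still gets a positive weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def weigh(tokens, idf):
    """Build a sparse TF-IDF vector {term: weight}; terms unknown to the index are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard similarity of two token collections, computed on their term sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank(query, documents):
    """Return (doc_index, cosine_score, jaccard_score) tuples, best cosine first."""
    docs = [tokenize(d) for d in documents]
    idf = build_idf(docs)
    q_tokens = tokenize(query)
    q_vec = weigh(q_tokens, idf)
    scores = [
        (i, cosine(q_vec, weigh(doc, idf)), jaccard(q_tokens, doc))
        for i, doc in enumerate(docs)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


documents = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "information retrieval ranks documents by similarity",
]
ranked = rank("information retrieval similarity", documents)
for index, cos_score, jac_score in ranked:
    print(index, round(cos_score, 3), round(jac_score, 3))
```

Cosine similarity here depends on the term weights, whereas Jaccard similarity only considers which terms are shared, so the two measures can order the same documents differently.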

Index Terms

Computer Science
Information Sciences

Keywords

Weighting Measures, TF/IDF, Cosine Similarity Measure, Jaccard Similarity Measure, Information Retrieval.