Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold

International Journal of Computer Applications
© 2011 by IJCA Journal
Number 1 - Article 1
Year of Publication: 2011
Gourav Bathla
Rajni Jindal

Gourav Bathla and Rajni Jindal. Article: Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold. International Journal of Computer Applications 33(5):9-13, November 2011. Full text available.

BibTeX:

	author = {Gourav Bathla and Rajni Jindal},
	title = {Article: Similarity Measures of Research Papers and Patents using Adaptive and Parameter Free Threshold},
	journal = {International Journal of Computer Applications},
	year = {2011},
	volume = {33},
	number = {5},
	pages = {9-13},
	month = {November},
	note = {Full text available}


Patents and research papers are published in many fields and are stored in the databases of various conferences and journals. If a user (a researcher or any general user) wants to search for a patent or research paper in a particular field, few search criteria are available for doing so. In this paper, we use the nearest neighbor algorithm with cosine similarity to categorize patents and research papers. Experimental results show that a user searching for a patent or research paper in a particular field or category gets better results. The advantage of this approach is that the search space becomes very small, so the time a user waits for the answer to a query is greatly reduced. Much research has addressed deciding the category of a particular research paper or patent, but the resulting categorization has not been very accurate. In this paper, we calculate a threshold based on the similarity of terms between the query and the research paper or patent. The proposed threshold calculation does not rely on preset numerical values. Therefore, this novel approach to threshold calculation categorizes more accurately than previous research work.
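The abstract names the ingredients (term-based similarity, cosine measure, nearest-neighbor categorization, a threshold derived from query/document similarity) but not the exact threshold formula. The following is a minimal Python sketch of that pipeline under stated assumptions: whitespace tokenization, smoothed TF-IDF weights, and a hypothetical adaptive threshold equal to the mean cosine similarity between the query and all labelled documents. That threshold is a stand-in chosen for illustration, not the authors' calculation, and the function names and sample data are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lower-case and split on whitespace. A fuller pipeline would also strip
    # punctuation, drop stop words and stem terms.
    return text.lower().split()


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency for every term in the collection."""
    n = len(tokenized_docs)
    df = Counter(term for toks in tokenized_docs for term in set(toks))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(query, docs, labels):
    """Assign a category to `query` from labelled `docs` (papers or patents).

    Hypothetical adaptive threshold: the mean cosine similarity between the
    query and every labelled document, so no numeric cut-off is supplied.
    Documents at or above it act as nearest neighbours and vote for their
    category, each vote weighted by its similarity.
    """
    tokenized = [tokenize(d) for d in docs]
    idf = build_idf(tokenized)
    q = tfidf(tokenize(query), idf)
    sims = [cosine(q, tfidf(toks, idf)) for toks in tokenized]

    threshold = sum(sims) / len(sims)
    votes = defaultdict(float)
    for sim, label in zip(sims, labels):
        if sim > 0 and sim >= threshold:
            votes[label] += sim
    best = max(votes, key=votes.get) if votes else None
    return best, threshold


docs = [
    "neural network training with gradient descent",
    "deep neural network for image classification",
    "battery electrode material for lithium cells",
    "lithium ion battery charging circuit",
]
labels = ["machine learning", "machine learning", "energy storage", "energy storage"]

category, threshold = categorize("lithium battery design", docs, labels)
print(category)  # energy storage
```

Because the cut-off is recomputed from each query's own similarity scores, no fixed value has to be tuned, which is one way to read "parameter free"; the threshold defined in the paper may be computed differently.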

