
Improving Semantic Similarity for Pairs of Short Biomedical Texts with Concept Definitions and Ontology Structure

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 99 - Number 15
Year of Publication: 2014
Olivia Sanchez Graillet

Olivia Sanchez Graillet. Improving Semantic Similarity for Pairs of Short Biomedical Texts with Concept Definitions and Ontology Structure. International Journal of Computer Applications 99(15):1-7, August 2014.

Measuring semantic similarity between short biomedical texts, such as article abstracts or experiment descriptions, may provide important information for health researchers. This paper presents a method for calculating text similarity in the biomedical context. The method implements a pairwise concept semantic similarity measure that uses concept definitions and ontology structure. The results show improved performance over a previous version of the method, which used lexical-based measures as the similarity function, and over other tools for measuring text similarity.
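As a rough illustration of what a pairwise concept similarity measure of this kind might look like, the Python sketch below combines a path-based score over an is-a hierarchy with a word-overlap score over concept definitions, then aggregates best-matching concept pairs into a text-level score. Everything in it is an assumption made for illustration and is not taken from the paper: the toy ontology, the definitions, the function names, the weighting parameter `alpha`, and the best-match averaging scheme.

```python
from typing import Dict, List, Optional

# Toy is-a hierarchy (child -> parent). Hypothetical, not from a real ontology.
PARENT: Dict[str, Optional[str]] = {
    "disease": None,
    "infection": "disease",
    "viral_infection": "infection",
    "bacterial_infection": "infection",
    "influenza": "viral_infection",
}

# Toy concept definitions. Hypothetical wording.
DEFINITION: Dict[str, str] = {
    "disease": "abnormal condition of an organism",
    "infection": "disease caused by a pathogen",
    "viral_infection": "infection caused by a virus",
    "bacterial_infection": "infection caused by bacteria",
    "influenza": "viral infection caused by influenza virus",
}


def ancestors(concept: str) -> List[str]:
    """Return the concept followed by its ancestors up to the root."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + number of is-a edges on the path via the closest common ancestor)."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    steps_in_b = {c: i for i, c in enumerate(chain_b)}
    for steps_in_a, c in enumerate(chain_a):
        if c in steps_in_b:
            return 1.0 / (1.0 + steps_in_a + steps_in_b[c])
    return 0.0  # no common ancestor


def definition_similarity(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of the two definitions."""
    tokens_a = set(DEFINITION[a].lower().split())
    tokens_b = set(DEFINITION[b].lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def concept_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Weighted mix of structure-based and definition-based similarity."""
    return alpha * path_similarity(a, b) + (1.0 - alpha) * definition_similarity(a, b)


def text_similarity(concepts_a: List[str], concepts_b: List[str], alpha: float = 0.5) -> float:
    """Average best-match concept similarity, taken in both directions."""
    if not concepts_a or not concepts_b:
        return 0.0

    def directed(src: List[str], dst: List[str]) -> float:
        return sum(max(concept_similarity(s, d, alpha) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(concepts_a, concepts_b) + directed(concepts_b, concepts_a))


if __name__ == "__main__":
    # Each text is represented by the concepts already extracted from it.
    print(text_similarity(["influenza", "infection"], ["bacterial_infection"]))
```

The sketch assumes that concepts have already been identified in each text. In practice that mapping step, the choice of ontology, and the way the two scores are combined would all follow the paper's own definitions rather than the placeholders used here.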

