Research Article

A Hybrid Model for Paraphrase Detection Combines pros of Text Similarity with Deep Learning

by Mohamed I. El Desouki, Wael H. Gomaa, Hawaf Abdalhakim
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 20
Year of Publication: 2019
Authors: Mohamed I. El Desouki, Wael H. Gomaa, Hawaf Abdalhakim
10.5120/ijca2019919011

Mohamed I. El Desouki, Wael H. Gomaa, Hawaf Abdalhakim. A Hybrid Model for Paraphrase Detection Combines pros of Text Similarity with Deep Learning. International Journal of Computer Applications. 178, 20 (Jun 2019), 18-23. DOI=10.5120/ijca2019919011

@article{ 10.5120/ijca2019919011,
author = { Mohamed I. El Desouki and Wael H. Gomaa and Hawaf Abdalhakim },
title = { A Hybrid Model for Paraphrase Detection Combines pros of Text Similarity with Deep Learning },
journal = { International Journal of Computer Applications },
issue_date = { Jun 2019 },
volume = { 178 },
number = { 20 },
month = { Jun },
year = { 2019 },
issn = { 0975-8887 },
pages = { 18-23 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume178/number20/30650-2019919011/ },
doi = { 10.5120/ijca2019919011 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:50:56.929303+05:30
%A Mohamed I. El Desouki
%A Wael H. Gomaa
%A Hawaf Abdalhakim
%T A Hybrid Model for Paraphrase Detection Combines pros of Text Similarity with Deep Learning
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 20
%P 18-23
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Paraphrase detection (PD) is an essential task in natural language processing. Its goal is to determine whether two statements written in natural language have the same meaning. It is important in many fields, such as plagiarism detection, question answering, document clustering, and information retrieval. This paper proposes a hybrid model that combines the text similarity approach with the deep learning approach in order to improve paraphrase detection. Evaluated on the Microsoft Research Paraphrase Corpus (MSRP) dataset, the model achieves an accuracy of about 76.6% and an F-measure of about 83.5%.
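
The abstract does not spell out how the two approaches are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a token-overlap (Jaccard) score as the text similarity signal, a cosine score over precomputed sentence vectors as the deep learning signal, and a simple weighted average with a fixed threshold as the combination. The function names, the weight, and the threshold are all hypothetical. The last function shows how accuracy and F-measure, the two metrics reported above, are computed from binary labels.

```python
import math
from typing import List, Sequence, Tuple


def jaccard_similarity(a: str, b: str) -> float:
    """Lexical similarity: shared tokens divided by all distinct tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_score(lexical: float, embedding: float, weight: float = 0.5) -> float:
    """Assumed combination rule: weighted average of the two similarity scores."""
    return weight * lexical + (1.0 - weight) * embedding


def is_paraphrase(s1: str, s2: str,
                  v1: Sequence[float], v2: Sequence[float],
                  threshold: float = 0.5, weight: float = 0.5) -> bool:
    """Label a pair as a paraphrase when the hybrid score reaches the threshold."""
    score = hybrid_score(jaccard_similarity(s1, s2),
                         cosine_similarity(v1, v2), weight)
    return score >= threshold


def accuracy_and_f_measure(gold: List[int], pred: List[int]) -> Tuple[float, float]:
    """Accuracy and F-measure for binary labels (1 = paraphrase, 0 = not)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    accuracy = correct / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)
```

In practice the sentence vectors would come from a trained encoder (the keywords mention skip-thought vectors), and the weight and threshold would be tuned on training data or replaced by a learned classifier. The plain lists of floats used here only stand in for such vectors.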

Index Terms

Computer Science
Information Sciences

Keywords

Paraphrase detection, Deep Learning, Skip thought vector, Text similarity