Research Article

Scanning of Thesis Script Similarity with Vector Space Model

by Sri Winiarti, Ulaya Ahdiani, Romakh Fitriani
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 32
Year of Publication: 2020
Authors: Sri Winiarti, Ulaya Ahdiani, Romakh Fitriani
10.5120/ijca2020920879

Sri Winiarti, Ulaya Ahdiani, Romakh Fitriani. Scanning of Thesis Script Similarity with Vector Space Model. International Journal of Computer Applications. 175, 32 (Nov 2020), 38-46. DOI=10.5120/ijca2020920879

@article{ 10.5120/ijca2020920879,
author = { Sri Winiarti, Ulaya Ahdiani, Romakh Fitriani },
title = { Scanning of Thesis Script Similarity with Vector Space Model },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2020 },
volume = { 175 },
number = { 32 },
month = { Nov },
year = { 2020 },
issn = { 0975-8887 },
pages = { 38-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume175/number32/31659-2020920879/ },
doi = { 10.5120/ijca2020920879 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:40:07.292021+05:30
%A Sri Winiarti
%A Ulaya Ahdiani
%A Romakh Fitriani
%T Scanning of Thesis Script Similarity with Vector Space Model
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 32
%P 38-46
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid growth of online textual data has increased the need for time-efficient information retrieval (IR) methods. Text classification is the process of finding the category of a document based on its content. However, few studies discuss text classification using cascading texts. In general, text classification uses the Vector Space Model (VSM) proposed by Salton, Wong, and Yang (1975) as a model for representing documents and queries. One limitation of VSM is its space requirement, because each document must be represented using all the words in the dictionary (i.e. the vocabulary). With the convenience that search engines give users looking for information online, the internet has become the dominant source of data and information. Students are no exception: their use of the internet to find references is very high. Statistics released by the Indonesian Internet Service Providers Association (APJII) in 2019 stated that Indonesian internet users had reached 171.17 million people, or around 64.8%. Young people aged 15-34 years dominate the number of users, at up to 49.52%. Because a large number of scientific publications appear each year, similarities between publications are possible. The highest level of similarity in thesis texts is found in the title and the theoretical study. When searching for references for theoretical studies, students tend to plagiarize a scientific work by copying part or even all of its content without citing the original source. Therefore, this research aims to create a document similarity detection system using the Vector Space Model (VSM) method. The data sets used to detect similarities were 443 undergraduate thesis titles and 442 theoretical-study sections of thesis texts. In the accuracy test carried out on 132 queries from 321 thesis texts, a mean average precision of 0.996 was obtained.
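
As a rough illustration of the approach the abstract describes, the sketch below ranks documents against a query by cosine similarity in a vector space model and scores a ranking with average precision. The abstract does not state the term weighting or preprocessing used, so the smoothed TF-IDF weights, the whitespace tokenizer, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed idf, so terms found in every document keep some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (doc_index, score) pairs, most similar document first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms outside the collection vocabulary are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def average_precision(ranked_ids, relevant):
    """Mean of precision@k over the ranks k where a relevant document appears."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy collection, invented for this example only.
docs = [
    "thesis similarity detection with vector space model",
    "fish school classification in chile",
    "vector space model for document retrieval",
]
ranking = rank("vector space model similarity", docs)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Mean average precision, the metric reported in the abstract, is the average of `average_precision` over all queries; a value near 1 means the relevant documents sit at the top of almost every ranking.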

References
  1. Aasheim, C. and Koehler, G.J. 2005. Scanning World Wide Web documents with the vector space model. Journal Decision Support Systems 42 (2006) 690–699. Available online at ScienceDirect, Elsevier.com.
  2. Chao Ke, Zhigang Jiang, Hua Zhang, Yan Wang, Shuo Zhu. 2020. An intelligent design for remanufacturing method based on vector space model and case-based reasoning. Journal of Cleaner Production 277 (2020) 123269. Available on Elsevier.com.
  3. Al-Anzi, F.S. and AbuZeina, D. 2018. Beyond vector space model for hierarchical Arabic text classification: A Markov chain approach. Journal Information Processing and Management, No 54 2018, pp. 105-115. Available on Elsevier.com.
  4. Zhan Su, Xiliang Zheng, Jun Ai, Yuming Shen, Xuanxiong Zhang. 2020. Link prediction in recommender systems based on vector similarity. Physica A 560 (2020) 125154. Available on Elsevier.com.
  5. Sarkar, D., Jana, P. 2019. Analyzing User Activities Using Vector Space Model in Online Social Networks. arXiv preprint arXiv:1910.05691.
  6. Jiang, R., Kim, S., Banchs, R.E., Li, H. 2015. Towards Improving the Performance of Vector Space Model for Chinese Frequently Asked Question Answering. 2015 International Conference on Asian Language Processing (IALP). IEEE, pp. 136-139.
  7. Auster, E. and Choo, C.W. 1994. How Senior Managers Acquire and Use Information in Environmental Scanning. Journal of Information Processing & Management, Vol. 30, No. 5, pp. 607-618.
  8. F.J. Aguilar, Scanning the Business Environment, Macmillan, New York, 1967.
  9. C.W. Choo, Information Management for the Intelligent Organization: the Art of Scanning the Environment, Information Today, Inc., Medford, 2002.
  10. Fazayeli, H., Syed-Mohamad, S.S., Md Akhir, N.S. 2019. Towards Auto-labelling Issue Reports for Pull-Based Software Development using Text Mining Approach. Proceeding: The Fifth Information Systems International Conference 2019, Elsevier.
  11. Rasjid, S.E. and Setiawan, R. 2017. Performance Comparison and Optimization of Text Document Classification using k-NN and Naïve Bayes Classification Techniques. Proceeding 2nd International Conference on Computer Science and Computational Intelligence 2017, ICCSCI 2017, 13-14 October 2017, Bali. Available on Elsevier.
  12. Sevindik, T. and Cömert, Z. 2010. Using algorithms for evaluation in web based distance education, Procedia Social and Behavioral Sciences 9 (2010) 1777–1780. Available on Elsevier.
  13. H. Robotham, J. Castillo, P. Bosch, J. Perez-Kallens. 2011. A comparison of multi-class support vector machine and classification tree methods for hydroacoustic classification of fish-schools in Chile. Journal of Fisheries Research 111 (2011) pp. 170-176. Available on Elsevier.
  14. Aasheim, C. and Koehler, G.J. 2005. Scanning World Wide Web documents with the vector space model. Journal of Decision Support Systems 42 (2006) 690–699. Available on Elsevier.
  15. G. Salton, Automatic Information Organization and Retrieval, McGraw-Hill Book Company, New York, 1968.
  16. G. Salton, A. Wong and C.S. Yang, A vector space model for automatic indexing, Communications of the ACM 18 (1975), no. 11, 613 – 620.
  17. R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining an Introduction, Cambridge University Press, 2014.
  18. F. Trevor Rogers. 2020. Patent text similarity and cross-cultural venture-backed innovation. Journal of Behavioral and Experimental Finance 26 (2020) 100319. Available on Elsevier.
Index Terms

Computer Science
Information Sciences

Keywords

Similarity, Vector Space Models, Thesis script, Title, Theoretical Framework