Research Article

Categorization of ‘Holy Quran-Tafseer’ using K-Nearest Neighbor Algorithm

by Geehan Sabah Hassan, Siti Khaotijah Mohammad, Faris Mahdi Alwan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 129 - Number 12
Year of Publication: 2015
10.5120/ijca2015906909

Geehan Sabah Hassan, Siti Khaotijah Mohammad, Faris Mahdi Alwan. Categorization of ‘Holy Quran-Tafseer’ using K-Nearest Neighbor Algorithm. International Journal of Computer Applications. 129, 12 (November 2015), 1-6. DOI=10.5120/ijca2015906909

@article{ 10.5120/ijca2015906909,
author = { Geehan Sabah Hassan and Siti Khaotijah Mohammad and Faris Mahdi Alwan },
title = { Categorization of ‘Holy Quran-Tafseer’ using K-Nearest Neighbor Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { November 2015 },
volume = { 129 },
number = { 12 },
month = { November },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume129/number12/23122-2015906909/ },
doi = { 10.5120/ijca2015906909 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:23:11.341322+05:30
%A Geehan Sabah Hassan
%A Siti Khaotijah Mohammad
%A Faris Mahdi Alwan
%T Categorization of ‘Holy Quran-Tafseer’ using K-Nearest Neighbor Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 129
%N 12
%P 1-6
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text categorization (TC) is the process of labeling natural-language texts with one or more categories from a predefined set. TC is a supervised learning task in which the set of categories and examples of documents belonging to those categories are given. The task of automatic TC is to assign an electronic document to one or more categories, based on a training set of labeled documents. The research objectives are: to formulate a K-Nearest Neighbor (KNN) algorithm for the automatic and suitable classification of any Holy Quran Tafseer segment; to identify relevant categories of the Holy Quran Tafseer in the form of numbered classes; and to retrieve the Tafseer of verses of the Holy Quran in the Malay language. Hence, this research aims to automatically categorize the Tafseer of verses of the Holy Quran, using the KNN algorithm as the text categorization technique. The research is designed to classify different verses of the Holy Quran. The first phase pre-processes the Arabic text and then translates each Arabic word into Malay. Next, documents are categorized based on the cosine similarity between a test document and specific training documents. The category held by the majority of the nearest neighbors is assigned to the test sample, and precision and recall are calculated over a collection of documents. The results indicate that KNN is one of the best-performing algorithms for categorizing the Tafseer of the Holy Quran. Furthermore, this study contributes a classifier for Tafseer Al-Quran in the Malay language.
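The classification step described in the abstract (cosine similarity against training documents, a majority vote among the nearest neighbors, then precision and recall) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the whitespace tokenizer, the raw term-frequency weighting, the value of k, the category labels, and the toy training texts are all assumptions made up for illustration; none of them come from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumed: plain whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(test_text, training, k=3):
    """Assign the label held by the majority of the k most similar documents."""
    test_vec = term_vector(test_text)
    scored = sorted(
        ((cosine_similarity(test_vec, term_vector(doc)), label)
         for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    # Ties in the vote are broken by the order labels were first counted.
    return votes.most_common(1)[0][0]


def precision_recall(true_labels, predicted_labels, category):
    """Per-category precision and recall over a collection of documents."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: invented Malay keywords and invented category names,
# not text or classes from the paper.
TRAINING = [
    ("solat puasa zakat", "ibadah"),
    ("solat doa zikir", "ibadah"),
    ("jual beli hutang", "muamalat"),
    ("hutang harta warisan", "muamalat"),
]

if __name__ == "__main__":
    print(knn_classify("puasa dan solat", TRAINING, k=3))
    print(precision_recall(
        ["ibadah", "ibadah", "muamalat", "muamalat"],
        ["ibadah", "muamalat", "muamalat", "muamalat"],
        "ibadah",
    ))
```

A real system would likely replace the raw term counts with a weighted scheme such as TF-IDF and tune k on held-out data; the abstract does not say which choices the authors made.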

Index Terms

Computer Science
Information Sciences

Keywords

Text categorization, K-Nearest Neighbor algorithm