Research Article

The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset

by Ehsan Mohtashami, Mehrnoosh Bazrafkan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 101 - Number 5
Year of Publication: 2014
DOI: 10.5120/17683-8541

Ehsan Mohtashami, Mehrnoosh Bazrafkan. The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset. International Journal of Computer Applications. 101, 5 (September 2014), 18-20. DOI=10.5120/17683-8541

@article{ 10.5120/17683-8541,
author = { Ehsan Mohtashami and Mehrnoosh Bazrafkan },
title = { The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 101 },
number = { 5 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 18-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume101/number5/17683-8541/ },
doi = { 10.5120/17683-8541 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:30:54.181631+05:30
%A Ehsan Mohtashami
%A Mehrnoosh Bazrafkan
%T The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset
%J International Journal of Computer Applications
%@ 0975-8887
%V 101
%N 5
%P 18-20
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In recent years, many algorithms have been proposed for document classification. Most of this work has targeted English, with more recent work on languages such as Chinese and Arabic. A few studies have classified Persian texts, and these have appeared as essays or online projects. One of the most frequently used algorithms in text classification is the KNN algorithm, which has been applied mostly to English texts. Using these algorithms requires a suitable dataset of Persian texts, which unfortunately is not available for Persian text classification. The first and second phases of this project are therefore extracting keywords and creating an admissible dataset for the classification of Persian texts, and the third phase is implementing software that classifies Persian texts using the extracted keywords. In this paper, we review some of the challenges of searching and classifying Persian texts. We have also implemented an application that extracts an admissible dataset for classifying Persian texts with a statistical approach, or with KNN, N-gram, and similar methods, and that produces suitable and usable datasets for this task. In the last phase, we implemented an application that classifies Persian texts with a statistical approach.
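The abstract gives no implementation details, so the following is only a minimal sketch of how keyword extraction and KNN classification might fit together. Everything in it is an assumption for illustration: the helper names (`build_keywords`, `knn_classify`), the choice of raw term frequency for picking keywords, cosine similarity as the distance measure, and the toy training sentences. It is not the authors' software or dataset.

```python
from collections import Counter
import math


def tokenize(text):
    # Whitespace split only (assumption); real Persian text would also need
    # normalization, e.g. of the zero-width non-joiner and Arabic/Persian letter variants.
    return text.split()


def build_keywords(labeled_docs, top_n=3):
    """Collect the top_n most frequent tokens of each class (hypothetical helper)."""
    by_label = {}
    for text, label in labeled_docs:
        by_label.setdefault(label, Counter()).update(tokenize(text))
    keywords = []
    for counts in by_label.values():
        for word, _ in counts.most_common(top_n):
            if word not in keywords:
                keywords.append(word)
    return keywords


def vectorize(text, keywords):
    # Represent a document by how often each keyword occurs in it.
    counts = Counter(tokenize(text))
    return [counts[k] for k in keywords]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text, labeled_docs, keywords, k=3):
    """Majority vote among the k training documents most similar to text."""
    query = vectorize(text, keywords)
    ranked = sorted(
        labeled_docs,
        key=lambda item: cosine(query, vectorize(item[0], keywords)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data, invented for this sketch.
train = [
    ("فوتبال تیم بازی گل", "sport"),
    ("تیم فوتبال لیگ بازی", "sport"),
    ("بازار بورس سهام قیمت", "economy"),
    ("قیمت سهام بانک بازار", "economy"),
]
keywords = build_keywords(train, top_n=3)
print(knn_classify("بازی فوتبال تیم", train, keywords, k=3))
```

In this sketch the keyword list doubles as the feature space, which mirrors the abstract's idea of classifying "using the extracted keywords"; an N-gram variant would change only `tokenize`.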

Index Terms

Computer Science
Information Sciences

Keywords

Persian Texts Classification, Statistical Classification, extracting Persian texts keywords, extracting dataset