The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset

Ehsan Mohtashami; Mehrnoosh Bazrafkan

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 101 - Number 5

Year of Publication: 2014

Authors: Ehsan Mohtashami, Mehrnoosh Bazrafkan

10.5120/17683-8541

Ehsan Mohtashami, Mehrnoosh Bazrafkan. The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset. International Journal of Computer Applications. 101, 5 (September 2014), 18-20. DOI=10.5120/17683-8541

@article{ 10.5120/17683-8541,

author = { Ehsan Mohtashami and Mehrnoosh Bazrafkan },

title = { The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset },

journal = { International Journal of Computer Applications },

issue_date = { September 2014 },

volume = { 101 },

number = { 5 },

month = { September },

year = { 2014 },

issn = { 0975-8887 },

pages = { 18-20 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume101/number5/17683-8541/ },

doi = { 10.5120/17683-8541 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T22:30:54.181631+05:30

%A Ehsan Mohtashami

%A Mehrnoosh Bazrafkan

%T The Classification of Persian Texts with Statistical Approach and Extracting Keywords and Admissible Dataset

%J International Journal of Computer Applications

%@ 0975-8887

%V 101

%N 5

%P 18-20

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

In recent years, many algorithms have been proposed for document classification. Most of this work has targeted English, and more recently languages such as Chinese and Arabic. A few studies have classified Persian texts, but these have mostly taken the form of essays or online projects. One of the most frequently used algorithms in text classification is the k-nearest neighbors (KNN) algorithm, which has been applied mainly to English texts. Using such algorithms requires a suitable dataset of Persian texts, which unfortunately is not available for Persian text classification. The first and second phases of this project are therefore extracting keywords and creating an admissible dataset for the classification of Persian texts, and the third phase is implementing software that classifies Persian texts using the extracted keywords. In this paper, we review some of the challenges of searching and classifying Persian texts. We have also implemented an application that extracts an admissible dataset, usable for classifying Persian texts with a statistical approach or with KNN, N-grams, and similar methods. In the last phase, we implemented an application that classifies Persian texts with a statistical approach.
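The phases named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the whitespace tokenizer, the frequency-based keyword selection, the cosine similarity, and all function names (`extract_keywords`, `classify_by_keywords`, `knn_classify`) are assumptions made for illustration only. The sketch picks the most frequent tokens per category as keywords, classifies a text by keyword counts, and shows a simple KNN vote for comparison.

```python
from collections import Counter
import math

# Toy labeled corpus (invented for this sketch, not from the paper).
DOCS = {
    "sport": ["تیم فوتبال بازی را برد", "بازی فوتبال امشب است"],
    "economy": ["قیمت ارز در بازار بالا رفت", "بازار ارز آرام است"],
}

def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Persian text would need
    # normalization and stop-word removal, which are omitted here.
    return text.split()

def extract_keywords(docs_by_category, top_n=3):
    """Take the top_n most frequent tokens of each category as its keywords."""
    keywords = {}
    for category, docs in docs_by_category.items():
        counts = Counter(tok for doc in docs for tok in tokenize(doc))
        keywords[category] = [tok for tok, _ in counts.most_common(top_n)]
    return keywords

def classify_by_keywords(text, keywords):
    """Statistical scoring: choose the category whose keywords occur most often."""
    counts = Counter(tokenize(text))
    scores = {cat: sum(counts[k] for k in kws) for cat, kws in keywords.items()}
    return max(scores, key=scores.get)

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(text, labeled_docs, k=3):
    """Majority vote among the k labeled documents most similar to the text."""
    query = Counter(tokenize(text))
    ranked = sorted(
        labeled_docs,
        key=lambda item: cosine(query, Counter(tokenize(item[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

KEYWORDS = extract_keywords(DOCS)
LABELED = [(doc, cat) for cat, docs in DOCS.items() for doc in docs]

keyword_label = classify_by_keywords("تیم ملی فوتبال", KEYWORDS)
knn_label = knn_classify("بازی فوتبال", LABELED, k=3)
```

On this toy corpus both classifiers assign the sample texts to the `sport` category. A corpus of realistic size would likely need a weighting such as TF-IDF instead of raw frequency, since common function words would otherwise dominate the keyword lists.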

Index Terms

Computer Science

Information Sciences

Keywords

Persian Texts Classification, Statistical Classification, extracting Persian texts keywords, extracting dataset