Checking the Correctness of Bangla Words using N-Gram

Nur Hossain Khan; Gonesh Chandra Saha; Bappa Sarker; Md. Habibur Rahman

Research Article

Checking the Correctness of Bangla Words using N-Gram

by Nur Hossain Khan, Gonesh Chandra Saha, Bappa Sarker, Md. Habibur Rahman

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 89 - Number 11

Year of Publication: 2014

Authors: Nur Hossain Khan, Gonesh Chandra Saha, Bappa Sarker, Md. Habibur Rahman

10.5120/15672-4416

Nur Hossain Khan, Gonesh Chandra Saha, Bappa Sarker, Md. Habibur Rahman. Checking the Correctness of Bangla Words using N-Gram. International Journal of Computer Applications. 89, 11 (March 2014), 1-3. DOI=10.5120/15672-4416

@article{10.5120/15672-4416,
  author = {Nur Hossain Khan and Gonesh Chandra Saha and Bappa Sarker and Md. Habibur Rahman},
  title = {Checking the Correctness of Bangla Words using N-Gram},
  journal = {International Journal of Computer Applications},
  issue_date = {March 2014},
  volume = {89},
  number = {11},
  month = {March},
  year = {2014},
  issn = {0975-8887},
  pages = {1-3},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume89/number11/15672-4416/},
  doi = {10.5120/15672-4416},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T22:08:55.915684+05:30

%A Nur Hossain Khan

%A Gonesh Chandra Saha

%A Bappa Sarker

%A Md. Habibur Rahman

%T Checking the Correctness of Bangla Words using N-Gram

%J International Journal of Computer Applications

%@ 0975-8887

%V 89

%N 11

%P 1-3

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

The N-gram model is used in many domains, such as spelling and syntactic verification, speech recognition, machine translation, character recognition, and others. This paper describes a system for checking the correctness of a Bangla word using an N-gram model. An experimental corpus containing one million word tokens was used to train the system. The corpus was part of the BdNC01 corpus, created in the SIPL lab of Islamic University. The system was tested on 50,000 correct and 50,000 incorrect words taken from sample texts collected from different newspapers. It correctly determined the correctness of the test words at a rate of 96.17%. This paper also describes the limitations of the system and possible solutions.
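The abstract does not specify the N-gram order, the unit (character or word), or the decision rule, so the following is only a minimal sketch of one way such a checker could work, not the authors' implementation. It assumes a character-level bigram model over Unicode code points, uses an interpolated Witten-Bell estimate (the smoothing named in the keywords), and accepts a word when its average log-probability reaches a threshold. The names `CharBigramModel`, `avg_log_prob`, `is_probably_correct`, the boundary markers, and the threshold value are all hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "^", "$"  # assumed word-boundary markers, not from the paper


class CharBigramModel:
    """Character bigram model with interpolated Witten-Bell smoothing."""

    def __init__(self, words):
        self.unigrams = Counter()             # count of each character
        self.followers = defaultdict(Counter)  # prev char -> next-char counts
        for word in words:
            chars = [BOS] + list(word) + [EOS]
            for prev, cur in zip(chars, chars[1:]):
                self.unigrams[cur] += 1
                self.followers[prev][cur] += 1
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 slot for unseen characters

    def unigram_prob(self, ch):
        # Add-one estimate so unseen characters keep a non-zero probability.
        return (self.unigrams[ch] + 1) / (self.total + self.vocab_size)

    def bigram_prob(self, prev, cur):
        seen = self.followers.get(prev)
        if not seen:
            # Unseen history: back off entirely to the unigram estimate.
            return self.unigram_prob(cur)
        n = sum(seen.values())  # tokens observed after prev
        t = len(seen)           # distinct types observed after prev
        return (seen[cur] + t * self.unigram_prob(cur)) / (n + t)

    def avg_log_prob(self, word):
        chars = [BOS] + list(word) + [EOS]
        pairs = list(zip(chars, chars[1:]))
        return sum(math.log(self.bigram_prob(p, c)) for p, c in pairs) / len(pairs)


def is_probably_correct(model, word, threshold):
    """Accept the word if its average bigram log-probability is high enough."""
    return model.avg_log_prob(word) >= threshold


if __name__ == "__main__":
    # Toy training list; a real system would train on a large corpus.
    model = CharBigramModel(["বাংলা", "বাংলাদেশ", "ভাষা", "শব্দ"])
    for candidate in ["বাংলা", "xqzxq"]:
        print(candidate, is_probably_correct(model, candidate, threshold=-2.0))
```

In practice the threshold would have to be tuned on held-out correct and incorrect words, and splitting Bangla text by code point rather than by grapheme cluster is a simplification made here for brevity.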

Index Terms

Computer Science

Information Sciences

Keywords

N-gram, Tokens, Corpus, Witten-Bell smoothing