Improving the Classification accuracy of Noisy Dataset by Effective Data Preprocessing

K. V. Uma

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 180 - Number 36

Year of Publication: 2018

Authors: K. V. Uma

10.5120/ijca2018916908

K. V. Uma. Improving the Classification accuracy of Noisy Dataset by Effective Data Preprocessing. International Journal of Computer Applications. 180, 36 (Apr 2018), 37-46. DOI=10.5120/ijca2018916908

@article{10.5120/ijca2018916908,
  author = {K. V. Uma},
  title = {Improving the Classification accuracy of Noisy Dataset by Effective Data Preprocessing},
  journal = {International Journal of Computer Applications},
  issue_date = {Apr 2018},
  volume = {180},
  number = {36},
  month = {Apr},
  year = {2018},
  issn = {0975-8887},
  pages = {37-46},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume180/number36/29302-2018916908/},
  doi = {10.5120/ijca2018916908},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:02:52.875987+05:30
%A K. V. Uma
%T Improving the Classification accuracy of Noisy Dataset by Effective Data Preprocessing
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 36
%P 37-46
%D 2018
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Decision trees are commonly used in data mining. Open issues in decision tree algorithms include handling continuous attributes and missing values, avoiding overfitting, and dealing with super attributes. Handling noisy data is a further challenge in data mining research. Noisy data is meaningless data: it unnecessarily increases the storage space required and can adversely affect the results of any data mining analysis, which makes prediction from such data difficult. Commonly used algorithms for classification problems include decision stumps, ensemble models, SVMs, and decision tree algorithms. These algorithms achieve lower accuracy on noisy data than on noise-free data. In this paper, data is collected, noise is added to it, and the data is then preprocessed to handle missing values. The preprocessed data is passed to a feature selection step, in which the most relevant features are selected using correlation-based feature subset selection. The selected features are given as input to the Credal C4.5 algorithm, which constructs a decision tree. Results are analyzed on several datasets at noise levels of 5%, 10%, 20%, and 30%. The technique improves accuracy by 1-5% compared with existing results.
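The preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions: the abstract does not say whether noise is added to class labels or to attributes (class-label noise is assumed here), it does not name the missing-value strategy (mean imputation is assumed), and the function names `add_label_noise`, `impute_mean`, and `cfs_merit` are hypothetical. The merit function follows the standard correlation-based feature selection heuristic, k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. Credal C4.5 itself is not implemented.

```python
import math
import random


def add_label_noise(labels, noise_level, classes, seed=0):
    """Return a copy of `labels` in which a fraction `noise_level` of the
    entries is replaced by a different, randomly chosen class.
    Assumption: noise is class-label noise (the abstract does not specify)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_level * len(noisy))
    # rng.sample picks distinct positions, so exactly n_flip labels change.
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values.
    Assumption: mean imputation stands in for the unspecified strategy."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def cfs_merit(feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a feature subset: k * r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k
    return k * r_cf / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)


if __name__ == "__main__":
    labels = ["yes", "no"] * 50
    # The four noise levels analyzed in the abstract.
    for level in (0.05, 0.10, 0.20, 0.30):
        noisy = add_label_noise(labels, level, classes=["yes", "no"])
        changed = sum(a != b for a, b in zip(labels, noisy))
        print(f"{level:.0%} noise: {changed} of {len(labels)} labels changed")
    print(impute_mean([1.0, None, 3.0]))
    print(cfs_merit([0.6, 0.5], 0.2))
```

A subset scores a higher merit when its features correlate strongly with the class and weakly with each other, which is why redundant features are dropped before the tree is built.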

Index Terms

Computer Science

Information Sciences

Keywords

Classification, Noisy Data, Feature Selection, Data Preprocessing