Research Article

Scalable Algorithms for Missing Value Imputation

by Marghny H. Mohamed, Abdel-rahiem A. Hashem, Mohammed M. Abdelsamea
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 87 - Number 11
Year of Publication: 2014
DOI: 10.5120/15255-4019

Marghny H. Mohamed, Abdel-rahiem A. Hashem, Mohammed M. Abdelsamea. Scalable Algorithms for Missing Value Imputation. International Journal of Computer Applications. 87, 11 (February 2014), 35-42. DOI=10.5120/15255-4019

@article{ 10.5120/15255-4019,
author = { Marghny H. Mohamed and Abdel-rahiem A. Hashem and Mohammed M. Abdelsamea },
title = { Scalable Algorithms for Missing Value Imputation },
journal = { International Journal of Computer Applications },
issue_date = { February 2014 },
volume = { 87 },
number = { 11 },
month = { February },
year = { 2014 },
issn = { 0975-8887 },
pages = { 35-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume87/number11/15255-4019/ },
doi = { 10.5120/15255-4019 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:05:40.760144+05:30
%A Marghny H. Mohamed
%A Abdel-rahiem A. Hashem
%A Mohammed M. Abdelsamea
%T Scalable Algorithms for Missing Value Imputation
%J International Journal of Computer Applications
%@ 0975-8887
%V 87
%N 11
%P 35-42
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Statistical imputation techniques have been proposed mainly to predict the missing values in incomplete data sets, an essential step in any data analysis framework. K-means-based imputation, a representative statistical imputation method, has produced satisfactory results in terms of effectiveness and efficiency on popular, freely available data sets (e.g., Bupa, Breast Cancer, Pima). The main idea of K-means-based methods is to impute a missing value using the prototypes of the representative class and the similarity of the data. However, such methods share the limitations of K-means as a data mining technique. Motivated by these drawbacks, this paper introduces simple and efficient imputation methods based on K-means for handling missing data in various classes of data sets. The proposed methods achieve higher accuracy than imputation with standard K-means.
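
To make the baseline idea concrete, the following is a minimal sketch of generic K-means-based imputation, not the methods proposed in the paper. It assumes numeric data with missing entries encoded as NaN, squared Euclidean distance as the similarity measure, and column-mean initialisation of the missing entries. The function name `kmeans_impute` and its parameters `k`, `n_iter`, and `seed` are hypothetical choices made for illustration.

```python
import numpy as np

def kmeans_impute(X, k=2, n_iter=20, seed=0):
    """Fill NaN entries with the matching coordinate of the assigned centroid.

    Illustrative baseline only; the name and defaults are assumptions.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from the column means of the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    rng = np.random.default_rng(seed)
    # Pick k distinct rows of the mean-filled data as initial prototypes.
    centroids = filled[rng.choice(len(filled), size=k, replace=False)].copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Update each centroid; keep the old one if its cluster is empty.
        for j in range(k):
            members = filled[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

        # Re-impute only the missing entries from the assigned centroid.
        filled = np.where(missing, centroids[labels], X)

    return filled, labels

# Small made-up example: two rows each have one missing value.
X = np.array([
    [1.0, 1.0],
    [1.2, 0.9],
    [0.9, np.nan],
    [10.0, 10.0],
    [10.2, 9.8],
    [np.nan, 10.1],
])
filled, labels = kmeans_impute(X, k=2)
print(filled)
print(labels)
```

Observed entries are never overwritten; only the NaN positions change between iterations. With `k=1` the procedure reduces to column-mean imputation, and because the result depends on the random initial prototypes, it inherits the sensitivity to initialisation that is a known limitation of K-means.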

Index Terms

Computer Science
Information Sciences

Keywords

Statistical Imputation, Clustering, K-means