Scalable Algorithms for Missing Value Imputation

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 87 - Number 11
Year of Publication: 2014
Marghny H. Mohamed
Abdel-rahiem A. Hashem
Mohammed M. Abdelsamea

Marghny H Mohamed, Abdel-rahiem A Hashem and Mohammed M Abdelsamea. Article: Scalable Algorithms for Missing Value Imputation. International Journal of Computer Applications 87(11):35-42, February 2014. Full text available. BibTeX

	author = {Marghny H. Mohamed and Abdel-rahiem A. Hashem and Mohammed M. Abdelsamea},
	title = {Article: Scalable Algorithms for Missing Value Imputation},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {87},
	number = {11},
	pages = {35-42},
	month = {February},
	note = {Full text available}


Statistical imputation techniques have been proposed mainly to predict the missing values in incomplete data sets, an essential step in any data analysis framework. K-means-based imputation, a representative statistical imputation method, has produced satisfactory results in terms of effectiveness and efficiency on popular, freely available data sets (e.g., Bupa, Breast Cancer, Pima). The main idea of K-means-based methods is to impute a missing value from the prototype of the representative class, relying on the similarity of the data. However, such methods share the limitations of K-means as a data mining technique. Motivated by these drawbacks, this paper introduces simple and efficient K-means-based imputation methods to deal with missing data from various classes of data sets. The proposed methods give higher accuracy than standard K-means.
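The baseline idea described in the abstract, filling a missing value from the prototype (centroid) of the cluster a record belongs to, can be illustrated with a minimal sketch. This shows plain K-means imputation only, not the improved methods the paper proposes. The function name `kmeans_impute`, its parameters, the mean-based initial fill, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def kmeans_impute(X, k=2, n_iter=25, seed=0):
    """Illustrative baseline: fill NaNs with the nearest centroid's coordinate.

    Assumes every column has at least one observed value and k <= number of rows.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    observed = ~missing

    # Start from column means so every row has a full set of coordinates.
    filled = np.where(missing, np.nanmean(X, axis=0), X)

    # Pick k distinct rows as initial prototypes (hypothetical choice of init).
    rng = np.random.default_rng(seed)
    centroids = filled[rng.choice(len(filled), size=k, replace=False)]

    for _ in range(n_iter):
        # Squared distance to each centroid, using only observed features.
        diff = (filled[:, None, :] - centroids[None, :, :]) * observed[:, None, :]
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)

        # Move each prototype to the mean of its members (skip empty clusters).
        for j in range(k):
            members = filled[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)

        # Re-impute missing entries from the assigned cluster's prototype.
        filled = np.where(missing, centroids[labels], X)

    return filled


# Toy data: two well-separated groups, one missing value in the first group.
X = np.array([
    [1.0, 1.0],
    [1.2, 0.8],
    [0.9, np.nan],
    [8.0, 8.0],
    [8.2, 7.9],
    [7.9, 8.1],
])
imputed = kmeans_impute(X, k=2)
```

On this toy data the missing entry should settle near the observed values of the first group rather than near the overall column mean. The sketch also inherits the usual K-means caveats, such as sensitivity to the choice of `k` and to initialization.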

