Research Article

An Efficient Approach for Filling Incomplete Data

Published in May 2012 by P. M. Kiran, A. Prakash Rao, B. Ratnamala
National Conference on Advances in Computer Science and Applications (NCACSA 2012)
Foundation of Computer Science USA
NCACSA - Number 4
May 2012

P. M. Kiran, A. Prakash Rao, B. Ratnamala. An Efficient Approach for Filling Incomplete Data. National Conference on Advances in Computer Science and Applications (NCACSA 2012). NCACSA, 4 (May 2012), 23-27.

@article{
author = { P. M. Kiran and A. Prakash Rao and B. Ratnamala },
title = { An Efficient Approach for Filling Incomplete Data },
journal = { National Conference on Advances in Computer Science and Applications (NCACSA 2012) },
issue_date = { May 2012 },
volume = { NCACSA },
number = { 4 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 23-27 },
numpages = 5,
url = { /proceedings/ncacsa/number4/6503-1028/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Advances in Computer Science and Applications (NCACSA 2012)
%A P. M. Kiran
%A A. Prakash Rao
%A B. Ratnamala
%T An Efficient Approach for Filling Incomplete Data
%J National Conference on Advances in Computer Science and Applications (NCACSA 2012)
%@ 0975-8887
%V NCACSA
%N 4
%P 23-27
%D 2012
%I International Journal of Computer Applications
Abstract

Good data preparation is a key prerequisite for successful data mining. Conventional wisdom suggests that data preparation takes about 60 to 80% of the time involved in a data mining exercise. There have been good reviews of the problems associated with data preparation. Data preprocessing is nevertheless a crucial step in a variety of data warehousing and mining tasks. Real-world data is noisy and often suffers from corrupted or incomplete values that may affect the models built from it, and the accuracy of any mining algorithm depends greatly on its input datasets. In this paper we describe a novel approach to predicting the missing values in a dataset using the well-known maximum-likelihood principle of EM (Expectation Maximization). After implementing and applying the EM filter, the dataset is completed with estimated values, based on the well-known principle of expectation maximization applied to each attribute instance. We demonstrate the efficacy of the approach as a preprocessing step on real datasets.
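
The abstract does not spell out the EM procedure, so the following is only a minimal sketch of the general idea, not the authors' EM filter. It assumes all attributes are numeric and roughly jointly Gaussian, that missing entries are encoded as NaN, and that every column has at least one observed value. The function name `em_impute`, its parameters, and the small ridge term are illustrative choices that do not come from the paper.

```python
import numpy as np

def em_impute(X, n_iter=50, tol=1e-6, ridge=1e-6):
    """Fill NaN entries of X by EM under an assumed multivariate Gaussian model.

    Hypothetical helper for illustration; not the implementation from the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n, d = X.shape

    # Initial guess: column means of the observed entries.
    mu = np.nanmean(X, axis=0)
    filled = np.where(missing, mu, X)
    sigma = np.cov(filled, rowvar=False, bias=True) + ridge * np.eye(d)

    for _ in range(n_iter):
        prev = filled.copy()
        cov_corr = np.zeros((d, d))  # conditional covariance of the missing parts

        # E-step: replace each missing block by its conditional mean
        # given the observed attributes of the same row.
        for i in range(n):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: fall back to the current mean.
                filled[i] = mu
                cov_corr += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            filled[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            cov_corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T

        # M-step: re-estimate mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        centered = filled - mu
        sigma = (centered.T @ centered + cov_corr) / n + ridge * np.eye(d)

        if np.max(np.abs(filled - prev)) < tol:
            break

    return filled
```

Calling `em_impute(data)` on a two-dimensional array returns a copy in which observed values are untouched and each NaN is replaced by its estimated conditional mean. Categorical attributes would need a different model (for example a multinomial one), which this sketch does not cover.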

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Data Preprocessing, Missing Data