Research Article

An Optimum Model for the Retrieval of Missing Values for Data Cleansing using Regression Analysis

by Deepshikha Aggarwal, V. B. Aggarwal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 117 - Number 2
Year of Publication: 2015
Authors: Deepshikha Aggarwal, V. B. Aggarwal
10.5120/20529-2869

Deepshikha Aggarwal, V. B. Aggarwal. An Optimum Model for the Retrieval of Missing Values for Data Cleansing using Regression Analysis. International Journal of Computer Applications. 117, 2 (May 2015), 35-39. DOI=10.5120/20529-2869

@article{ 10.5120/20529-2869,
author = { Deepshikha Aggarwal and V. B. Aggarwal },
title = { An Optimum Model for the Retrieval of Missing Values for Data Cleansing using Regression Analysis },
journal = { International Journal of Computer Applications },
issue_date = { May 2015 },
volume = { 117 },
number = { 2 },
month = { May },
year = { 2015 },
issn = { 0975-8887 },
pages = { 35-39 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume117/number2/20529-2869/ },
doi = { 10.5120/20529-2869 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:58:17.564998+05:30
%A Deepshikha Aggarwal
%A V. B. Aggarwal
%T An Optimum Model for the Retrieval of Missing Values for Data Cleansing using Regression Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 117
%N 2
%P 35-39
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

An important aspect of data mining is the pre-processing of data. Pre-processing is important because real-world data is susceptible to inconsistencies, noise, and missing values. Such data cannot be used in data mining directly, as that would produce highly inadequate results. There are basically two ways to deal with missing values: the first is to ignore the data with the missing values, and the second is to predict those values. A prediction can be made by assuming continuity of the data or by assigning a suitable value based on prior knowledge. In this paper, our focus is on providing an adequate method for filling in missing values: a suitable value is predicted by comparing several known regression methods and choosing the most suitable one, based on both statistical analysis and subjective analysis of the graph.
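The idea of comparing regression methods and filling a gap with the best one can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names (`polyfit`, `fill_missing`, and so on) are hypothetical, the sample data is made up, the exponential model is fitted through a log transform (an assumption that requires positive values), and the Gaussian model listed in the keywords is omitted. Only the statistical part (RMSE) is covered; the subjective inspection of the graph mentioned in the abstract is not.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    n = degree + 1
    # Augmented matrix of (A^T A) c = A^T y for the basis 1, x, x^2, ...
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def poly_model(xs, ys, degree):
    c = polyfit(xs, ys, degree)
    return lambda x: sum(ci * x ** i for i, ci in enumerate(c))

def exp_model(xs, ys):
    # y = a * exp(b x), fitted as a straight line through (x, ln y); needs y > 0.
    c0, c1 = polyfit(xs, [math.log(y) for y in ys], 1)
    return lambda x: math.exp(c0 + c1 * x)

def rmse(model, xs, ys):
    """Root mean square error of a model on the known points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def fill_missing(xs, ys):
    """Fit candidate models on known points; fill None entries with the lowest-RMSE model."""
    known = [(x, y) for x, y in zip(xs, ys) if y is not None]
    kx, ky = zip(*known)
    candidates = {"linear": poly_model(kx, ky, 1),
                  "quadratic": poly_model(kx, ky, 2)}
    if all(y > 0 for y in ky):  # the log transform is undefined otherwise
        candidates["exponential"] = exp_model(kx, ky)
    scores = {name: rmse(f, kx, ky) for name, f in candidates.items()}
    best = min(scores, key=scores.get)
    filled = [y if y is not None else candidates[best](x) for x, y in zip(xs, ys)]
    return best, scores, filled

# Made-up example: the value at x = 3 is missing.
xs = [0, 1, 2, 3, 4, 5]
ys = [0.0, 1.0, 4.0, None, 16.0, 25.0]
best, scores, filled = fill_missing(xs, ys)
print(best, filled[3])
```

Note that RMSE measured on the same points used for fitting tends to favour the more flexible model (a quadratic never fits worse than a line), which is one reason a purely statistical choice may need to be complemented by a look at the plotted curves.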

Index Terms

Computer Science
Information Sciences

Keywords

Data Quality, Missing Values, Data Cleaning, Regression, Linear, Quadratic, Exponential, Gaussian, Prediction, RMSE