Research Article

ETL based Cleaning on Database

by Arup Kumar Bhattacharjee, Partha Chatterjee, Mukesh Prasad Shaw, Manomoy Chakraborty
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 105 - Number 8
Year of Publication: 2014
DOI: 10.5120/18399-9661

Arup Kumar Bhattacharjee, Partha Chatterjee, Mukesh Prasad Shaw, Manomoy Chakraborty. ETL based Cleaning on Database. International Journal of Computer Applications. 105, 8 (November 2014), 34-40. DOI=10.5120/18399-9661

@article{ 10.5120/18399-9661,
author = { Arup Kumar Bhattacharjee and Partha Chatterjee and Mukesh Prasad Shaw and Manomoy Chakraborty },
title = { ETL based Cleaning on Database },
journal = { International Journal of Computer Applications },
issue_date = { November 2014 },
volume = { 105 },
number = { 8 },
month = { November },
year = { 2014 },
issn = { 0975-8887 },
pages = { 34-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume105/number8/18399-9661/ },
doi = { 10.5120/18399-9661 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:37:11.866522+05:30
%A Arup Kumar Bhattacharjee
%A Partha Chatterjee
%A Mukesh Prasad Shaw
%A Manomoy Chakraborty
%T ETL based Cleaning on Database
%J International Journal of Computer Applications
%@ 0975-8887
%V 105
%N 8
%P 34-40
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The paper analyses the problem of data cleaning and of automatically identifying incorrect and inconsistent data in a dataset. Extraction, Transformation and Loading (ETL) are the steps used to clean a data warehouse. In addition to existing data cleaning algorithms such as PNRS, the authors have implemented several algorithms, including cleanString, cleanNumber, hit ratio, check data dictionary, and check metadata. The paper aims to improve the quality of data in the database system, focusing on a citizen database system with the goal of making it error-free. Some results, along with certain statistics, are also provided.
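
The abstract names cleanString, cleanNumber, and hit ratio but does not define them, so the following is only a minimal sketch of what such routines could look like. The function names `clean_string`, `clean_number`, and `hit_ratio`, the character rules, and the definition of hit ratio as the fraction of values found in a data dictionary are all assumptions made for illustration; they are not taken from the paper.

```python
import re
from typing import Iterable, Optional, Set


def clean_string(value: str) -> str:
    """Hypothetical cleanString: keep letters, spaces, '.', '-' and apostrophes."""
    # Assumption: digits and other symbols in a name-like field are noise.
    value = re.sub(r"[^A-Za-z .'\-]", "", value)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", value).strip()


def clean_number(value: str) -> Optional[int]:
    """Hypothetical cleanNumber: keep only digits; None if nothing is left.

    Assumption: the field holds a non-negative integer (e.g. a postal code),
    so signs and decimal points are discarded along with other characters.
    """
    digits = re.sub(r"\D", "", value)
    return int(digits) if digits else None


def hit_ratio(values: Iterable[str], dictionary: Set[str]) -> float:
    """Assumed definition: share of values that appear in the data dictionary."""
    values = list(values)
    if not values:
        return 0.0
    hits = sum(1 for v in values if v in dictionary)
    return hits / len(values)


if __name__ == "__main__":
    cities = ["Kolkata", "Delhi", "Kolkatta", "Mumbai"]
    valid_cities = {"Kolkata", "Delhi", "Mumbai"}
    print(clean_string("  Ko1lkata!!  "))
    print(clean_number(" 7,00,091 "))
    print(hit_ratio(cities, valid_cities))
```

Under this assumed definition, a low hit ratio for a column would flag it as likely containing dirty values that need a dictionary or metadata check.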

Index Terms

Computer Science
Information Sciences

Keywords

Data Warehouse, ETL, Data Dictionary, Hit Ratio, Dirty Data, Data Cleaning