Research Article

Automated ETL Testing on the Data Quality of a Data Warehouse

by Sara B. Dakrory, Tarek M. Mahmoud, Abdelmgeid A. Ali
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 131 - Number 16
Year of Publication: 2015
Authors: Sara B. Dakrory, Tarek M. Mahmoud, Abdelmgeid A. Ali
10.5120/ijca2015907590

Sara B. Dakrory, Tarek M. Mahmoud, Abdelmgeid A. Ali. Automated ETL Testing on the Data Quality of a Data Warehouse. International Journal of Computer Applications. 131, 16 (December 2015), 9-16. DOI=10.5120/ijca2015907590

@article{ 10.5120/ijca2015907590,
author = { Sara B. Dakrory and Tarek M. Mahmoud and Abdelmgeid A. Ali },
title = { Automated ETL Testing on the Data Quality of a Data Warehouse },
journal = { International Journal of Computer Applications },
issue_date = { December 2015 },
volume = { 131 },
number = { 16 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 9-16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume131/number16/23532-2015907590/ },
doi = { 10.5120/ijca2015907590 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:27:33.384705+05:30
%A Sara B. Dakrory
%A Tarek M. Mahmoud
%A Abdelmgeid A. Ali
%T Automated ETL Testing on the Data Quality of a Data Warehouse
%J International Journal of Computer Applications
%@ 0975-8887
%V 131
%N 16
%P 9-16
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Testing ETL (Extract, Transform, and Load) procedures is a vital phase of data warehouse (DW) testing, and almost the most complex one, because it directly affects the quality of the data. Automated testing has been shown to be a valuable tool for improving the quality of DW systems, whereas manual testing is time-consuming and inaccurate; automating tests therefore improves data quality (DQ) at lower time and cost. In this paper the authors propose a testing framework that automates data quality testing at the ETL stage. Datasets of different volumes (from 10,000 to 50,000 records) are used to evaluate the effectiveness of the proposed automated ETL testing. The experimental results show that the proposed testing framework is effective in detecting errors across the different data volumes.
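To make the idea of automated data quality testing at the ETL stage more concrete, the following is a minimal sketch in Python. It is not the authors' framework, whose design is not described in the abstract. The function name `run_quality_checks`, the three checks (completeness, uniqueness, source-to-target reconciliation), and the sample records are all assumptions chosen for illustration.

```python
from collections import Counter


def run_quality_checks(source_rows, target_rows, key, required):
    """Return a dict mapping check name to a list of offending key values.

    Hypothetical checks, not taken from the paper:
      - completeness: required fields that are None or empty in the target
      - uniqueness: key values that occur more than once in the target
      - reconciliation: source keys that never reached the target
    """
    key_counts = Counter(row[key] for row in target_rows)
    target_keys = set(key_counts)
    return {
        "completeness": [
            row[key] for row in target_rows
            if any(row.get(field) in (None, "") for field in required)
        ],
        "uniqueness": sorted(k for k, n in key_counts.items() if n > 1),
        "reconciliation": sorted(
            row[key] for row in source_rows if row[key] not in target_keys
        ),
    }


# Tiny illustrative dataset (invented for this sketch).
source = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": ""}, {"id": 2, "name": "Bob"}]

report = run_quality_checks(source, target, key="id", required=["name"])
print(report)
```

In a setting like the one the abstract describes, the same checks could be run unchanged over each dataset volume, with the report compared against the errors known to be present in the data.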

Index Terms

Computer Science
Information Sciences

Keywords

Automated ETL Testing, Data Quality, Data Warehouse, Data Quality Checking Routines