Research Article

Semantic Integrity Constraint Rule Discovery and Outlier Detection in Relational Data as a Data Quality Mining Technique

by R. Vasanth Kumar Mehta, S. Rajalakshmi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 88 - Number 6
Year of Publication: 2014
Authors: R. Vasanth Kumar Mehta, S. Rajalakshmi
10.5120/15357-3819

R. Vasanth Kumar Mehta, S. Rajalakshmi. Semantic Integrity Constraint Rule Discovery and Outlier Detection in Relational Data as a Data Quality Mining Technique. International Journal of Computer Applications. 88, 6 (February 2014), 23-26. DOI=10.5120/15357-3819

@article{ 10.5120/15357-3819,
author = { R. Vasanth Kumar Mehta, S. Rajalakshmi },
title = { Semantic Integrity Constraint Rule Discovery and Outlier Detection in Relational Data as a Data Quality Mining Technique },
journal = { International Journal of Computer Applications },
issue_date = { February 2014 },
volume = { 88 },
number = { 6 },
month = { February },
year = { 2014 },
issn = { 0975-8887 },
pages = { 23-26 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume88/number6/15357-3819/ },
doi = { 10.5120/15357-3819 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:06:55.696573+05:30
%A R. Vasanth Kumar Mehta
%A S. Rajalakshmi
%T Semantic Integrity Constraint Rule Discovery and Outlier Detection in Relational Data as a Data Quality Mining Technique
%J International Journal of Computer Applications
%@ 0975-8887
%V 88
%N 6
%P 23-26
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data Quality is critical to the quality of the patterns and analysis obtained from data. One of the important factors plaguing data is the violation of Semantic Integrity, which leads to inconsistency and in turn to bad patterns or reports when data mining or warehousing techniques are applied to such data. In this paper, a data quality mining technique is proposed to automatically generate Semantic Integrity Constraint Rules from the data. Further, this process leads to the identification of Outliers, which must then be classified as either violations or genuine cases of exception. The results of applying the proposed technique to a real-life data set are discussed. Some other data quality-related observations made in the process are listed.
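
The abstract does not spell out how the rules are generated, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a candidate rule has the form "attribute A = x implies attribute B = y", that a rule is accepted when its confidence passes a threshold, and that a row matching the left side but not the right side is an outlier. The function names, the thresholds, and the sample city/state records are all hypothetical.

```python
from collections import Counter, defaultdict


def discover_rules(rows, min_support=3, min_confidence=0.8):
    """Return rules (a, x, b, y, confidence) meaning: a == x implies b == y.

    Assumption: a rule is kept when at least `min_support` rows have a == x
    and the most common value y of b covers `min_confidence` of those rows.
    """
    if not rows:
        return []
    attrs = list(rows[0].keys())
    rules = []
    for a in attrs:
        for b in attrs:
            if a == b:
                continue
            # For each value of a, count the values of b that occur with it.
            by_lhs = defaultdict(Counter)
            for row in rows:
                by_lhs[row[a]][row[b]] += 1
            for lhs_value, rhs_counts in by_lhs.items():
                total = sum(rhs_counts.values())
                rhs_value, hits = rhs_counts.most_common(1)[0]
                confidence = hits / total
                if total >= min_support and confidence >= min_confidence:
                    rules.append((a, lhs_value, b, rhs_value, confidence))
    return rules


def find_outliers(rows, rules):
    """Return (row_index, rule_text) for every row that breaks a rule."""
    outliers = []
    for index, row in enumerate(rows):
        for a, lhs_value, b, rhs_value, _confidence in rules:
            if row[a] == lhs_value and row[b] != rhs_value:
                outliers.append((index, f"{a}={lhs_value} -> {b}={rhs_value}"))
    return outliers


# Hypothetical sample data: one record pairs a city with an unexpected state.
sample_rows = (
    [{"city": "Chennai", "state": "Tamil Nadu"}] * 5
    + [{"city": "Chennai", "state": "Kerala"}]
    + [{"city": "Kochi", "state": "Kerala"}] * 3
)

sample_rules = discover_rules(sample_rows)
sample_outliers = find_outliers(sample_rows, sample_rules)
for row_index, rule_text in sample_outliers:
    print(f"row {row_index} violates {rule_text}")
```

A flagged row is only a candidate: as the abstract notes, each outlier still has to be judged as either a real violation or a genuine exception, and that step is left to a reviewer in this sketch.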

Index Terms

Computer Science
Information Sciences

Keywords

Data Quality Mining, Semantic Integrity Constraints, Outlier Detection