Research Article

Database Transformation to Build Dataset for Generation of Decision Tree and Extended ER Model

by Archana A. Chaudhari, Harmeet Kaur Khanuja
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 118 - Number 12
Year of Publication: 2015
Authors: Archana A. Chaudhari, Harmeet Kaur Khanuja
DOI: 10.5120/20800-3482

Archana A. Chaudhari, Harmeet Kaur Khanuja. Database Transformation to Build Dataset for Generation of Decision Tree and Extended ER Model. International Journal of Computer Applications. 118, 12 (May 2015), 41-45. DOI=10.5120/20800-3482

@article{ 10.5120/20800-3482,
author = { Archana A. Chaudhari, Harmeet Kaur Khanuja },
title = { Database Transformation to Build Dataset for Generation of Decision Tree and Extended ER Model },
journal = { International Journal of Computer Applications },
issue_date = { May 2015 },
volume = { 118 },
number = { 12 },
month = { May },
year = { 2015 },
issn = { 0975-8887 },
pages = { 41-45 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume118/number12/20800-3482/ },
doi = { 10.5120/20800-3482 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:01:32.362339+05:30
%A Archana A. Chaudhari
%A Harmeet Kaur Khanuja
%T Database Transformation to Build Dataset for Generation of Decision Tree and Extended ER Model
%J International Journal of Computer Applications
%@ 0975-8887
%V 118
%N 12
%P 41-45
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In a data mining project, the most time-consuming task is usually preparing the data set required for analysis, because a relational database generally holds a collection of tables and views that must be joined, aggregated, and transformed to build that data set. As a result, many complex SQL queries are written multiple times, independently of each other and in a disorganized manner, and the database grows with many tables and views that are not represented as entities in the ER model. In addition, existing SQL aggregations are limited when preparing data sets because they return only one column per aggregated group. To address these issues, we propose simple methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, where every row corresponds to an observation and every column is associated with one variable. This new class of functions is called horizontal aggregations. Horizontal aggregations extend standard SQL aggregations to build data sets with a horizontal denormalized layout, which is the input expected by most data mining algorithms. These data sets are provided as input to a decision tree generation algorithm to produce a decision tree, and an extended ER model can be generated in a similar way.
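
To make the idea of a horizontal aggregation concrete, the following minimal sketch generates SQL that returns one row per observation and one aggregated column per value of a pivoting column. It assumes one common way of expressing this in standard SQL, a CASE expression per output column, and is not necessarily the code generator used by the authors. The `sales` table, its columns, and the helper `horizontal_aggregation_sql` are hypothetical names chosen for illustration.

```python
import sqlite3


def horizontal_aggregation_sql(conn, table, group_col, pivot_col, value_col, agg="SUM"):
    """Generate a SELECT with one row per group_col value and one
    aggregated column per distinct pivot_col value (CASE-based pivot).

    Assumption: table and column names are trusted, and pivot values are
    simple strings that are valid as column-name suffixes.
    """
    # Discover the distinct pivot values; each becomes one output column.
    pivot_values = [
        row[0]
        for row in conn.execute(
            f"SELECT DISTINCT {pivot_col} FROM {table} ORDER BY {pivot_col}"
        )
    ]
    columns = ",\n  ".join(
        f"{agg}(CASE WHEN {pivot_col} = '{v}' THEN {value_col} END) AS {value_col}_{v}"
        for v in pivot_values
    )
    return (
        f"SELECT {group_col},\n  {columns}\n"
        f"FROM {table}\nGROUP BY {group_col}\nORDER BY {group_col}"
    )


# Hypothetical vertical (one row per fact) input table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "Q1", 10.0), ("A", "Q1", 5.0), ("A", "Q2", 7.0), ("B", "Q1", 3.0)],
)

query = horizontal_aggregation_sql(conn, "sales", "store", "quarter", "amount")
rows = conn.execute(query).fetchall()
print(query)
print(rows)
```

In this sketch a standard `GROUP BY store, quarter` would return three rows with a single aggregated column, whereas the generated query returns one row per store with the columns `amount_Q1` and `amount_Q2`. A store with no rows for a quarter gets NULL in that column, which a later preparation step would need to handle before the data set is passed to a mining algorithm.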

Index Terms

Computer Science
Information Sciences

Keywords

Data mining, Transformation, Aggregation, Data preparation, Pivoting, SQL