Research Article

Data Mining Techniques in Parallel Environment- A Comprehensive Survey

by Kinjal Shah, Prashant Chauhan, M. B. Potdar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 1
Year of Publication: 2014
Authors: Kinjal Shah, Prashant Chauhan, M. B. Potdar
10.5120/18879-0151

Kinjal Shah, Prashant Chauhan, M. B. Potdar. Data Mining Techniques in Parallel Environment- A Comprehensive Survey. International Journal of Computer Applications. 108, 1 (December 2014), 36-41. DOI=10.5120/18879-0151

@article{ 10.5120/18879-0151,
author = { Kinjal Shah, Prashant Chauhan, M. B. Potdar },
title = { Data Mining Techniques in Parallel Environment- A Comprehensive Survey },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 1 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 36-41 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume108/number1/18879-0151/ },
doi = { 10.5120/18879-0151 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:41:54.510946+05:30
%A Kinjal Shah
%A Prashant Chauhan
%A M. B. Potdar
%T Data Mining Techniques in Parallel Environment- A Comprehensive Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 1
%P 36-41
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining is the process of discovering interesting and useful patterns and relationships in large volumes of data. The knowledge discovered through data mining can be used for further analysis and prediction. Common data mining techniques include clustering, classification, and association. Classification is one of the major techniques for discovering patterns in huge amounts of data and is widely used in many fields. When the volume of data is large, processing it sequentially takes a long time; processing it in parallel can reduce that time. Parallel techniques are therefore useful when there is a large volume of data and results are needed within a few seconds. These techniques can be implemented using different approaches, such as MPI, OpenMP, CUDA, or MapReduce. In this paper, we discuss two classification techniques, decision tree induction and k-nearest neighbors, using both sequential and parallel approaches.
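The sequential-versus-parallel contrast for k-nearest neighbors can be illustrated with a small sketch. This is not the authors' implementation: it assumes Euclidean distance, majority voting, and a round-robin split of the training set into shards. Each shard finds its own local k nearest neighbors, and the union of those local lists is merged into the global k nearest, which is guaranteed to match the sequential result. Python's `ThreadPoolExecutor` stands in here for MPI ranks, OpenMP threads, CUDA blocks, or MapReduce mappers. Because of CPython's global interpreter lock, pure-Python distance loops will not actually speed up with threads, so the sketch shows the decomposition rather than a performance gain. The function names (`knn_sequential`, `knn_parallel`, `local_k_nearest`) are hypothetical.

```python
import heapq
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

Point = tuple[float, ...]
Sample = tuple[Point, str]  # (feature vector, class label)


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_k_nearest(shard: list[Sample], query: Point, k: int) -> list[tuple[float, str]]:
    """Return the k (distance, label) pairs in this shard closest to the query."""
    return heapq.nsmallest(k, ((euclidean(p, query), label) for p, label in shard))


def majority_vote(neighbors: list[tuple[float, str]]) -> str:
    """Most common label among the neighbors (listed nearest first).
    On a tie, Counter keeps first-seen order, so the nearer label wins."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def knn_sequential(train: list[Sample], query: Point, k: int) -> str:
    """Baseline: scan the whole training set in a single pass."""
    return majority_vote(local_k_nearest(train, query, k))


def knn_parallel(train: list[Sample], query: Point, k: int, workers: int = 4) -> str:
    """Data-parallel KNN: shard the training set, compute each shard's local
    top-k concurrently, then merge the candidates into the global top-k."""
    shards = [train[i::workers] for i in range(workers)]  # round-robin partition
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda s: local_k_nearest(s, query, k), shards)
        candidates = [pair for part in partials for pair in part]
    # Every global top-k neighbor is in its own shard's local top-k,
    # so selecting k from the merged candidates gives the exact answer.
    return majority_vote(heapq.nsmallest(k, candidates))


# Usage on a tiny two-class data set (illustrative values only).
train: list[Sample] = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_sequential(train, (1.1, 1.0), k=3))             # A
print(knn_parallel(train, (4.9, 5.0), k=3, workers=2))    # B
```

In an MPI or MapReduce setting, the same structure holds: each process or mapper emits only its k local candidates, so the merge step exchanges at most `workers * k` pairs instead of the full training set.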

Index Terms

Computer Science
Information Sciences

Keywords

Decision trees, KNN, MPI, CUDA, KDD, OpenMP