CFP last date
20 May 2024
Reseach Article

A Dynamic Sliding Window based Balanced Parallel Frequent Itemset Mining Algorithm in Data Stream

by Zakria Mahrousa, Dima Mufti Alchawafa, Hasan Kazzaz
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 16
Year of Publication: 2020
Authors: Zakria Mahrousa, Dima Mufti Alchawafa, Hasan Kazzaz
10.5120/ijca2020920670

Zakria Mahrousa, Dima Mufti Alchawafa, Hasan Kazzaz . A Dynamic Sliding Window based Balanced Parallel Frequent Itemset Mining Algorithm in Data Stream. International Journal of Computer Applications. 175, 16 ( Sep 2020), 48-55. DOI=10.5120/ijca2020920670

@article{ 10.5120/ijca2020920670,
author = { Zakria Mahrousa, Dima Mufti Alchawafa, Hasan Kazzaz },
title = { A Dynamic Sliding Window based Balanced Parallel Frequent Itemset Mining Algorithm in Data Stream },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2020 },
volume = { 175 },
number = { 16 },
month = { Sep },
year = { 2020 },
issn = { 0975-8887 },
pages = { 48-55 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume175/number16/31540-2020920670/ },
doi = { 10.5120/ijca2020920670 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:25:15.062437+05:30
%A Zakria Mahrousa
%A Dima Mufti Alchawafa
%A Hasan Kazzaz
%T A Dynamic Sliding Window based Balanced Parallel Frequent Itemset Mining Algorithm in Data Stream
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 16
%P 48-55
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Frequent itemset mining algorithms are one of the most interesting research issues in recent years. They play an important role in finding association rules from a continuous massive data stream such as: customer behavior tracking, retail sales, network monitoring, etc. In this paper, a novel approach will be introduced to remove some drawbacks in parallel FP-Growth and enable it to handle the data stream. The proposed algorithm DSW-BPGFP (Dynamic Sliding Window - Balanced Parallel Graph Frequent Pattern) will improve space and time required based on a compact data structure, called FP-Graph to maintain and store dynamic sliding window transactions. The algorithm dynamically reconstructs and compresses directed graph data structure to control the amount of space usage, and the size of dynamic window will be adjusted by the concept change detection. Moreover, DSW-BPGFP will distribute loads between Hadoop cluster nodes equally, by introducing load balancing strategy. The experiments show that the proposed algorithm can achieve a good speedup, a good degree of balance between nodes and efficiently process large dynamic datasets. In addition, it achieves improvement in memory consumption to store frequent patterns and in time complexity.

References
  1. Bustio-Martínez, L. , Muñoz-Briseño, A. , Cumplido, R., Hernández-León, R. and Feregrino-Uribe, C. 2019. A Novel Multi-Core Algorithm for Frequent Itemsets Mining in Data Streams.
  2. Srinivas, A.V. 2016. An Overview of Algorithms Used for Mining Frequent Patterns in Data Streams.
  3. Fu, X. , Shi, L., Li, J. 2017. Balanced Parallel Frequent Pattern Mining Over Massive Data Stream
  4. McArdle, C. and Wang, X. 2013. Frequent Itemset Mining Over Stream Data: Overview.
  5. Peddireddy, B., Ch, A. and Patnala, S. R. C. M. 2018. A Survey on Mining Frequent Item Sets from Data Stream.
  6. Chandra, B. and Bhaskar, S. 2013. A Novel Approach for Finding Frequent Itemsets in Data Stream.
  7. Li, H. and Wang L. 2017. A Variable Size Sliding Window Based Frequent Itemsets Mining Algorithm in Data Stream.
  8. Deypir, M., Sadreddini H. M. and Hashemi, S. 2012. Towards A Variable Size Sliding Window Model for Frequent Itemset Mining Over Data Streams.
  9. Koh, J. L. and Lin, C.Y. 2009. Concept Shift Detection for Frequent Itemsets From Sliding Window Over Data Streams.
  10. Agrawal, R. and Srikant, R. 1994. Fast Algorithms for Mining Association Rules.
  11. Han, J., Pei, J. and Yin, Y. 2000. Mining Frequent Patterns Without Candidate Generation.
  12. Zaki, M.J. 2000. Scalable Algorithms for Association Mining.
  13. Li, H., Wang, Y., Zhang D., Zhang M., and Chang., E. Y. 2008. PFP: Parallel FP-Growth for Query Recommendation.
  14. Zhou, L., Zhong, Z., Chang, J., Li, J., Huang, J.Z., and Feng, S. 2010. Balanced Parallel FP-Growth with MapReduce.
  15. Nikam, P. V. and Deshpande, D.S. 2018. New Approach in Big Data Mining for Frequent Itemset Using Mapreduce in HDFS.
  16. Leung, C.K.-S. and Khan, Q.I. 2006, DSTree: a tree structure for the mining of frequent sets from data streams.
  17. Tanbeer, S. K., Ahmed, C. F., Jeong, B. and Lee, Y. 2009. Sliding Window-Based Frequent Pattern Mining Over Data Streams.
  18. Kusumakumari, V., Sherigar, D., Chandran, R. and Patil, N. 2017. Frequent Pattern Mining on Stream Data Using Hadoop Cantree-Gtree .
  19. Bustio-Martı'nez, L., Cumplido, R., Hernandez-Leo'n, R., Bande-Serrano, J. M. and Feregrino-Uribe, C.2017. On the Design of Hardware-Software Architectures for Frequent Itemsets Mining on Data Streams.
  20. HE, Y. and YUE, M. 2014. Parallel Frequent Itemset Mining on Streaming Data.
  21. Dean, J. and Ghemawat, S. 2008 MapReduce: simplified data processing on large clusters.
  22. FIMI 2004 Repository [online], http://fimi.ua.ac.be/data/.
Index Terms

Computer Science
Information Sciences

Keywords

Association Rule Mining (ARM) Data Stream Directed Graph Dynamic sliding window FP-Growth Frequent itemset Hadoop Load balance MapReduce