Research Article

A Doubleton Pattern Mining Approach for Discovering Colossal Patterns from Biological Dataset

by K. Prasanna, M. Seetha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 119 - Number 21
Year of Publication: 2015
Authors: K. Prasanna, M. Seetha
10.5120/21364-4386

K. Prasanna, M. Seetha. A Doubleton Pattern Mining Approach for Discovering Colossal Patterns from Biological Dataset. International Journal of Computer Applications. 119, 21 (June 2015), 41-47. DOI=10.5120/21364-4386

@article{ 10.5120/21364-4386,
author = { K. Prasanna and M. Seetha },
title = { A Doubleton Pattern Mining Approach for Discovering Colossal Patterns from Biological Dataset },
journal = { International Journal of Computer Applications },
issue_date = { June 2015 },
volume = { 119 },
number = { 21 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 41-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume119/number21/21364-4386/ },
doi = { 10.5120/21364-4386 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

The running time of existing Frequent Pattern Mining (FPM) algorithms increases exponentially as the average data size grows. On high-dimensional datasets, these algorithms generate a large number of small and mid-sized frequent patterns, which are of little use for decision making and expose a weakness in the mining process. Doubleton Pattern Mining (DPM) is considered a constructive way to discover large patterns, or Colossal Patterns, when analyzing such datasets. This paper discusses DPM, an integrated approach for discovering Colossal Patterns from biological datasets. DPM discovers a set of Colossal Patterns using a vertical top-down column intersection operator. It relies on a data structure called 'D-struct', a combination of a doubleton data matrix and a one-dimensional array pair set, to dynamically discover Colossal Patterns from biological datasets. A distinguishing feature of D-struct is that its main memory requirement is extremely limited and accurately predictable, and it runs very quickly under memory constraints. The algorithm enumerates the D-struct matrix iteratively, constructs a phylogenetic tree to discover colossal patterns, and needs only one scan over the database. Empirical analysis shows that the proposed approach attains better mining efficiency on various biological datasets and outperforms Colossal Pattern Miner (CPM) in different settings.
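
The abstract does not spell out the column intersection operator or the layout of D-struct, so the following is only a minimal Python sketch of the general idea of vertical (column-wise) mining with pairwise, or "doubleton", supports. It is not the authors' DPM algorithm. The function names, the toy gene labels, and the choice of Python sets for columns are all assumptions made for illustration.

```python
from itertools import combinations


def vertical_layout(rows):
    """Map each item to the set of row ids that contain it (one column per item)."""
    tidsets = {}
    for rid, items in enumerate(rows):
        for item in items:
            tidsets.setdefault(item, set()).add(rid)
    return tidsets


def doubleton_supports(tidsets):
    """Support of every unordered item pair, found by intersecting two columns.

    Hypothetical stand-in for a doubleton matrix; keys are sorted item pairs.
    """
    return {
        (a, b): len(tidsets[a] & tidsets[b])
        for a, b in combinations(sorted(tidsets), 2)
    }


def support(itemset, tidsets):
    """Support of a non-empty itemset: size of the intersection of its columns."""
    columns = [tidsets[item] for item in itemset]
    return len(set.intersection(*columns))


# Toy data (assumed): each row is a sample, each item a gene label.
rows = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3"},
    {"g1", "g2", "g3", "g4"},
    {"g2", "g5"},
]
tidsets = vertical_layout(rows)   # built in a single pass over the rows
pairs = doubleton_supports(tidsets)
large_pattern_support = support({"g1", "g2", "g3", "g4"}, tidsets)
```

In this sketch the data is read once to build the columns, and every later support count is a set intersection in memory. A top-down search in this style would start from a large candidate itemset and shrink it until the support threshold is met, which is a different direction from growing small patterns upward.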

Index Terms

Computer Science
Information Sciences

Keywords

Colossal pattern, Doubleton Pattern Mining, Gene Association Analysis, Biological data, Colossal Pattern Miner (CPM)