Call for Paper - November 2022 Edition
IJCA solicits original research papers for the November 2022 Edition. Last date of manuscript submission is October 20, 2022. Read More

Mining Frequent Patterns of Crime using FP-Growth with Multiple Minimum Supports based on Shannon Entropy

Print
PDF
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
George Matto, Joseph Mwangoka
10.5120/ijca2018916577

George Matto and Joseph Mwangoka. Mining Frequent Patterns of Crime using FP-Growth with Multiple Minimum Supports based on Shannon Entropy. International Journal of Computer Applications 180(24):45-52, March 2018. BibTeX

@article{10.5120/ijca2018916577,
	author = {George Matto and Joseph Mwangoka},
	title = {Mining Frequent Patterns of Crime using FP-Growth with Multiple Minimum Supports based on Shannon Entropy},
	journal = {International Journal of Computer Applications},
	issue_date = {March 2018},
	volume = {180},
	number = {24},
	month = {Mar},
	year = {2018},
	issn = {0975-8887},
	pages = {45-52},
	numpages = {8},
	url = {http://www.ijcaonline.org/archives/volume180/number24/29108-2018916577},
	doi = {10.5120/ijca2018916577},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

FP-Growth is one of the most effective and widely used association rules mining algorithm for discovering interesting relations between items in large datasets. Unfortunately, classical FP-Growth mines frequent patterns by using single user-defined minimum support threshold. This is not adequate for real life applications such as crime patterns mining. On one side, if minimum support is set too low, huge amount of crime patterns (including uninteresting patterns) may be generated, and on the other side, if it is set too high lots of interesting patterns (including seasonal patterns) may be lost. This paper proposes the use of Multiple Item Support (MIS) thresholds instead of single minimum support to tackle the challenge. We employ Shannon entropy method to develop an algorithm that obtains MIS values from crime datasets. The proposed approach is tested on different sizes of input data via a developed working prototype. Experimental results show that our suggested approach outperforms classical FP-Growth in terms of running time and memory use.

References

  1. Han J., Jian P., and Yiwen Y. 2000. “Mining frequent patterns without candidate generation”, ACM SIGMOD Record. Vol. 29. No. 2. ACM.
  2. Isafiade, O., Bagula, A. Berman, S. 2015. A Revised Frequent Pattern Model for Crime Situation Recognition Based on Floor-Ceil Quartile Function. Information Technology and Quatitative Management (ITQM2015). Procedia Computer Science, 55, 251 – 260.
  3. Wang, T., Rudin, C., Wagner, D. and Sevieri, R. 2013. Learning to Detect Patterns of Crime. Springer, Berlin, Heidelberg, 515-530.
  4. Kumbhare, T. A. and Chobe, S. V. 2014. An Overview of Association Rule Mining Algorithms. International Journal of Computer Science and Information Technologies, 5 (1), 927-930.
  5. Isafiade, O., Bagula, A. 2013. Citisafe: Adaptive spatial pattern knowledge using FP-Growth algorithm for crime situation recognition, in: Proc. IEEE International Conference on Ubiquitous Intelligence and Computing, IEEE, pp. 551-556.
  6. Pereira, B. L. and Brandão, W. C. 2014. ARCA: Mining Crime Patterns Using Association Rules. 11th International Conference Applied computing, ISBN: 978-989-8533-25-8.
  7. Hu, Li‑Yu, Hu, Ya‑Han, Tsai, Chih‑Fong, Wang, Jian‑Shian and Huang, Min‑Wei. 2016. Building an associative classifier with multiple minimum supports. SpringerPlus. DOI 10.1186/s40064-016-2153-1.
  8. Fournier-Viger, P., Wu, C. W., Tseng, V. S. 2012. “Mining Top-K Association Rules”. Proceedings of the 25th Canadian Conf. on Artificial Intelligence (AI 2012), Springer, LNAI 7310, pp. 61-73.
  9. Kiran R. Uday. 2011. “Generating Huge Number of Candidate Patterns and Multiple Scans on the Dataset” PhD Thesis. Center for Data Engineering International Institute of Information Technology Hyderabad, 500 032, India.
  10. Zeng, Y., Yin, S. Liu, J. and Zhang, M. 2015. Research of Improved FP-Growth Algorithm in Association Rules Mining. Hindawi Publishing Corporation. Scientific Programming, Article ID 910281, 6 pages.
  11. Han, J., Jian Pei, and Yiwen Yin, Runying Mao. 2004. “Mining frequent patterns without candidate generation: A Freequent Pattern Tree Approach”, Data Mining and Knowledge Discovery, 8, 53–87.
  12. Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487-499, Santiago, Chile.
  13. Tohidi, H. and Ibrahim, H. 2010. A Frequent Pattern Mining Algorithm Based on FP-growth without Generating Tree. Proceedings of the Knowledge Management International Conference 2010, Kuala Terengganu (Malaysia), 25-27 May 2010, pages 671-676.
  14. Xia, D., Zhou, Y., Rong, Z. and Zhang, Z. 2013. IPFP: An Improved Parallel FP-Growth Algorithm for Frequent Itemsets Mining, in: Proc. 59th ISI World Statistics Congress, Hong Kong (Session CPS026), pp. 4034-4039.
  15. Li, H., Wang, Y., Zhang, D., Zhang, M. and Chang, E. Y. 2008. “PFP: Parallel FP-Growth for query recommendation,” In: Proceeding of the 2008 ACM conference on Recommender systems, Lausanne, Switzerland, 107-114.
  16. Pramudiono, I. and Kitsuregawa, M. 2003. “Parallel FP-Growth on PC Cluster,” Advances in Knowledge Discovery and Data Mining, 2637, 467-473.
  17. Lee, J., Clifton, C. W. 2014. Top-k frequent itemsets via differentially private FP-Tree. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, Pages 931-940.
  18. Lingling Deng, Yuansheng Lou, 2015. “Improvement and Research of FP-Growth Algorithm Based on Distributed Spark”, IEEE, pp. 105-108, doi: 10.1109/CCBD.2015.15.
  19. Itkar, S. A. and Kulkarni, U. V. 2013. Distributed Algorithm for Frequent Pattern Mining using Hadoop MapReduce Framework. In Journal of Association of Computer Electronics and Electrical Engineer, pp. 15-24.
  20. Liu, B., Hsu W. and Ma, Y. 1999. Mining association rules with multiple minimum supports, Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, August 15-18, San Diego, California, United States, p.337-341.
  21. Ya-Han, H. and Yen-Liang, C. 2006. Mining association rules with multiple minimum supports: A new mining algorithm and a support tuning mechanism. Decis. Support Syst., 42: 1-24.
  22. Kiran, P., Uday, R. and Reddy, K. 2009. “An Improved Multiple Minimum Support Based Approach to Mine Rare Association Rules”, IEEE Symposium on Computational Intelligence and Data Mining, pp. 340-347.
  23. Yi-Chun Chen, Grace Lin, Ya-Hui Chan, and Meng-Jung Shih. 2014. “Mining Frequent Patterns with Multiple Item Support Thresholds in Tourism Information Databases”. Springer International Publishing Switzerland, pp. 89-98.
  24. Lesne, A. 2014. Shannon entropy: A rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics. Math. Struct. Comput. Sci. 24, e240311.
  25. Bhatt, U., & Patel, P. 2015. A Novel Approach for Finding Rare Items Based on Multiple Minimum Support Framework. Procedia Computer Science, 57, 1088-1095.
  26. Matto, G. and Mwangoka, J. 2017. Detecting crime patterns from Swahili newspapers using text mining. International Journal of Knowledge Engineering and Data Mining, Vol. 4, No. 2, pp.145–156.

Keywords

FP-Growth, Crime Pattern, Multiple Minimum Supports, Shannon Entropy.