Research Article

SOFIM: Frequent Itemset Mining in Optimized HDFS with Secure De-Duplication

by Bosco Nirmala Priya, Parathasarathi Murugesan, C. Kaleeswari, Achsah Susan Mathew, J. Vimala Roselin, Balakiran S.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 7
Year of Publication: 2025
Authors: Bosco Nirmala Priya, Parathasarathi Murugesan, C. Kaleeswari, Achsah Susan Mathew, J. Vimala Roselin, Balakiran S.
10.5120/ijca2025924960

Bosco Nirmala Priya, Parathasarathi Murugesan, C. Kaleeswari, Achsah Susan Mathew, J. Vimala Roselin, Balakiran S. SOFIM: Frequent Itemset Mining in Optimized HDFS with Secure De-Duplication. International Journal of Computer Applications. 187, 7 (May 2025), 26-35. DOI=10.5120/ijca2025924960

@article{ 10.5120/ijca2025924960,
author = { Bosco Nirmala Priya and Parathasarathi Murugesan and C. Kaleeswari and Achsah Susan Mathew and J. Vimala Roselin and Balakiran S. },
title = { SOFIM: Frequent Itemset Mining in Optimized HDFS with Secure De-Duplication },
journal = { International Journal of Computer Applications },
issue_date = { May 2025 },
volume = { 187 },
number = { 7 },
month = { May },
year = { 2025 },
issn = { 0975-8887 },
pages = { 26-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number7/sofim-frequent-itemset-mining-in-optimized-hdfs-with-secure-de-duplication/ },
doi = { 10.5120/ijca2025924960 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-05-29T00:03:16.135091+05:30
%A Bosco Nirmala Priya
%A Parathasarathi Murugesan
%A C. Kaleeswari
%A Achsah Susan Mathew
%A J. Vimala Roselin
%A Balakiran S.
%T SOFIM: Frequent Itemset Mining in Optimized HDFS with Secure De-Duplication
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 7
%P 26-35
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Frequent itemset mining has become a critical data mining approach in a variety of research domains. The term "common patterns" refers to patterns that appear often in datasets. Numerous methods for finding all common itemsets in a database have been presented. This paper proposes a novel hybrid method intended to give better results for online applications. Big data systems store huge volumes of data from various industrial applications, and valuable information must be retrieved from this stored data through an optimized server. The proposed SOFIM (Server Optimized Frequent Itemset Mining) technique finds frequent itemsets based on positive reviews and improves a storage server's performance. It does so by analyzing the sentiment of product reviews. Redundant reviews are avoided by checking for duplication. Server performance is optimized by partially replicating the review data across multiple servers. Finally, the combined hybrid model SOFIM provides a better solution for finding frequent itemsets.
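
The pipeline the abstract outlines (remove duplicate reviews, keep the positive ones, then mine frequent itemsets) can be illustrated with a minimal Python sketch. This is not the authors' SOFIM implementation: the word lists, the SHA-256 fingerprinting, the brute-force counting, and every function name below are assumptions made for illustration, and the secure de-duplication and HDFS partial replication steps are left out because the abstract does not describe them.

```python
import hashlib
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon; the abstract does not say which sentiment model is used.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate"}


def deduplicate(reviews):
    """Keep the first copy of each review, comparing SHA-256 digests of normalized text."""
    seen, unique = set(), []
    for text in reviews:
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(normalized)
    return unique


def is_positive(review):
    """Toy sentiment check: more positive than negative lexicon words."""
    words = review.split()
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    return score > 0


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force count of itemsets up to max_size; keep those seen min_support times."""
    counts = Counter()
    for items in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(items), size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def mine_positive_reviews(reviews, min_support=2):
    """De-duplicate, filter to positive reviews, then mine the remaining words."""
    positive = [r for r in deduplicate(reviews) if is_positive(r)]
    # Each transaction is the set of non-sentiment words in one positive review.
    transactions = [set(r.split()) - POSITIVE_WORDS - NEGATIVE_WORDS for r in positive]
    return frequent_itemsets(transactions, min_support)


if __name__ == "__main__":
    sample = [
        "Great battery and great screen",
        "great battery   and great screen",  # duplicate after normalization
        "Good battery life",
        "Terrible screen",
        "Love the screen and battery",
    ]
    for itemset, support in sorted(mine_positive_reviews(sample).items()):
        print(itemset, support)
```

Brute-force enumeration is used here only to keep the sketch short; on real review volumes an Apriori- or FP-growth-style algorithm would be the usual choice.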

Index Terms

Computer Science
Information Sciences

Keywords

Frequent itemset mining, big data, SOFIM, de-duplication, replication