An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases

Bala Yesu Chilakalapudi; Narayana Satyala; Satyanarayana Menda

Call for Paper

August Edition

IJCA solicits high quality original research papers for the upcoming August edition of the journal. The last date of research paper submission is 20 July 2026

Submit your paper

Know more

The week's pick

ReLeaf: A MobileNetV2-Based Mobile Application for Real-Time Waste Classification with LLM-Assisted Recycling Guidance

Fatimah H. Alyami Nadeen N. Abduljabbar Ghadi T. Alzahrani Dana B. Alakeel Amal S. Almirsal Atheer S. Algherairy

Random Articles

A Proposed Compiler to Integrate Model Driven Architecture with Web Services ñ Road Map

February

2011

Android based Traffic Rules Violation Detection System “Vehitrack”

Apr

2017

A Comparison of Joint ICA and Parallel ICA Multimodal Fusion Methods in Schizophrenia

June

2014

Optimistic Algorithmic Approaches for Traffic Engineering Policies of Congestion and Traffic Distribution in the Networks

Sep

2017

Reseach Article

An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases

by Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 73 - Number 12

Year of Publication: 2013

Authors: Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda

10.5120/12791-9932

Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda . An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases. International Journal of Computer Applications. 73, 12 ( July 2013), 8-15. DOI=10.5120/12791-9932

@article{ 10.5120/12791-9932,

author = { Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda },

title = { An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases },

journal = { International Journal of Computer Applications },

issue_date = { July 2013 },

volume = { 73 },

number = { 12 },

month = { July },

year = { 2013 },

issn = { 0975-8887 },

pages = { 8-15 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume73/number12/12791-9932/ },

doi = { 10.5120/12791-9932 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T21:39:53.057347+05:30

%A Bala Yesu Chilakalapudi

%A Narayana Satyala

%A Satyanarayana Menda

%T An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases

%J International Journal of Computer Applications

%@ 0975-8887

%V 73

%N 12

%P 8-15

%D 2013

%I Foundation of Computer Science (FCS), NY, USA

Abstract

The data handled in emerging applications like location based services, sensor monitoring systems, and data integration, are often inexact in nature. In this paper, the important problem of extracting frequent item sets from a large uncertain database, interpreted under the Possible World Semantics (PWS) is presented. This issue is technically challenging, since an uncertain database contains an exponential number of possible worlds. By observing that the mining process can be modeled as a Poisson binomial distribution, an algorithm was developed, which can efficiently and accurately discover frequent item sets in a large uncertain database. The important issue of maintaining the mining result for a database that is evolving (e. g. , by inserting a tuple) can be presented. Specifically, the proposed mining algorithm can enable Probabilistic Frequent Item set (PFI) results to be refreshed. This reduces the need of re-executing the whole mining algorithm on the new database, which is often more expensive and unnecessary. The proposed algorithm can support incremental mining and provides the accurate results on mining the uncertain database. The extensive evaluation on real data set to validate the approach is performed.

References

A. Veloso, W. Meira Jr. , M. de Carvalho, B. Poˆ ssas, S. Parthasarathy, and M. J. Zaki, "Mining Frequent Itemsets in Evolving Databases," Proc. Second SIAM Int'l Conf. Data Mining (SDM), 2002.
C. Aggarwal, Y. Li, J. Wang, and J. Wang, "Frequent Pattern Mining with Uncertain Data, ", Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2009.
C. Aggarwal and P. Yu, "A Survey of Uncertain Data Algorithm and Applications," IEEE Trans Knowledge and Data Eng. , vol. 21, no. 5, pp. 609-623, May 2009.
R. Agrawal, T. Imieli?nski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. ACM SIGMOD Int'l Conf. Management of Data, 1993. [5 ] O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom, "ULDBs: Databases with Uncertainty and Lineage," Proc. 32ndInt'l Conf. Very Large Data Bases (VLDB), 2006.
T. Bernecker, H. Kriegel, M. Renz, F. Verhein, and A. Zuefle,"Probabilistic Frequent Itemset Mining in Uncertain Databases," Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2009.
C. J. van Rijsbergen, Information Retrieval. Butterworth, 1979.
L. L. Cam, "An Approximation Theorem for the Poisson Binomial Distribution," Pacific J. Math. , vol. 10, pp. 1181- 1197,1960.
H. Cheng, P. Yu, and J. Han, " Frequent Itemset Mining in the Presence of Random Noise," Proc. Soft Computing for Knowledge Discovery and Data Mining, pp. 363-389, 2008.
R. Cheng, D. Kalashnikov, and S. Prabhakar, "Evaluating Probabilistic Queries over Imprecise Data," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2003.
D. Cheung, J. Han, V. Ng, and C. Wong, "Maintenance of Discovered Association Rules in Large Databases: An Updating Technique," Proc. 12th Int'l Conf. Data Eng. (ICDE), 1996.
D. Cheung, S. D. Lee, and B. Kao, "A General Technique for Maintaining Discovered Association Rules," Proc. fifth Int'l Conf. Database Systems for Advanced Applications (DASFAA), 1997.
W. Cheung and O. R. Za?¨ane, " Mining of Frequent Patterns without Candidate Generation or Support Constraint," Proc. Seventh Int'l Database Eng. and Applications Symp. (IDEAS), 2003.
C. K. Chui, B. Kao, and E. Hung, "Mining Frequent Itemsets from Uncertain Data," Proc. 11th Pacific-Asia Conf. advancesin Knowledge Discovery and Data Mining (PAKDD), 2007.
G. Cormode and M. Garofalakis, "Sketching Probabilistic Data Streams," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2007.
N. Dalvi and D. Suciu, "Efficient Query Evaluation on Probabilistic Databases," Proc. 13th Int'l Conf. Very Large Data Bases (VLDB), 2004.
A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong, "Model-Driven Data Acquisition in Sensor Networks," Proc. 13th Int'l Conf. Very Large Data Bases (VLDB), 2004.
J. Han, J. Pei, and Y. Yin,"Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2000.
J. Huang, "MayBMS: A Probabilistic Database Management System," Proc. 35th ACM SIGMOD Int'l Conf. Management of Data, 2009.
R. Jampani, L. Perez, M. Wu, F. Xu, C. Jermaine, and P. Haas, "MCDB: A Monte Carlo Approach to Managing Uncertain Data," Proc. ACM SIGMOD Int'l Conf. Management of Data 2008.
J. Ren, S. D. Lee, X. Chen, B. Kao, R. Cheng, and D. W. Cheung" Naive Bayes Classification of Uncertain Data," Proc. IEEE Ninth Int'l Conf. Data Mining (ICDM), 2009.
N. Khoussainova, M. Balazinska, and D. Suciu, "Towards Correcting Input Data Errors Probabilistically Using Integrity Constraints," Proc. Fifth ACM Int'l Workshop Data Eng. For Wireless and Mobile Access (MobiDE), 2006.
H. Kriegel and M. Pfeifle, "Density-Based Clustering of Uncertain Data," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining (KDD), 2005.
C. Kuok, A. Fu, and M. Wong, "Mining Fuzzy Association Rules in Databases," SIGMOD Record, vol. 27, no. 1, pp. 41-46, 1998. Proc. IEEE Fifth Int'l Conf. Data Mining (ICDM), 2005
C. K. -S. Leung, Q. I. Khan, and T. Hoque, "Cantree: A Tree Structure for EfficientMining of Frequent Patterns,". Proc. IEEE Fifth Int'l Conf. Data Mining (ICDM), 2005.
A. Lu, Y. Ke, J. Cheng, and W. Ng, "Mining Vague Association Rules," Proc. 12th Int'l Conf. Database Systems for Advanced Applications (DASFAA), 2007.
M. Mutsuzaki, "Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS," Proc. Third Biennial Conf. Innovative Data Systems Research (CIDR), 2007.
P. Sistla, O. Wolfson, S. Chamberlain, and S. Dao, "Querying the Uncertain Position of Moving Objects," Temporal Databases: Research and Practice, Springer Verlag, 1998.
C. Stein, Computation of Expectations, Lecture Notes Monograph Series, vol. 7, Inst. of Math. Statistics, 1986.
L. Sun, R. Cheng, D. W. Cheung, and J. Cheng, "Mining Uncertain Data with Probabilistic Guarantees," Proc. 16th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2010.
T. Jayram et al. , "Avatar Information Extraction System," IEEE Data Eng. Bull. , vol. 29, no. 1, pp. 40-48, Mar. 2006.
S. Tsang, B. Kao, K. Y. Yip, W. -S. Ho, and S. D. Lee. , "Decision Trees for Uncertain Data," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2009.
L. Wang, R. Cheng, S. D. Lee, and D. Cheung, "Accelerating Probabilistic Frequent Itemset Mining: A Approach," Proc. 19th ACM Int'l Conf. Information and Knowledge Management (CIKM), 2010.
M. Yiu, N. Mamoulis, X. Dai, Y. Tao, and M. Vaitis, "Efficient Evaluation of Probabilistic Advanced Spatial Queries on existentially Uncertain Data," IEEE Trans Knowledge and Data Eng. , vol. 21, no. 9, pp. 108-122, Jan. 2009.
Q. Zhang, F. Li, and K. Yi, "Finding Frequent Items in Probabilistic Data," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2008.

Index Terms

Computer Science

Information Sciences

Keywords

PFI PWS S-PMF CDF