Research Article

RecB: Set Theory based Technique for Large Scale Pattern Mining in Web Logs

by Tanya Steen, Ray Lindsay
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 124 - Number 8
Year of Publication: 2015
Authors: Tanya Steen, Ray Lindsay
10.5120/ijca2015905584

Tanya Steen, Ray Lindsay. RecB: Set Theory based Technique for Large Scale Pattern Mining in Web Logs. International Journal of Computer Applications. 124, 8 (August 2015), 1-9. DOI=10.5120/ijca2015905584

@article{ 10.5120/ijca2015905584,
author = { Tanya Steen and Ray Lindsay },
title = { RecB: Set Theory based Technique for Large Scale Pattern Mining in Web Logs },
journal = { International Journal of Computer Applications },
issue_date = { August 2015 },
volume = { 124 },
number = { 8 },
month = { August },
year = { 2015 },
issn = { 0975-8887 },
pages = { 1-9 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume124/number8/22128-2015905584/ },
doi = { 10.5120/ijca2015905584 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:13:49.194748+05:30
%A Tanya Steen
%A Ray Lindsay
%T RecB: Set Theory based Technique for Large Scale Pattern Mining in Web Logs
%J International Journal of Computer Applications
%@ 0975-8887
%V 124
%N 8
%P 1-9
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Web analytics is a way of turning raw data into actionable information. Large organisations own web-based applications and connect to external databases, which together generate very large web log files. It then becomes crucial to estimate how staff access information systems, what their search preferences are, and which documents are in greatest demand. One challenge in obtaining this knowledge is that log files contain unstructured information in which authentic search requests are not distinguished from crawler hits. Another challenge is that many proposed pattern mining techniques are usually tested on small benchmark datasets, so their performance at large scale is hard to predict. This paper stresses the importance of data preprocessing and introduces an efficient method, based on classic set theory properties, for mining patterns in large collections of web logs of all types.
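The two ideas named in the abstract, filtering crawler hits during preprocessing and mining patterns with set operations, can be illustrated with a small sketch. This is not the RecB method itself, which the abstract does not detail. It is a generic session-set intersection approach under stated assumptions: log lines are assumed to be already parsed into (session_id, user_agent, resource) tuples, and the crawler markers, function names, and sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical user-agent markers for crawlers; a real list would be longer.
CRAWLER_MARKERS = ("bot", "crawler", "spider")


def is_crawler(user_agent):
    """Return True if the user-agent string looks like an automated crawler."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def build_session_sets(records):
    """Map each requested resource to the set of session ids that requested it.

    `records` is an iterable of (session_id, user_agent, resource) tuples,
    an assumed, already-parsed form of the raw log lines.
    """
    session_sets = defaultdict(set)
    for session_id, user_agent, resource in records:
        if is_crawler(user_agent):
            continue  # preprocessing step: drop crawler hits
        session_sets[resource].add(session_id)
    return session_sets


def frequent_pairs(session_sets, min_support):
    """Return resource pairs requested together in at least min_support sessions."""
    result = {}
    for a, b in combinations(sorted(session_sets), 2):
        # The support of a pair is the size of the intersection of its session sets.
        support = len(session_sets[a] & session_sets[b])
        if support >= min_support:
            result[(a, b)] = support
    return result


# Illustrative records only; not data from the paper.
records = [
    ("s1", "Mozilla/5.0", "/search"),
    ("s1", "Mozilla/5.0", "/doc/42"),
    ("s2", "Mozilla/5.0", "/search"),
    ("s2", "Mozilla/5.0", "/doc/42"),
    ("s3", "Googlebot/2.1", "/search"),
    ("s3", "Googlebot/2.1", "/doc/42"),
    ("s4", "Mozilla/5.0", "/search"),
]
pairs = frequent_pairs(build_session_sets(records), min_support=2)
print(pairs)
```

In this sketch the crawler session s3 is discarded before any counting, so it does not inflate the support of the pair it touched. Counting by set intersection avoids rescanning the log for every candidate pattern, which is one reason set-based representations are attractive for large collections.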

Index Terms

Computer Science
Information Sciences

Keywords

Set theory, pattern mining, web mining, computational complexity, complexity reduction, big data analytics