Research Article

Real Time Generalized Log File Management and Analysis using Pattern Matching and Dynamic Clustering

by Bhupendra Moharil, Chaitanya Gokhale, Vijayendra Ghadge, Pranav Tambvekar, Sumitra Pundlik, Gaurav Rai
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 91 - Number 16
Year of Publication: 2014
DOI: 10.5120/15962-5320

Bhupendra Moharil, Chaitanya Gokhale, Vijayendra Ghadge, Pranav Tambvekar, Sumitra Pundlik, Gaurav Rai. Real Time Generalized Log File Management and Analysis using Pattern Matching and Dynamic Clustering. International Journal of Computer Applications. 91, 16 (April 2014), 1-6. DOI=10.5120/15962-5320

@article{ 10.5120/15962-5320,
author = { Bhupendra Moharil and Chaitanya Gokhale and Vijayendra Ghadge and Pranav Tambvekar and Sumitra Pundlik and Gaurav Rai },
title = { Real Time Generalized Log File Management and Analysis using Pattern Matching and Dynamic Clustering },
journal = { International Journal of Computer Applications },
issue_date = { April 2014 },
volume = { 91 },
number = { 16 },
month = { April },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume91/number16/15962-5320/ },
doi = { 10.5120/15962-5320 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:13:02.469354+05:30
%A Bhupendra Moharil
%A Chaitanya Gokhale
%A Vijayendra Ghadge
%A Pranav Tambvekar
%A Sumitra Pundlik
%A Gaurav Rai
%T Real Time Generalized Log File Management and Analysis using Pattern Matching and Dynamic Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 91
%N 16
%P 1-6
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The past decade saw an exponential rise in the amount of information available on the World Wide Web. Almost every business organization today uses web-based technology to serve its large client base. Consequently, managing this large volume of data and mining pertinent content from it has become a pressing need. This is where the field of big data analytics takes root, and the linchpin of that field is the process of knowledge discovery. Analyzing server logs and other data footprints aggregated from clients can facilitate the building of a concrete knowledge base, and querying that knowledge base can help support business and other managerial decisions. The approach proposed herein is a real-time, generalized alternative for log file management and analysis. It includes the development of a sustainable platform that would enable analysts to understand the essence of the available data.

Index Terms

Computer Science
Information Sciences

Keywords

Log file, Levenshtein distance, Real time, Pattern matching, Dynamic clustering