Real Time Generalized Log File Management and Analysis using Pattern Matching and Dynamic Clustering

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 91 - Number 16
Year of Publication: 2014
Bhupendra Moharil
Chaitanya Gokhale
Vijayendra Ghadge
Pranav Tambvekar
Sumitra Pundlik
Gaurav Rai

Bhupendra Moharil, Chaitanya Gokhale, Vijayendra Ghadge, Pranav Tambvekar, Sumitra Pundlik and Gaurav Rai. Article: Real Time Generalized Log File Management and Analysis using Pattern Matching and Dynamic Clustering. International Journal of Computer Applications 91(16):1-6, April 2014. Full text available.


The past decade has seen an exponential rise in the amount of information available on the World Wide Web. Almost every business organization today uses web-based technology to manage its large client base. Consequently, handling large volumes of data and mining pertinent content from them has become a pressing need. This is where the field of big data analytics takes root, and its linchpin is the process of knowledge discovery. Analyzing server logs and other data footprints aggregated from clients can facilitate the building of a concrete knowledge base, and querying that knowledge base can help inform business and other managerial decisions. The approach proposed here is a real-time, generalized alternative for log file management and analysis. It involves developing a sustainable platform that enables analysts to understand the essence of the available data.

