Real Time Event Detection Adopting Incremental TF-IDF based LSH and Event Summary Generation

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Jeyakumar Kannan, Ar Md Shanavas, Sridhar Swaminathan

Jeyakumar Kannan, Ar Md Shanavas and Sridhar Swaminathan. Real Time Event Detection Adopting Incremental TF-IDF based LSH and Event Summary Generation. International Journal of Computer Applications 180(13):22-30, January 2018.

BibTeX

	author = {Jeyakumar Kannan and Ar Md Shanavas and Sridhar Swaminathan},
	title = {Real Time Event Detection Adopting Incremental TF-IDF based LSH and Event Summary Generation},
	journal = {International Journal of Computer Applications},
	issue_date = {January 2018},
	volume = {180},
	number = {13},
	month = {Jan},
	year = {2018},
	issn = {0975-8887},
	pages = {22-30},
	numpages = {9},
	url = {},
	doi = {10.5120/ijca2018916252},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Twitter users have recently been leveraged to detect social and physical events, such as festivals and traffic jams, in real time. Real-time event detection and summarization for cricket is the process of detecting events such as a boundary from a live cricket tweet stream as soon as they happen, and of generating a quick game summary. This is an interesting yet complex problem, because sports events must be detected rapidly and a concise summary must be generated from a huge volume of tweets for cricket enthusiasts. In this paper, a novel framework is proposed for detecting key events from live cricket tweets and for generating a game summary from the crawled tweets. Feature vectors of live tweets are created using an incremental TF-IDF representation, and tweet clusters are discovered using Locality Sensitive Hashing (LSH), where the post rate of each cluster is used to determine a key event. The key event is then recognized from that cluster using a domain-specific event lexicon. Next, important moments in the crawled tweets are computed by identifying spikes in tweet volume. The top-k tweets from each moment are selected by ranking tweets on the top-k words, and representative tweets among these top-k tweets are identified using Jaccard similarity. An evaluation on live tweets from 2017 IPL T20 cricket, using the ROC measure, shows that the proposed incremental TF-IDF based LSH approach detects key events with a true positive rate of nearly 95% and a false positive rate of around 5%. The proposed game summarization algorithm generates summaries that are readable and competitive with human-written summaries.
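
The feature-extraction, clustering, and tweet-similarity steps named in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the class and function names are hypothetical, the IDF smoothing log(1 + N/df) is an assumed choice, and the random-hyperplane hash family and the number of signature bits are assumptions, since the abstract does not specify which LSH variant or parameters were used.

```python
import math
import random
from collections import Counter, defaultdict


class IncrementalTfidf:
    """Hypothetical helper: updates document frequencies tweet by tweet."""

    def __init__(self):
        self.doc_count = 0
        self.doc_freq = Counter()

    def add(self, tokens):
        """Update the statistics with one tweet and return its TF-IDF vector."""
        self.doc_count += 1
        self.doc_freq.update(set(tokens))
        term_freq = Counter(tokens)
        # Assumed smoothing: idf = log(1 + N / df), which is always positive.
        return {
            term: count * math.log(1 + self.doc_count / self.doc_freq[term])
            for term, count in term_freq.items()
        }


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH over sparse term vectors."""

    def __init__(self, num_bits=8, seed=0):
        self.num_bits = num_bits
        self.rng = random.Random(seed)
        self.planes = {}                  # term -> one Gaussian weight per bit
        self.buckets = defaultdict(list)  # signature -> tweet ids

    def signature(self, vector):
        """Return one sign bit per hyperplane for a sparse vector."""
        sums = [0.0] * self.num_bits
        for term, weight in vector.items():
            if term not in self.planes:
                # Draw hyperplane components lazily because the vocabulary
                # of a live stream is not known in advance.
                self.planes[term] = [
                    self.rng.gauss(0, 1) for _ in range(self.num_bits)
                ]
            for i, component in enumerate(self.planes[term]):
                sums[i] += weight * component
        return tuple(s >= 0 for s in sums)

    def insert(self, tweet_id, vector):
        """Place a tweet in the bucket of its signature and return the signature."""
        sig = self.signature(vector)
        self.buckets[sig].append(tweet_id)
        return sig


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between the token sets of two tweets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

In this sketch, each bucket plays the role of a tweet cluster, so the growth rate of `buckets[sig]` over time would stand in for the cluster post rate, and `jaccard` could be used to drop near-duplicate tweets when picking representatives. Using more signature bits gives smaller, tighter buckets at the cost of splitting similar tweets more often.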



Event Detection, Incremental TF-IDF, Locality Sensitive Hashing, Live Sports Tweets, Event Summarization