Real Time Event Detection Adopting Incremental TF-IDF based LSH and Event Summary Generation

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Jeyakumar Kannan, Ar Md Shanavas, Sridhar Swaminathan

Jeyakumar Kannan, Ar Md Shanavas and Sridhar Swaminathan. Real Time Event Detection Adopting Incremental TF-IDF based LSH and Event Summary Generation. International Journal of Computer Applications 180(13):22-30, January 2018.

BibTeX

	author = {Jeyakumar Kannan and Ar Md Shanavas and Sridhar Swaminathan},
	title = {Real Time Event Detection Adopting Incremental TF-IDF based LSH and Event Summary Generation},
	journal = {International Journal of Computer Applications},
	issue_date = {January 2018},
	volume = {180},
	number = {13},
	month = {Jan},
	year = {2018},
	issn = {0975-8887},
	pages = {22-30},
	numpages = {9},
	url = {},
	doi = {10.5120/ijca2018916252},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Recently, Twitter users have been leveraged to detect social and physical events, such as festivals and traffic jams, in real time. Real-time event detection and summarization for Cricket is the process of detecting events, such as a boundary, from a live Cricket tweet stream as soon as they happen, and of generating a quick game summary. This is an interesting yet complex problem, because sports events must be detected rapidly and a concise summary must be generated for Cricket enthusiasts from a huge volume of tweets. In this paper, a novel framework is proposed for detecting key events from live Cricket tweets and for generating a game summary from the crawled tweets. Feature vectors of live tweets are created using an incremental TF-IDF representation, and tweet clusters are discovered using Locality Sensitive Hashing (LSH), where the post rate of each cluster indicates a key event. The key event is then recognized from that cluster using our domain-specific event lexicon. Next, important moments in the crawled tweets are identified from spikes in tweet volume. The top-k tweets from each moment are selected by ranking tweets on the top-k words, and representative tweets among them are identified using Jaccard similarity. An evaluation on 2017 IPL T20 Cricket live tweets using the ROC measure shows that the proposed incremental TF-IDF based LSH approach detects key events with a true positive rate of nearly 95% and a false positive rate of around 5%. The proposed game summarization algorithm generates summaries that are readable and competitive with human-written summaries.
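The clustering and selection steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the hash family, the TF-IDF weighting, or any parameter values, so everything below is an assumption made for illustration: a smoothed IDF of log(1 + N/df), a random-hyperplane (cosine) LSH whose plane components are derived from a hash of each term so the vocabulary can grow, whitespace tokenization, and the hypothetical names `IncrementalTfidf`, `lsh_signature`, `bucket_tweets`, and `jaccard`.

```python
import hashlib
import math
from collections import Counter, defaultdict


class IncrementalTfidf:
    """TF-IDF whose document frequencies are updated as each tweet arrives."""

    def __init__(self):
        self.doc_count = 0          # tweets seen so far
        self.doc_freq = Counter()   # number of tweets containing each term

    def add(self, tokens):
        """Update the corpus statistics with one tweet; return its TF-IDF vector."""
        self.doc_count += 1
        self.doc_freq.update(set(tokens))
        tf = Counter(tokens)
        # Assumed weighting: relative term frequency times log(1 + N / df).
        return {
            term: (count / len(tokens))
            * math.log(1 + self.doc_count / self.doc_freq[term])
            for term, count in tf.items()
        }


def _plane_component(bit, term):
    """Deterministic +1/-1 component of hyperplane `bit` along `term`."""
    digest = hashlib.md5(f"{bit}:{term}".encode("utf-8")).digest()
    return 1.0 if digest[0] & 1 else -1.0


def lsh_signature(vector, num_bits=8):
    """Random-hyperplane LSH: one bit per plane, set by the projection's sign."""
    bits = []
    for bit in range(num_bits):
        projection = sum(w * _plane_component(bit, t) for t, w in vector.items())
        bits.append("1" if projection >= 0 else "0")
    return "".join(bits)


def bucket_tweets(tweets, num_bits=8):
    """Group tweets by LSH signature; each bucket is a candidate cluster."""
    model = IncrementalTfidf()
    buckets = defaultdict(list)
    for text in tweets:
        tokens = text.lower().split()
        buckets[lsh_signature(model.add(tokens), num_bits)].append(text)
    return buckets


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between the token sets of two tweets."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

In a setup like this, the number of tweets arriving in a bucket per time window would play the role of the cluster's post rate, and `jaccard` could be used to drop near-duplicate tweets when picking representatives. How the paper actually thresholds the post rate and applies its event lexicon is not described in the abstract.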


Event Detection, Incremental TF-IDF, Locality Sensitive Hashing, Live Sports Tweets, Event Summarization