
KDSSF: A Graph Modeling Approach

International Journal of Computer Applications
© 2011 by IJCA Journal
Volume 33 - Number 4
Year of Publication: 2011
Muhammad Naeem
Sohail Asghar

Muhammad Naeem and Sohail Asghar. Article: KDSSF: A Graph Modeling Approach. International Journal of Computer Applications 33(4):31-37, November 2011. Full text available. BibTeX

	author = {Muhammad Naeem and Sohail Asghar},
	title = {Article: KDSSF: A Graph Modeling Approach},
	journal = {International Journal of Computer Applications},
	year = {2011},
	volume = {33},
	number = {4},
	pages = {31-37},
	month = {November},
	note = {Full text available}


In recent years, data mining applications have been extended to social sciences such as mass communication and religious studies. Traditional approaches to such work did not adequately consider the hidden semantics between documents. In this study, we show that text mining can be applied to classify social figures such as politicians and religious leaders. The classification is based on text mining of speeches delivered by these figures, who are well-known personalities; the speeches were collected from their official websites. Our text classification uses tf.idf weighting followed by cosine and Jaccard similarity. To improve the results on discerning features, we designed a hash graph modeling technique, Knowledge Discovery System for Social Figures (KDSSF), based on a dictionary of synonyms. In the comparative analysis of the speeches, we focused not on finding optimal matches but on the overall classification of social figures in any domain of interest. Preliminary experiments indicate that including hash-based graph modeling can significantly improve classification results.
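
The similarity pipeline named in the abstract (tf.idf weighting, then cosine and Jaccard similarity) can be illustrated with a minimal sketch. This is not the authors' KDSSF implementation: the abstract does not specify the hash graph model, so the `SYNONYMS` map below is a hypothetical stand-in that only collapses synonyms onto one canonical word before comparison. The function names, the sample speeches, and the particular tf.idf variant (relative term frequency times `log(N / df)`) are likewise assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical synonym dictionary (an assumption, not the paper's data):
# each word maps to one canonical representative.
SYNONYMS = {"nation": "country", "liberty": "freedom"}


def tokenize(text, synonyms=None):
    """Lowercase, keep alphabetic runs, optionally canonicalize synonyms."""
    words = re.findall(r"[a-z]+", text.lower())
    if synonyms:
        words = [synonyms.get(w, w) for w in words]
    return words


def tfidf_vectors(docs):
    """Turn a list of token lists into sparse {term: weight} vectors."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a, b):
    """Jaccard similarity of the term sets of two token lists."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Made-up sample speeches, for illustration only.
speeches = [
    "Liberty for the nation",
    "Freedom for the country",
    "Prayer and faith guide us",
]

plain = [tokenize(s) for s in speeches]
merged = [tokenize(s, SYNONYMS) for s in speeches]
vectors = tfidf_vectors(merged)

print("Jaccard without synonyms:", jaccard(plain[0], plain[1]))
print("Jaccard with synonyms:   ", jaccard(merged[0], merged[1]))
print("Cosine with synonyms:    ", cosine(vectors[0], vectors[1]))
```

In this toy data the first two speeches share only function words until synonyms are merged, after which their term sets coincide. That is the kind of effect a synonym-aware model is meant to capture, although the sketch says nothing about how large the improvement is on real speeches.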

