
Improving Web Search Results by removing Outliers using Data Mining Techniques

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Mennatollah M. Mahmoud, Shaimaa Salama, Doaa S. Elzanfaly

Mennatollah M Mahmoud, Shaimaa Salama and Doaa S Elzanfaly. Improving Web Search Results by removing Outliers using Data Mining Techniques. International Journal of Computer Applications 176(7):9-14, October 2017.

BibTeX

	author = {Mennatollah M. Mahmoud and Shaimaa Salama and Doaa S. Elzanfaly},
	title = {Improving Web Search Results by removing Outliers using Data Mining Techniques},
	journal = {International Journal of Computer Applications},
	issue_date = {October 2017},
	volume = {176},
	number = {7},
	month = {Oct},
	year = {2017},
	issn = {0975-8887},
	pages = {9-14},
	numpages = {6},
	url = {},
	doi = {10.5120/ijca2017915635},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Many users access the web seeking information. They submit queries or questions to search engines, which may return pages or results that are irrelevant to their needs. This research paper proposes a model to remove outliers from the search results. The proposed model is based on association rules, a modified Naïve Bayes algorithm, and clustering techniques. The Naïve Bayes algorithm is modified to help remove outliers from the search results. The proposed model has been evaluated against the standard k-medoids algorithm using three measures: the Sum of Squared Errors (SSE), the silhouette coefficient, and entropy. Experimental results show that the proposed model outperforms the standard k-medoids clustering algorithm in removing search outliers.
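The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It uses the common textbook definitions of SSE, the silhouette coefficient, and cluster entropy; the abstract does not give the paper's exact formulations, so these may differ in detail. The function names, the toy 2-D points, the medoids, and the class labels are all hypothetical and are not taken from the paper's experiments.

```python
import math
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, labels, centers):
    """Sum of squared distances from each point to its cluster representative."""
    return sum(sq_dist(p, centers[c]) for p, c in zip(points, labels))

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    scores = []
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len - 1)
        a = sum(math.sqrt(sq_dist(p, q)) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.sqrt(sq_dist(p, q)) for q in other) / len(other)
            for k, other in clusters.items() if k != c
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def cluster_entropy(labels, classes):
    """Size-weighted entropy of the true classes inside each cluster (0 = pure)."""
    n = len(labels)
    total = 0.0
    for c in set(labels):
        members = [t for l, t in zip(labels, classes) if l == c]
        counts = Counter(members)
        h = -sum((k / len(members)) * math.log2(k / len(members))
                 for k in counts.values())
        total += len(members) / n * h
    return total

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]               # cluster assignment per point
medoids = {0: (0, 0), 1: (10, 10)}        # one representative per cluster
classes = ["a", "a", "b", "b", "b", "b"]  # assumed ground-truth classes

sse_value = sse(points, labels, medoids)
silhouette_value = silhouette(points, labels)
entropy_value = cluster_entropy(labels, classes)
print(sse_value, silhouette_value, entropy_value)
```

Under these definitions, lower SSE and lower entropy indicate tighter and purer clusters, while a silhouette coefficient closer to 1 indicates better-separated clusters.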




Information Retrieval (IR), Web mining, Association rules (AR), Classification, Clustering, Outlier detection.