Malicious URL Detection and Identification

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 99 - Number 17
Year of Publication: 2014
Anjali B. Sayamber
Arati M. Dixit

Anjali B Sayamber and Arati M Dixit. Article: Malicious URL Detection and Identification. International Journal of Computer Applications 99(17):17-23, August 2014. Full text available. BibTeX

	author = {Anjali B. Sayamber and Arati M. Dixit},
	title = {Article: Malicious URL Detection and Identification},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {99},
	number = {17},
	pages = {17-23},
	month = {August},
	note = {Full text available}


Malicious links serve as distribution channels for spreading malware across the Web. Such links can give attackers partial or full control of a victim's system. Once infected, these systems can be used for various cyber crimes such as stealing credentials, spamming, phishing, and denial-of-service attacks. To counter such crimes, detection systems must be fast and precise, and must be able to detect new malicious content. This paper introduces various aspects of the URL (Uniform Resource Locator) classification process, which determines whether a target website is malicious or benign. Standard datasets from different sources are used for training. The growing problem of spamming, phishing, and malware has created a need for a reliable framework that can classify and further identify malicious URLs. An alternative approach is proposed that uses a Naïve Bayes classifier for automated classification and detection of malicious URLs. The proposed Naïve Bayes model is supported by clustering and classification techniques. On the other hand, Naïve Bayes models are rarely used for general probabilistic learning and inference, which typically involves estimating conditional and marginal distributions. The work in this paper shows that, for a wide range of benchmark datasets, Naïve Bayes models learned using a probability model achieve better accuracy than a Support Vector Machine model.
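To make the classification idea concrete, the following is a minimal sketch of a Naïve Bayes URL classifier, not the authors' implementation. The abstract does not state which features or datasets the paper uses, so everything here is an assumption chosen for illustration: the lexical tokenizer `tokenize`, the class name `NaiveBayesURLClassifier`, the add-one (Laplace) smoothing, and the toy training URLs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed lexical features)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class NaiveBayesURLClassifier:
    """Multinomial Naive Bayes over URL tokens with add-one smoothing."""

    def fit(self, urls, labels):
        self.class_counts = Counter(labels)        # number of URLs per class
        self.token_counts = defaultdict(Counter)   # token frequencies per class
        self.vocab = set()
        for url, label in zip(urls, labels):
            tokens = tokenize(url)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, url):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_urls = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = tokenize(url)
        scores = {}
        for label, n_urls in self.class_counts.items():
            counts = self.token_counts[label]
            n_tokens = sum(counts.values())
            score = math.log(n_urls / total_urls)  # class prior
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                score += math.log((counts[tok] + 1) / (n_tokens + vocab_size))
            scores[label] = score
        return scores

    def predict(self, url):
        scores = self.log_scores(url)
        return max(scores, key=scores.get)


# Toy, made-up training data for illustration only (not the paper's datasets).
train_urls = [
    "http://secure-login.example-bank.verify-account.test/update",
    "http://free-prizes.test/win/login/verify",
    "http://account-verify.test/secure/update/password",
    "https://docs.example.org/tutorial/intro",
    "https://news.example.com/world/article",
    "https://example.edu/courses/intro/syllabus",
]
train_labels = ["malicious", "malicious", "malicious", "benign", "benign", "benign"]

clf = NaiveBayesURLClassifier().fit(train_urls, train_labels)
```

Calling `clf.predict(url)` returns the label with the highest posterior log score. Working in log space avoids numerical underflow when a URL contains many tokens. A real system would train on much larger labeled datasets and would likely add host-based or content-based features.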

