Performance Evaluation of Five Machine Learning Algorithms and Three Feature Selection Algorithms for IP Traffic Classification

Evolution in Networks and Computer Communications
© 2011 by IJCA Journal
Number 1 - Article 5
Year of Publication: 2011
Authors:
Kuldeep Singh
S. Agrawal

Kuldeep Singh and S Agrawal. Performance Evaluation of Five Machine Learning Algorithms and Three Feature Selection Algorithms for IP Traffic Classification. IJCA Special Issue on Evolution in Networks and Computer Communications (1):25-32, 2011. Full text available. BibTeX

@article{key:article,
	author = {Kuldeep Singh and S. Agrawal},
	title = {Performance Evaluation of Five Machine Learning Algorithms and Three Feature Selection Algorithms for IP Traffic Classification},
	journal = {IJCA Special Issue on Evolution in Networks and Computer Communications},
	year = {2011},
	number = {1},
	pages = {25-32},
	note = {Full text available}
}

Abstract

As the volume of internet traffic has grown over the last couple of years, driven by a drastic rise in the number of internet users, IP traffic classification has gained significant importance for internet service providers and other public and private sector organizations. Traditional IP traffic classification techniques, such as port number based and payload based techniques, are now rarely used because of their limitations: applications use dynamic port numbers instead of well-known port numbers in packet headers, and various cryptographic techniques inhibit inspection of the packet payload. To overcome these limitations, machine learning (ML) techniques are used for IP traffic classification. In this research paper, a real time internet traffic dataset has been developed using a packet capturing tool, and reduced feature datasets have then been derived from it using three feature selection algorithms: Correlation based, Consistency based and Principal Components Analysis based feature selection. After that, five popular ML algorithms, MLP, RBF, C4.5, Bayes Net and Naïve Bayes, are used for IP traffic classification with these datasets. This experimental evaluation shows that the C4.5 Decision Tree algorithm is an efficient ML technique for IP traffic classification when the number of features characterizing each internet application is reduced using the Correlation based Feature Selection algorithm.
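The idea behind correlation based feature selection can be illustrated with a small sketch. It is not the authors' implementation. It scores a feature subset with the merit heuristic k·r_cf / sqrt(k + k(k-1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, and grows the subset greedily. As a simplifying assumption, absolute Pearson correlation stands in for the correlation measure, and the feature names and toy flow values are hypothetical.

```python
import math

def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return abs(cov / math.sqrt(vx * vy))

def cfs_merit(columns, labels, subset):
    """Merit = k*r_cf / sqrt(k + k*(k-1)*r_ff) for one feature subset."""
    k = len(subset)
    r_cf = sum(pearson(columns[f], labels) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_cfs(columns, labels):
    """Forward selection: add the feature that raises merit most, stop when none does."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        score, feat = max(
            (cfs_merit(columns, labels, selected + [f]), f) for f in remaining
        )
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best

# Hypothetical toy flows: label 0 and label 1 stand for two application classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "mean_pkt_size": [100, 120, 110, 130, 900, 950, 920, 980],
    "total_bytes": [1100, 1500, 1000, 1600, 8000, 9900, 8500, 9600],  # largely redundant
    "duration": [3, 7, 5, 9, 4, 8, 6, 10],  # weakly related to the class
}

selected, merit = greedy_cfs(columns, labels)
print(selected, round(merit, 4))
```

On this toy data the redundant and weakly related features are left out, because adding either one lowers the merit. The reduced dataset would then be passed to a classifier such as C4.5.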
