
A Comparison of Imputation Techniques using Network Traffic Data

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Authors:
Fidan Kaya Gülağız, Onur Gök, Adnan Kavak
10.5120/ijca2016909903

Fidan Kaya Gülağız, Onur Gök and Adnan Kavak. A Comparison of Imputation Techniques using Network Traffic Data. International Journal of Computer Applications 142(7):25-29, May 2016.

@article{10.5120/ijca2016909903,
	author = {Fidan Kaya Gülağız and Onur Gök and Adnan Kavak},
	title = {A Comparison of Imputation Techniques using Network Traffic Data},
	journal = {International Journal of Computer Applications},
	issue_date = {May 2016},
	volume = {142},
	number = {7},
	month = {May},
	year = {2016},
	issn = {0975-8887},
	pages = {25-29},
	numpages = {5},
	url = {http://www.ijcaonline.org/archives/volume142/number7/24909-2016909903},
	doi = {10.5120/ijca2016909903},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Creating data sets for studies in many different fields of research is a very important process. However, these data sets can suffer from missing values. There are many different ways of handling missing values, the most common being deletion methods and single imputation methods. However, these methods lead to high errors in data sets with high loss rates. Data sets used for the analysis of network traffic also commonly contain missing values. In this study, data sets of different sizes and with different missing value rates were produced for the analysis of network traffic in distributed systems. Different data imputation methods were then compared for dealing with the missing values in these data sets. Experimental results showed that the Expectation Maximization (EM) method is more applicable and performs better at relatively high missing data rates, whereas the k Nearest Neighbors (k-NN) method performs better at low missing data rates.
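The kind of comparison described in the abstract can be sketched on toy data. The following is a minimal, stdlib-only Python sketch under several assumptions that are not taken from the paper: a synthetic two-column data set (hypothetical columns `x` and `y` standing in for two correlated traffic measurements), values missing completely at random only in `y`, mean substitution as the single-imputation baseline, `k = 5` for k-NN, and an EM variant restricted to a bivariate normal model. It reports the RMSE on the hidden entries only; it does not reproduce the paper's data, setup, or results.

```python
import math
import random
import statistics


def make_data(n, rng):
    """Hypothetical traffic-like data: x ~ 'packet rate', y ~ 'byte rate' (correlated)."""
    rows = []
    for _ in range(n):
        x = rng.gauss(100.0, 20.0)
        y = 1.5 * x + rng.gauss(0.0, 10.0)
        rows.append((x, y))
    return rows


def inject_missing(rows, rate, rng):
    """Hide y completely at random with probability `rate`; None marks a missing value."""
    return [(x, None if rng.random() < rate else y) for x, y in rows]


def mean_impute(data):
    """Single imputation: replace every missing y with the mean of the observed y values."""
    mu = statistics.fmean(y for _, y in data if y is not None)
    return [mu if y is None else y for _, y in data]


def knn_impute(data, k=5):
    """k-NN imputation: average y over the k complete rows whose x is closest."""
    complete = [(x, y) for x, y in data if y is not None]
    out = []
    for x, y in data:
        if y is not None:
            out.append(y)
            continue
        nearest = sorted(complete, key=lambda r: abs(r[0] - x))[:k]
        out.append(statistics.fmean(r[1] for r in nearest))
    return out


def em_impute(data, iters=50):
    """EM for a bivariate normal in which only y may be missing; imputes E[y | x]."""
    xs = [x for x, _ in data]
    n = len(data)
    obs = [y for _, y in data if y is not None]
    mx = statistics.fmean(xs)  # x is fully observed, so its mean and variance are fixed
    sxx = statistics.pvariance(xs, mx)
    my, syy, sxy = statistics.fmean(obs), statistics.pvariance(obs), 0.0
    for _ in range(iters):
        beta = sxy / sxx
        resid = syy - beta * sxy  # conditional variance Var(y | x)
        ey, ey2 = [], []
        for x, y in data:
            if y is None:  # E-step: expected sufficient statistics for a missing y
                m = my + beta * (x - mx)
                ey.append(m)
                ey2.append(m * m + resid)
            else:
                ey.append(y)
                ey2.append(y * y)
        # M-step: re-estimate the mean of y and the (co)variances from completed statistics
        my = sum(ey) / n
        syy = sum(ey2) / n - my * my
        sxy = sum(x * e for x, e in zip(xs, ey)) / n - mx * my
    beta = sxy / sxx
    return [my + beta * (x - mx) if y is None else y for x, y in data]


def rmse_on_missing(truth, masked, imputed):
    """Root-mean-square error computed only over the entries that were hidden."""
    errs = [(t[1] - v) ** 2 for t, m, v in zip(truth, masked, imputed) if m[1] is None]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    truth = make_data(500, rng)
    for rate in (0.05, 0.2, 0.4):
        masked = inject_missing(truth, rate, rng)
        for name, fn in (("mean", mean_impute), ("k-NN", knn_impute), ("EM", em_impute)):
            err = rmse_on_missing(truth, masked, fn(masked))
            print(f"rate={rate:.2f} {name:>5}: RMSE={err:.2f}")
```

Running the script prints one RMSE line per method and missing rate. On a toy data set like this the numbers depend on the seed and the assumed data model, so they should not be read as confirming or refuting the paper's findings. The EM variant is deliberately limited to one incomplete column so that the E-step has a closed form (the conditional mean and variance of `y` given `x`).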

Keywords

Least Square Estimation (LSE), Expectation Maximization (EM), k Nearest Neighbors (k-NN), Traffic Data, Missing Value Imputation.