Behavioral Malware Detection using Deep Graph Convolutional Neural Networks

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2021
Authors:
Angelo Schranko De Oliveira, Renato Jose Sassi
10.5120/ijca2021921218

Angelo Schranko De Oliveira and Renato Jose Sassi. Behavioral Malware Detection using Deep Graph Convolutional Neural Networks. International Journal of Computer Applications 174(29):1-8, April 2021.

BibTeX

@article{10.5120/ijca2021921218,
	author = {Angelo Schranko De Oliveira and Renato Jose Sassi},
	title = {Behavioral Malware Detection using Deep Graph Convolutional Neural Networks},
	journal = {International Journal of Computer Applications},
	issue_date = {April 2021},
	volume = {174},
	number = {29},
	month = {Apr},
	year = {2021},
	issn = {0975-8887},
	pages = {1-8},
	numpages = {8},
	url = {http://www.ijcaonline.org/archives/volume174/number29/31858-2021921218},
	doi = {10.5120/ijca2021921218},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Malware behavioral graphs provide a rich source of information that can be leveraged for detection and classification tasks. In this paper, we propose a new behavioral malware detection method that extracts behavioral graphs from API call sequences and uses a Deep Graph Convolutional Neural Network (DGCNN), a state-of-the-art neural network architecture that directly accepts graphs of arbitrary structure, to learn a binary classification function that distinguishes malware from goodware. To train and evaluate the models, we created a new public domain dataset of more than 40,000 API call sequences obtained by executing malware and goodware instances in a sandboxed environment. Experimental results show that our models achieve Area Under the ROC Curve (AUC-ROC), F1-Score, Precision, and Recall similar to those of Long Short-Term Memory (LSTM) networks, which are widely used as the base architecture for sequence learning in behavioral malware detection methods. This indicates that the models can effectively learn to classify malicious and benign temporal patterns through convolution operations on graphs. To the best of our knowledge, this is the first paper that investigates the applicability of DGCNN to behavioral malware detection using API call sequences.
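
To make the idea of a behavioral graph concrete, the following minimal Python sketch turns one API call sequence into a directed graph. The abstract does not specify how the graphs are built, so the construction here is an assumption chosen for illustration: each distinct API call becomes a node, and an edge links a call to the call that directly follows it. The function names and the sample trace are hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter


def build_behavioral_graph(api_calls):
    """Build a directed graph from an ordered API call sequence.

    Assumed construction (not confirmed by the paper): one node per
    distinct API call, one edge per observed consecutive pair.
    Returns (nodes, edges), where edges maps (src, dst) to the number
    of times dst directly followed src.
    """
    nodes = sorted(set(api_calls))
    edges = Counter(zip(api_calls, api_calls[1:]))
    return nodes, edges


def adjacency_matrix(nodes, edges):
    """Convert the edge set to a binary adjacency matrix (list of lists)."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return matrix


# Hypothetical trace, made up for this example.
trace = [
    "NtOpenFile", "NtReadFile", "NtWriteFile",
    "NtReadFile", "NtWriteFile", "NtClose",
]

nodes, edges = build_behavioral_graph(trace)
adj = adjacency_matrix(nodes, edges)

print(nodes)
for row in adj:
    print(row)
```

An adjacency matrix of this kind, together with per-node features, is the typical input to a graph convolutional model. Collapsing repeated transitions into a single edge discards their frequency; keeping the counts in `edges` as edge weights is one alternative.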

Keywords

Computer Security, Deep Learning, Dynamic Analysis, Malware Detection