Malware Classification through HEX Conversion and Mining

IJCA Proceedings on EGovernance and Cloud Computing Services - 2012
© 2012 by IJCA Journal
EGOV - Number 4
Year of Publication: 2012
A. Pratheema Manju Prabha
P. Kavitha

A. Pratheema Manju Prabha and P. Kavitha. Article: Malware Classification through HEX Conversion and Mining. IJCA Proceedings on EGovernance and Cloud Computing Services - 2012 EGOV(4):6-12, December 2012.

	author = {A. Pratheema Manju Prabha and P. Kavitha},
	title = {Article: Malware Classification through HEX Conversion and Mining},
	journal = {IJCA Proceedings on EGovernance and Cloud Computing Services - 2012},
	year = {2012},
	volume = {EGOV},
	number = {4},
	pages = {6-12},
	month = {December},
	note = {Full text available}


Malicious code is commonly referred to as malware. Systems remain vulnerable to traditional attacks, and attackers continue to find new ways around existing protection mechanisms in order to execute their injected code. Malware is a pervasive problem in distributed computer and network systems, and thousands of new malicious executables are created every year. Several types of threat can compromise these systems, for example viruses, worms, Trojan horses, and other malware. Malware is a serious threat to confidentiality because it may cause computer users to lose control over their private data. It is typically hidden from the user and difficult to detect, and it can create significant unwanted CPU activity, disk usage, and network traffic. In existing systems, new malicious programs can be detected through automatic signature generation; one such approach, F-Sign, automatically extracts unique signatures from malware files and is primarily intended for high-speed network traffic. Its signature extraction process is based on a comparison with a common function repository. The data mining framework employed in this research instead learns by analyzing the behavior of existing malicious and benign code in large datasets. We employed robust classifiers, namely the Naïve Bayes (NB) algorithm, the k-Nearest Neighbor (kNN) algorithm, and the J48 decision tree, and evaluated their performance. The approach extracts opcode sequences from the dataset, constructs a classification model, and identifies each program as malicious or benign. It achieved a 98.4% detection rate on new programs whose data was not used in the model-building process.
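The abstract gives no implementation details, so the following is only a minimal sketch of the general pipeline it describes (hex conversion, sequence features, NB and kNN classification, detection rate), under stated assumptions. Overlapping n-grams of hex byte tokens stand in for the paper's opcode sequences, since true opcode extraction would require a disassembler. The J48 decision tree is not shown. The function and class names (`to_hex_tokens`, `ngrams`, `NaiveBayes`, `knn_predict`, `detection_rate`) and the toy byte strings are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter


def to_hex_tokens(data):
    """HEX conversion: turn raw bytes into two-character hex tokens, one per byte."""
    hex_str = data.hex()
    return [hex_str[i:i + 2] for i in range(0, len(hex_str), 2)]


def ngrams(tokens, n=2):
    """Count overlapping n-grams of tokens (a stand-in for opcode n-gram features)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(samples, labels):
            self.counts[label].update(feats)
        self.vocab = set().union(*samples)  # every n-gram seen in training
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        def log_posterior(c):
            denom = self.totals[c] + len(self.vocab)
            score = math.log(self.priors[c])
            for gram, k in feats.items():
                score += k * math.log((self.counts[c][gram] + 1) / denom)
            return score
        return max(self.classes, key=log_posterior)


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, labels, feats, k=3):
    """k-Nearest Neighbor: majority vote among the k most similar training samples."""
    ranked = sorted(range(len(train)), key=lambda i: cosine(train[i], feats), reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def detection_rate(y_true, y_pred, positive="malicious"):
    """Fraction of truly malicious samples that were flagged (true-positive rate)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Made-up byte strings standing in for executables; not real samples.
    train_bytes = [b"\x90\x90\xcc\xcc\xeb\xfe", b"\x90\x90\xcc\xcc\xe8\x00",
                   b"\x55\x89\xe5\x5d\xc3\x00", b"\x55\x89\xe5\x83\xec\x08"]
    train_labels = ["malicious", "malicious", "benign", "benign"]
    train = [ngrams(to_hex_tokens(b)) for b in train_bytes]
    test = [ngrams(to_hex_tokens(b"\x90\x90\xcc\xcc\x00\x00")),
            ngrams(to_hex_tokens(b"\x55\x89\xe5\x5d\x00\x00"))]
    nb = NaiveBayes().fit(train, train_labels)
    nb_preds = [nb.predict(f) for f in test]
    knn_preds = [knn_predict(train, train_labels, f) for f in test]
    print(nb_preds, knn_preds, detection_rate(["malicious", "benign"], nb_preds))
```

Held-out evaluation matters here: as in the abstract, the detection rate is only meaningful when computed on programs that were not used to build the model, which is why the toy test strings above are kept separate from the training set.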

