Speeding Up ML-based IDSs through Data Preprocessing Techniques

Lawrence Owusu; Ahmad Patooghy; Masud R. Rashel; Marwan Bikdash; Islam AKM Kamrul

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Number 12

Year of Publication: 2025

Authors: Lawrence Owusu, Ahmad Patooghy, Masud R. Rashel, Marwan Bikdash, Islam AKM Kamrul

10.5120/ijca2025925071

Lawrence Owusu, Ahmad Patooghy, Masud R. Rashel, Marwan Bikdash, Islam AKM Kamrul. Speeding Up ML-based IDSs through Data Preprocessing Techniques. International Journal of Computer Applications. 187, 12 (Jun 2025), 1-9. DOI=10.5120/ijca2025925071

@article{ 10.5120/ijca2025925071,

author = { Lawrence Owusu, Ahmad Patooghy, Masud R. Rashel, Marwan Bikdash, Islam AKM Kamrul },

title = { Speeding Up ML-based IDSs through Data Preprocessing Techniques },

journal = { International Journal of Computer Applications },

issue_date = { Jun 2025 },

volume = { 187 },

number = { 12 },

month = { Jun },

year = { 2025 },

issn = { 0975-8887 },

pages = { 1-9 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume187/number12/speeding-up-ml-based-idss-through-data-preprocessing-techniques/ },

doi = { 10.5120/ijca2025925071 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2025-06-21T01:56:52+05:30

%A Lawrence Owusu

%A Ahmad Patooghy

%A Masud R. Rashel

%A Marwan Bikdash

%A Islam AKM Kamrul

%T Speeding Up ML-based IDSs through Data Preprocessing Techniques

%J International Journal of Computer Applications

%@ 0975-8887

%V 187

%N 12

%P 1-9

%D 2025

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Most current ML-based IDS models prioritize detection accuracy over detection latency, even though low latency is critical for real-time detection and mitigation of cyber-attacks. This study evaluates the impact of Principal Component Analysis (PCA) on optimizing machine learning-based IDSs using the UNR-IDD dataset. We analyze the performance of Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) models before and after PCA transformation. Experimental results show that PCA significantly reduced the detection latency of SVM and NB without compromising their detection performance. Specifically, NB + PCA and SVM + PCA achieved reductions in detection latency of 99.52% and 49.9%, respectively, making them viable low-latency solutions. However, the PCA transformation did not significantly affect the detection latency of the RF model. The results indicate that NB + PCA is the most efficient and lightweight of the evaluated models for real-time network intrusion detection, and that PCA is an effective preprocessing step for optimizing ML-based IDSs for real-time applications.
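The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for UNR-IDD, and the helper names `median_predict_latency` and `latency_reduction_pct`, the choice of 10 components, and the decision to include the PCA transform time in the measured latency are all illustrative assumptions.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def median_predict_latency(model, X, repeats=5):
    """Median wall-clock seconds for one model.predict(X) call (hypothetical helper)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        model.predict(X)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return timings[len(timings) // 2]


def latency_reduction_pct(before, after):
    """Percentage reduction in latency relative to the baseline (hypothetical helper)."""
    return 100.0 * (before - after) / before


# Synthetic stand-in for an intrusion detection dataset (assumption: NOT UNR-IDD).
X, y = make_classification(
    n_samples=5000, n_features=30, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Baseline: standardize, then Naive Bayes on all 30 features.
baseline = make_pipeline(StandardScaler(), GaussianNB()).fit(X_train, y_train)

# PCA variant: standardize, project onto 10 components (arbitrary choice), then NB.
with_pca = make_pipeline(
    StandardScaler(), PCA(n_components=10), GaussianNB()
).fit(X_train, y_train)

# Latency here covers the whole pipeline, including the PCA projection itself.
t_base = median_predict_latency(baseline, X_test)
t_pca = median_predict_latency(with_pca, X_test)

acc_base = baseline.score(X_test, y_test)
acc_pca = with_pca.score(X_test, y_test)

print(f"NB       : latency={t_base:.6f}s accuracy={acc_base:.3f}")
print(f"NB + PCA : latency={t_pca:.6f}s accuracy={acc_pca:.3f}")
print(f"latency change: {latency_reduction_pct(t_base, t_pca):.2f}% reduction")
```

On synthetic data the measured change may be small or even negative; the sketch only shows how such a comparison could be set up, and swapping `GaussianNB` for `SVC` or `RandomForestClassifier` would cover the other two models named in the abstract.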

Index Terms

Computer Science

Information Sciences

Keywords

Intrusion detection, principal component analysis, latency, data and network security