Research Article

Anomaly Detection for Raw Water Quality – A Comparative Analysis of the Local Outlier Factor Algorithm and the Random Forest Algorithms

by Nahshon Mokua, Ciira Wa Maina, Henry Kiragu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 174 - Number 26
Year of Publication: 2021
Authors: Nahshon Mokua, Ciira Wa Maina, Henry Kiragu
10.5120/ijca2021921196

Nahshon Mokua, Ciira Wa Maina, Henry Kiragu. Anomaly Detection for Raw Water Quality – A Comparative Analysis of the Local Outlier Factor Algorithm and the Random Forest Algorithms. International Journal of Computer Applications. 174, 26 (Mar 2021), 47-54. DOI=10.5120/ijca2021921196

@article{ 10.5120/ijca2021921196,
author = { Nahshon Mokua and Ciira Wa Maina and Henry Kiragu },
title = { Anomaly Detection for Raw Water Quality – A Comparative Analysis of the Local Outlier Factor Algorithm and the Random Forest Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { Mar 2021 },
volume = { 174 },
number = { 26 },
month = { Mar },
year = { 2021 },
issn = { 0975-8887 },
pages = { 47-54 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume174/number26/31842-2021921196/ },
doi = { 10.5120/ijca2021921196 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:23:12.426134+05:30
%A Nahshon Mokua
%A Ciira Wa Maina
%A Henry Kiragu
%T Anomaly Detection for Raw Water Quality – A Comparative Analysis of the Local Outlier Factor Algorithm and the Random Forest Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 174
%N 26
%P 47-54
%D 2021
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The growing use of automated, sensor-based systems for real-time water quality monitoring both demands and makes possible the timely identification of unexpected values. Anomalies are often caused by technical issues, and at the incoming data rate, problematic data are unlikely to be detected manually. This article focuses on the use of machine learning approaches to detect anomalies in water quality data. Four machine learning techniques for time series anomaly detection are examined: the local outlier factor, the isolation forest, the extended isolation forest and the robust random cut forest. Extensive experiments with these techniques were carried out on a subset of data collected from sensors deployed at a water treatment plant in Nyeri, Kenya, covering the turbidity and pH parameters. The local outlier factor algorithm correctly detected all outliers in both subsets, unlike the other algorithms considered. In the primary experiment, the local outlier factor was also the fastest, and it was easier to use provided that optimum parameters were selected. Moreover, the analysis of the four techniques demonstrated that, with or without training, it is a powerful tool for water quality anomaly detection and hence a feasible approach.
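
As a rough illustration of how a local outlier factor score separates an abnormal reading from its neighbours, the sketch below implements a simplified version of the score for one-dimensional data in plain Python. It is not the authors' implementation. The turbidity readings, the neighbourhood size `k = 3`, the flagging threshold of 2.0, the function name and the small constant added to avoid division by zero are all assumptions made for the example, and ties in neighbour distance are truncated to exactly `k` neighbours.

```python
def local_outlier_factor(values, k=3):
    """Return one LOF score per reading; scores well above 1 suggest an outlier."""
    n = len(values)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(values)")

    neighbours = []  # indices of the k nearest neighbours of each reading
    k_distance = []  # distance from each reading to its k-th nearest neighbour
    for i in range(n):
        others = sorted(
            (abs(values[i] - values[j]), j) for j in range(n) if j != i
        )
        neighbours.append([j for _, j in others[:k]])
        k_distance.append(others[k - 1][0])

    # Local reachability density: inverse of the mean reachability distance
    # from a reading to its neighbours. The 1e-10 term is an assumed guard
    # against division by zero when readings are identical.
    lrd = []
    for i in range(n):
        reach = [
            max(k_distance[j], abs(values[i] - values[j])) for j in neighbours[i]
        ]
        lrd.append(1.0 / (sum(reach) / k + 1e-10))

    # LOF: mean density of the neighbours divided by the reading's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)
    ]


# Hypothetical turbidity readings with one injected spike at index 5.
turbidity = [4.1, 4.3, 4.0, 4.2, 4.4, 19.5, 4.2, 4.1, 4.3]
scores = local_outlier_factor(turbidity, k=3)

threshold = 2.0  # assumed cut-off; in practice it would be tuned per parameter
flagged = [i for i, score in enumerate(scores) if score > threshold]
print(flagged)
```

Readings that sit in a dense cluster score close to 1, while the isolated spike scores far higher because its neighbours are much more densely packed than it is. The choice of `k` and of the threshold corresponds to the parameter selection that the abstract notes as a condition for ease of use.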

Index Terms

Computer Science
Information Sciences

Keywords

Water quality monitoring, anomaly detection, local outlier factor, isolation forest, extended isolation forest, robust random cut forest.