Research Article

A Novel Cleansing Method for Random-Walk Data using Extended Multivariate Nonlinear Regression: A Data Preprocessor for Load Forecasting Mechanism

by Hussein Bakiri, Hamisi Ndyetabura, Libe Massawe, Hellen Maziku
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 16
Year of Publication: 2021

Hussein Bakiri, Hamisi Ndyetabura, Libe Massawe, Hellen Maziku. A Novel Cleansing Method for Random-Walk Data using Extended Multivariate Nonlinear Regression: A Data Preprocessor for Load Forecasting Mechanism. International Journal of Computer Applications. 183, 16 (Jul 2021), 49-57. DOI=10.5120/ijca2021921503

@article{ 10.5120/ijca2021921503,
author = { Hussein Bakiri and Hamisi Ndyetabura and Libe Massawe and Hellen Maziku },
title = { A Novel Cleansing Method for Random-Walk Data using Extended Multivariate Nonlinear Regression: A Data Preprocessor for Load Forecasting Mechanism },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2021 },
volume = { 183 },
number = { 16 },
month = { Jul },
year = { 2021 },
issn = { 0975-8887 },
pages = { 49-57 },
numpages = {9},
url = { },
doi = { 10.5120/ijca2021921503 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:17:01.888123+05:30
%A Hussein Bakiri
%A Hamisi Ndyetabura
%A Libe Massawe
%A Hellen Maziku
%T A Novel Cleansing Method for Random-Walk Data using Extended Multivariate Nonlinear Regression: A Data Preprocessor for Load Forecasting Mechanism
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 16
%P 49-57
%D 2021
%I Foundation of Computer Science (FCS), NY, USA

The efficiency of any load forecasting mechanism depends on the quality and distribution characteristics of its training data. Outliers and missing values are the primary concern, especially in load data from developing countries. Several studies have proposed imputation models to deal with outliers before forecasting. However, these approaches become less effective when the data follow a random-walk distribution. This study therefore aims to develop an efficient data cleansing model that accounts for a random-walk distribution by extending the Multivariate Nonlinear Regression (MNLR) method. The k-means algorithm is used to detect outliers in the data and to analyze their size. The study uses twenty-minute-interval load data from 2015 to 2019, collected at Kinondoni-North (in the Mikocheni distribution network, Dar es Salaam). The outlier analysis finds that 5.17852% of the data are outliers (5207 out of 105192). Finally, the extended-MNLR (e-MNLR) model compares favorably with the ANN, SVM, MissForest, MICE, and KNN algorithms, attaining RMSE, MAE, and MAPE values of 2.109137, 1.956039, and 7.787976, respectively.
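The three error measures quoted in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of RMSE, MAE, and MAPE, assumes MAPE is expressed as a percentage, and runs on made-up load values; the function names and the sample data are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided
    by the corresponding actual value.
    """
    relative = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative) / len(relative)


# Hypothetical example: true load readings versus values produced
# by some imputation model at the same time steps.
actual_load = [100.0, 200.0, 400.0]
imputed_load = [110.0, 190.0, 400.0]

print("RMSE:", rmse(actual_load, imputed_load))
print("MAE: ", mae(actual_load, imputed_load))
print("MAPE:", mape(actual_load, imputed_load))
```

Lower values of all three measures indicate imputed values closer to the true readings. RMSE penalizes large individual errors more heavily than MAE, while MAPE reports the error relative to the size of each reading.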

Index Terms

Computer Science
Information Sciences


Load forecasting, Developing countries, Outliers, Data cleansing, extended-MNLR