Enhancing Riverine Water Quality Prediction: The Application of Variational Autoencoders for Robust Data Augmentation in Environmental Science

Gregorius Airlangga(1*)

(1) Universitas Katolik Indonesia Atma Jaya, Indonesia
(*) Corresponding Author

Abstract


This study addresses a persistent challenge in environmental science: accurately predicting dissolved oxygen (DO) levels in river ecosystems from sparse and incomplete monitoring data. We use Variational Autoencoders (VAEs) to augment such data. The dataset was compiled from multiple water monitoring stations and covers key indicators, including DO, ammonium ions, nitrites, nitrates, and biochemical oxygen demand. After standardizing the data and assessing its quality, we fitted a RandomForestRegressor to estimate feature importance, tuning it with GridSearchCV and RandomizedSearchCV, and used the resulting importances to select features for the predictive model. Outliers were detected with One-Class SVM and Isolation Forest and removed from the dataset. VAEs were then applied to synthesize new data points that are statistically consistent with the original set, enlarging the dataset and potentially exposing patterns not apparent in the original samples. We evaluated the effect of augmentation by comparing the RMSE of a RandomForestRegressor model before and after augmentation; the model using VAE-generated data achieved the lowest RMSE. This result suggests that the synthetic data supplied additional variability and complexity that aided the model's learning. Our findings indicate that data augmentation techniques such as VAEs can improve both the quality of environmental datasets and the accuracy of predictive models built on them.
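Two quantities mentioned in the abstract can be illustrated with a minimal sketch: the RMSE used to compare models, and the sampling step a VAE relies on when generating synthetic points. This is not the authors' code. It uses only the Python standard library, the function names (`rmse`, `sample_latent`, `kl_to_standard_normal`) are hypothetical, and the numbers are toy values that do not come from the study. A full VAE would also need trained encoder and decoder networks, which are omitted here.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def sample_latent(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and log_var stand in for the outputs of a (hypothetical) trained encoder.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I).

    This is the regularization term of the standard VAE objective.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


# Toy DO values (mg/L) and predictions from two hypothetical models.
observed = [8.0, 6.0, 7.0, 9.0]
pred_a = [7.0, 7.0, 7.0, 7.0]
pred_b = [8.0, 6.0, 8.0, 8.0]

rmse_a = rmse(observed, pred_a)
rmse_b = rmse(observed, pred_b)

# One latent draw for an assumed two-dimensional latent space.
rng = random.Random(0)
z = sample_latent([0.0, 1.0], [0.0, -2.0], rng)
```

In an augmentation workflow, draws such as `z` would be passed through the trained decoder to produce synthetic samples, and the model with the lower RMSE on held-out data would be preferred.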


DOI: https://doi.org/10.30645/kesatria.v5i1.328

DOI (PDF): https://doi.org/10.30645/kesatria.v5i1.328.g325
