Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods

dc.contributor.author Rahmati, Omid
dc.contributor.author Choubin, Bahram
dc.contributor.author Fathabadi, Abolhasan
dc.contributor.author Coulon, Frederic
dc.contributor.author Soltani, Elinaz
dc.contributor.author Shahabi, Himan
dc.contributor.author Mollaefar, Eisa
dc.contributor.author Tiefenbacher, John
dc.contributor.author Cipullo, Sabrina
dc.contributor.author Bin Ahmad, Baharin
dc.contributor.author Tien Bui, Dieu
dc.date.accessioned 2019-07-08T07:55:19Z
dc.date.available 2019-07-08T07:55:19Z
dc.date.issued 2019-06-21
dc.identifier.citation Rahmati O, Choubin B, Fathabadi A, et al. (2019) Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods. Science of the Total Environment, Volume 688, October 2019, pp. 855-866 en_UK
dc.identifier.issn 0048-9697
dc.identifier.uri https://doi.org/10.1016/j.scitotenv.2019.06.320
dc.identifier.uri http://dspace.lib.cranfield.ac.uk/handle/1826/14305
dc.description.abstract Although estimating the uncertainty of models used for modelling nitrate contamination of groundwater is essential in groundwater management, it has generally been ignored. This issue motivates this research to explore the predictive uncertainty of machine-learning (ML) models in this field of study using two different residual-based uncertainty methods: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Prediction-interval coverage probability (PICP), the most important of the statistical measures of uncertainty, was used to evaluate uncertainty. Additionally, three state-of-the-art ML models, namely support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN), were selected to spatially model groundwater nitrate concentrations. The models were calibrated with nitrate concentrations from 80 wells (70% of the data) and then validated with nitrate concentrations from 34 wells (30% of the data). Both uncertainty and predictive performance criteria should be considered when comparing and selecting the best model. Results highlight that the kNN model is the best model because not only did it have the lowest uncertainty based on the PICP statistic in both the QR (0.94) and the UNEEC (in all clusters, 0.85–0.91) methods, but it also had predictive performance statistics (RMSE = 10.63, R2 = 0.71) that were similar to those of RF (RMSE = 10.41, R2 = 0.72) and better than those of SVM (RMSE = 13.28, R2 = 0.58). Determining the uncertainty of ML models used for spatially modelling groundwater-nitrate pollution enables managers to achieve better risk-based decision making and consequently increases the reliability and credibility of groundwater-nitrate predictions. en_UK
dc.language.iso en en_UK
dc.publisher Elsevier en_UK
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/ *
dc.subject Groundwater pollution en_UK
dc.subject Uncertainty assessment en_UK
dc.subject Nitrate concentration en_UK
dc.subject Machine learning en_UK
dc.subject GIS en_UK
dc.title Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods en_UK
dc.type Article


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
