Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods

dc.contributor.author: Rahmati, Omid
dc.contributor.author: Choubin, Bahram
dc.contributor.author: Fathabadi, Abolhasan
dc.contributor.author: Coulon, Frederic
dc.contributor.author: Soltani, Elinaz
dc.contributor.author: Shahabi, Himan
dc.contributor.author: Mollaefar, Eisa
dc.contributor.author: Tiefenbacher, John
dc.contributor.author: Cipullo, Sabrina
dc.contributor.author: Bin Ahmad, Baharin
dc.contributor.author: Tien Bui, Dieu
dc.date.accessioned: 2019-07-08T07:55:19Z
dc.date.available: 2019-07-08T07:55:19Z
dc.date.issued: 2019-06-21
dc.description.abstract: Although estimating the uncertainty of models used for modelling nitrate contamination of groundwater is essential in groundwater management, it has been generally ignored. This issue motivates this research to explore the predictive uncertainty of machine-learning (ML) models in this field of study using two residual-based uncertainty methods: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Prediction-interval coverage probability (PICP), the most important of the statistical measures of uncertainty, was used to evaluate uncertainty. Additionally, three state-of-the-art ML models, namely support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN), were selected to spatially model groundwater nitrate concentrations. The models were calibrated with nitrate concentrations from 80 wells (70% of the data) and then validated with nitrate concentrations from 34 wells (30% of the data). Both uncertainty and predictive performance criteria should be considered when comparing and selecting the best model. Results highlight that the kNN model is the best model because not only did it have the lowest uncertainty based on the PICP statistic in both the QR (0.94) and the UNEEC (in all clusters, 0.85–0.91) methods, but it also had predictive performance statistics (RMSE = 10.63, R² = 0.71) that were relatively similar to those of RF (RMSE = 10.41, R² = 0.72) and better than those of SVM (RMSE = 13.28, R² = 0.58). Determining the uncertainty of ML models used for spatially modelling groundwater-nitrate pollution enables managers to achieve better risk-based decision making and consequently increases the reliability and credibility of groundwater-nitrate predictions. [en_UK]
dc.identifier.citation: Rahmati O, Choubin B, Fathabadi A, et al. (2019) Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods. Science of the Total Environment, Volume 688, October 2019, pp. 855-866 [en_UK]
dc.identifier.issn: 0048-9697
dc.identifier.uri: https://doi.org/10.1016/j.scitotenv.2019.06.320
dc.identifier.uri: http://dspace.lib.cranfield.ac.uk/handle/1826/14305
dc.language.iso: en [en_UK]
dc.publisher: Elsevier [en_UK]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Groundwater pollution [en_UK]
dc.subject: Uncertainty assessment [en_UK]
dc.subject: Nitrate concentration [en_UK]
dc.subject: Machine learning [en_UK]
dc.subject: GIS [en_UK]
dc.title: Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods [en_UK]
dc.type: Article
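
Note on PICP: the abstract reports prediction-interval coverage probability without stating how it is computed. Under its usual definition, PICP is the fraction of observed values that fall inside their prediction intervals. The following is a minimal sketch of that definition, not the authors' code; the function name `picp` and all sample values are hypothetical and are not taken from the study.

```python
def picp(observed, lower, upper):
    """Fraction of observations lying within [lower, upper] (inclusive)."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have equal length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")
    # Count observations covered by their own prediction interval.
    covered = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return covered / len(observed)


# Hypothetical nitrate values and interval bounds, for illustration only.
observed = [12.0, 30.5, 8.2, 45.0, 20.1]
lower_bounds = [10.0, 25.0, 9.0, 30.0, 15.0]
upper_bounds = [15.0, 35.0, 14.0, 50.0, 25.0]

coverage = picp(observed, lower_bounds, upper_bounds)
print(f"PICP = {coverage:.2f}")
```

A PICP close to the nominal confidence level of the intervals (for example, near 0.90 for 90% intervals) suggests that the intervals are well calibrated.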

Files

Original bundle
Name: Predicting_uncertainty_of_machine_learning_models-2019.pdf
Size: 1.97 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.63 KB
Format: Item-specific license agreed upon to submission