Influence of the number of samples on prediction error of visible and near infrared spectroscopy of selected soil properties at the farm scale.
Abstract
Although visible and near infrared (vis-NIR) spectroscopy has proved to be a fast, inexpensive and relatively accurate tool for measuring soil properties, considerable research is still required to optimise the calibration procedure and establish robust calibration models. This paper reports on how the number of samples used to develop farm-scale calibration models for moisture content (MC), total nitrogen (TN) and organic carbon (OC) influences the prediction error, expressed as the root mean square error of prediction (RMSEP). Fresh (wet) soil samples collected from four farms in the Czech Republic, Germany, Denmark and the UK were scanned with a fibre-type vis-NIR AgroSpec spectrophotometer (tec5 Technology for Spectroscopy, Germany) with a spectral range of 305-2200 nm. Spectra were divided into calibration (two-thirds) and prediction (one-third) sets, and the calibration spectra were subjected to partial least squares regression (PLSR) with leave-one-out cross-validation using Unscrambler 7.8 software (Camo Inc., Oslo, Norway). The RMSEP values of models developed with a large number of samples (46-84 samples from each farm) were compared with those of models developed with a small number of samples (25 samples selected from the large sample set of each farm) covering the same variation range. For each property, the large-set and small-set models were validated with the same prediction set. Further PLSR analysis was carried out on samples from the German farm, using calibration sets of 25, 50, 75 and 100 samples. The large-set models resulted in lower RMSEP values than the small-set models for all the soil properties studied. The results also showed that RMSEP decreased in an almost linear fashion as the number of samples in the calibration set increased, although the largest decrease occurred between 25 and 50 samples. Therefore, it is recommended to choose the number of samples according to the accuracy required, although 50 soil samples are considered appropriate in this study to establish calibration models for TN, OC and MC, with smaller expected prediction errors than those obtained with fewer samples.
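The sample bookkeeping described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how spectra were allocated to the two sets or how the 25-sample subsets were chosen, so random selection is assumed here, and it does not enforce the "same variation range" condition. The function names `rmsep`, `split_two_thirds` and `small_subset` are hypothetical, and the PLSR step itself (run in Unscrambler in the study) is left out.

```python
import math
import random


def rmsep(observed, predicted):
    """Root mean square error of prediction for paired reference and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def split_two_thirds(samples, seed=0):
    """Shuffle the samples and return (calibration, prediction) in a 2:1 ratio.

    Assumption: random allocation; the abstract does not state the method used.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_calibration = round(len(shuffled) * 2 / 3)
    return shuffled[:n_calibration], shuffled[n_calibration:]


def small_subset(calibration, size=25, seed=0):
    """Draw a smaller calibration set from the large one.

    Assumption: simple random draw, which does not guarantee that the
    subset spans the same variation range as the full set.
    """
    return random.Random(seed).sample(list(calibration), min(size, len(calibration)))


# Example with 84 sample identifiers, the upper end of the range in the abstract.
calibration, prediction = split_two_thirds(range(84))
reduced = small_subset(calibration)
print(len(calibration), len(prediction), len(reduced))
```

A model fitted on `calibration` and another fitted on `reduced` would both be scored with `rmsep` on the same `prediction` set, which mirrors the large-set versus small-set comparison reported above.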