Influence of the number of samples on prediction error of visible and near infrared spectroscopy of selected soil properties at the farm scale.

Date published

2013-01-23

Publisher

Blackwell Publishing Ltd

Type

Article

ISSN

1351-0754

Citation

Boyan Kuang and Abdul Mounem Mouazen. Influence of the number of samples on prediction error of visible and near infrared spectroscopy of selected soil properties at the farm scale. European Journal of Soil Science, 2012, Volume 63, Issue 3, pp. 421-429.

Abstract

Although visible and near infrared (vis-NIR) spectroscopy has proved to be a fast, inexpensive and relatively accurate tool to measure soil properties, considerable research is required to optimise the calibration procedure and establish robust calibration models. This paper reports on the influence of the number of samples used for the development of farm-scale calibration models for moisture content (MC), total nitrogen (TN) and organic carbon (OC) on the prediction error, expressed as root mean square error of prediction (RMSEP). Fresh (wet) soil samples collected from four farms in the Czech Republic, Germany, Denmark and the UK were scanned with a fibre-type vis-NIR AgroSpec spectrophotometer (tec5 Technology for Spectroscopy, Germany) with a spectral range of 305-2200 nm. Spectra were divided into calibration (two-thirds) and prediction (one-third) sets, and the calibration spectra were subjected to partial least squares regression (PLSR) with leave-one-out cross-validation using Unscrambler 7.8 software (Camo Inc., Oslo, Norway). The RMSEP values of models developed with a large number of samples (46-84 samples from each farm) were compared with those of models developed with a small number of samples (25 samples selected from the large sample set of each farm) for the same variation range. Both large-set and small-set models were validated with the same prediction set for each property. Further PLSR analysis was carried out on samples from the German farm, with calibration sets of 25, 50, 75 and 100 samples. Results showed that the large-size dataset models resulted in lower RMSEP values than the small-size dataset models for all the soil properties studied. The results also demonstrated that as the number of samples in the calibration set increased, RMSEP decreased in an almost linear fashion, although the largest decrease was between 25 and 50 samples. Therefore, it is recommended to choose the number of samples according to the accuracy required, although 50 soil samples is considered appropriate in this study to establish calibration models of TN, OC and MC with smaller expected prediction errors than those obtained with fewer samples.
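As a rough illustration of the error measure and data split named in the abstract, the sketch below computes RMSEP and divides a sample list into two-thirds calibration and one-third prediction. It is not the authors' procedure: the function names `rmsep` and `split_calibration_prediction` are hypothetical, the random shuffle is an assumption (the abstract does not say how samples were assigned to each set), and the PLSR modelling step itself, which the study ran in Unscrambler, is not reproduced here.

```python
import math
import random


def rmsep(measured, predicted):
    """Root mean square error of prediction over paired values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared_error / len(measured))


def split_calibration_prediction(samples, seed=0):
    """Split samples into two-thirds calibration and one-third prediction.

    Assumption: samples are assigned by a seeded random shuffle; the
    abstract does not state the assignment rule actually used.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_calibration = round(len(shuffled) * 2 / 3)
    return shuffled[:n_calibration], shuffled[n_calibration:]
```

With 75 samples, for example, this split yields 50 calibration and 25 prediction samples; `rmsep` would then be evaluated on the laboratory-measured and model-predicted values of the prediction set only.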

Rights

The definitive version is available at www3.interscience.wiley.com
