Machine learning (ML) combined with hyperspectral reflectance is an active research topic in vegetation trait monitoring. However, because only limited experimental data are available to calibrate ML algorithms, the generalization ability of ML models urgently needs to be improved. This study aimed to improve the generalization ability of random forest (RF) for monitoring potato chlorophyll content by integrating experimental and simulated data. The calibration dataset consisted of experimental data and data simulated with the PROSAIL model, and the RF model was validated using data from farmers' fields. The RF model calibrated with the integrated experimental and simulated data significantly improved the explained variance of potato chlorophyll content by 23–47% compared with RF models calibrated with simulated or experimental data alone, reaching an R² of 0.67 and an RMSE of 0.08 g/m². These results indicate that integrating PROSAIL-simulated and experimentally measured data can effectively compensate for the large amount of data that ML requires for model calibration, thereby improving the generalization ability of the model.
H. Yang, F. Li, & K. Yu (2023). Improving the generalization ability of random forest for potato chlorophyll estimation through integrating experimental and simulated canopy spectral data. Precision agriculture ’23: 507–512.
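To make the two accuracy metrics reported in the abstract concrete, the following minimal Python sketch computes RMSE and R² for a set of paired observed and predicted chlorophyll values, such as a farmers' field validation set. It assumes R² is the coefficient of determination (1 − SS_res/SS_tot); the abstract does not state which definition was used, and some studies instead report the squared Pearson correlation. The function names and sample values are hypothetical illustrations, not code or data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, in the same units as the inputs (e.g. g/m^2)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observations of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared errors of the model predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical validation values (chlorophyll content, g/m^2); not data from the study.
observed = [0.30, 0.45, 0.52, 0.38]
predicted = [0.32, 0.41, 0.55, 0.36]
print(f"R^2 = {r_squared(observed, predicted):.2f}, "
      f"RMSE = {rmse(observed, predicted):.3f} g/m^2")
```

Because RMSE keeps the units of the inputs, chlorophyll content expressed in g/m² yields an RMSE in g/m², which is how the 0.08 g/m² figure in the abstract should be read; R² is unitless.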