Species distribution models are commonly applied to predict species responses to environmental conditions. A wide variety of models exists, and they differ in complexity, which affects their predictive performance and interpretability. Machine learning algorithms are increasingly used because they are capable of capturing complex relationships and often predict better. However, to inform environmental management, it is important that a model predicts well for the right reasons. It remains a challenge to select a model with a reasonable level of complexity that captures the true relationship between the response and explanatory variables as well as possible rather than fitting the noise in the data.
In this study we ask: 1) how much predictive performance can we gain by using increasingly complex models, 2) how does model complexity affect the degree of overfitting, and 3) do the inferred responses differ among models and what can we learn from them? To address these questions, we applied eight models of differing complexity to predict the probability of occurrence of freshwater macroinvertebrate taxa based on 2729 Swiss monitoring samples. We compared the models in terms of predictive performance during cross-validation and for generalization outside the calibration domain ("extrapolation" or transferability). We applied model-agnostic tools to examine the response shapes learned by each model and thereby assess model interpretability.
Contrary to our expectation, all models predicted similarly well during cross-validation, whereas, averaged over all taxa, no model predicted better than the null model during out-of-domain generalization. Performance was best for taxa with intermediate prevalence. More complex models predicted slightly better than standard statistical models but were prone to overfitting.
Overfitting indicates that a model describes not only the signal in the data but also part of the noise. This impedes the interpretation of response shapes learned by the model, because one cannot distinguish the signal from the noise. Furthermore, the strongly overfitting models learned irregular relationships and strong interactions that are not ecologically plausible. Thus, in this study, the minor gain in predictive performance from more complex models was outweighed by the overfitting.
Ecological field data used as model input or for calibration are typically subject to several sources of variability, arising from sampling, the measurement process and stochasticity. We therefore call for caution when using complex data-driven models to learn about species responses or to inform environmental management. In such cases, we recommend comparing a range of models with regard to their predictive performance, overfitting and response shapes to better understand the robustness of inferred responses.
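The kind of comparison recommended here can be illustrated with a minimal sketch. It is not the code or the models used in this study: the data are synthetic, and the three stand-in models (a null model predicting the training prevalence, and k-nearest-neighbour classifiers with k = 25 and k = 1), the mean binomial deviance as performance measure, the three folds and all function names are assumptions chosen only for illustration. The sketch compares performance on the calibration and the held-out data and reports their difference as a simple indicator of overfitting.

```python
import math
import random


def mean_deviance(y_true, p_pred, eps=0.01):
    """Mean binomial deviance (-2 * mean log-likelihood); lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -2.0 * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def fit_null(x_train, y_train):
    """Null model: always predicts the prevalence in the calibration data."""
    prevalence = sum(y_train) / len(y_train)
    return lambda x_new: [prevalence for _ in x_new]


def fit_knn(k):
    """Return a fitting function for a k-nearest-neighbour model.

    Small k gives a very flexible (complex) model, large k a smooth one.
    """
    def fit(x_train, y_train):
        pairs = list(zip(x_train, y_train))

        def predict(x_new):
            preds = []
            for x0 in x_new:
                nearest = sorted(pairs, key=lambda pair: abs(pair[0] - x0))[:k]
                preds.append(sum(y for _, y in nearest) / k)
            return preds

        return predict

    return fit


def cross_validate(fit, x, y, n_folds=3):
    """Return mean deviance on calibration folds and on held-out folds."""
    n = len(x)
    train_scores, test_scores = [], []
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        x_tr = [x[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        x_te = [x[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        predict = fit(x_tr, y_tr)
        train_scores.append(mean_deviance(y_tr, predict(x_tr)))
        test_scores.append(mean_deviance(y_te, predict(x_te)))
    return sum(train_scores) / n_folds, sum(test_scores) / n_folds


# Synthetic presence/absence data along one hypothetical environmental gradient.
rng = random.Random(1)
x = [rng.uniform(-2.0, 2.0) for _ in range(300)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * xi)) else 0 for xi in x]

models = {"null": fit_null, "knn_k25": fit_knn(25), "knn_k1": fit_knn(1)}
results = {name: cross_validate(fit, x, y) for name, fit in models.items()}

for name, (train_dev, test_dev) in results.items():
    gap = test_dev - train_dev  # large positive gap indicates overfitting
    print(f"{name:8s} train={train_dev:.2f} test={test_dev:.2f} gap={gap:.2f}")
```

In this sketch, the most flexible model (k = 1) reproduces the calibration data almost perfectly but performs much worse on held-out data, whereas the null model shows hardly any gap. A comparison of response shapes would additionally require plotting the predicted probability along the gradient for each model.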