Given the use of machine learning-based tools for monitoring the Water Quality Indicators (WQIs) over lakes and coastal waters, understanding the properties of such models, including the uncertainties inherent in their predictions is essential. This has led to the development of two probabilistic NN-algorithms: Mixture Density Network (MDN) and Bayesian Neural Network via Monte Carlo Dropout (BNN-MCD). These NNs are complex, featuring thousands of trainable parameters and modifiable hyper-parameters, and have been independently trained and tested. The model uncertainty metric captures the uncertainty present in each prediction based on the properties of the model—namely, the model architecture and the training data distribution. We conduct an analysis of MDN and BNN-MCD under near-identical conditions of model architecture, training, and test sets, etc., to retrieve the concentration of chlorophyll-a pigments (Chl a), total suspended solids (TSS), and the absorption by colored dissolved organic matter at 440 nm (acdom (440)). The spectral resolutions considered correspond to the Hyperspectral Imager for the Coastal Ocean (HICO), PRecursore IperSpettrale della Missione Applicativa (PRISMA), Ocean Colour and Land Imager (OLCI), and MultiSpectral Instrument (MSI). The model performances are tested in terms of both predictive residuals and predictive uncertainty metric quality. We also compared the simultaneous WQI retrievals against a single-parameter retrieval framework (for Chla). Ultimately, the models’ real-world applicability was investigated using a MSI satellite-matchup dataset (N = 3'053) of Chla and TSS. Experiments show that both models exhibit comparable estimation performance. Specifically, the median symmetric accuracy (MdSA) on the test set for the different parameters in both algorithms range from 30% to 60%. The uncertainty estimates, on the other hand, differ strongly. MDN’s uncertainty estimate is ∼50%, encompassing estimation residuals for 75% of test samples, whereas BNN-MCD’s average uncertainty estimate is ∼25%, encompassing the residuals for 50% of samples. Our analysis also revealed that simultaneous estimation results in improvements in both predictive performance and uncertainty metric quality. Interestingly, the trends mentioned above hold across different sensor resolutions, as well as experimental regimes. This disparity calls for additional research to determine whether such trends in model uncertainty are inherent to specific models or can be more broadly generalized across different algorithms and sensor setups.