Peter Marsh

A CMIP5 Model Selection Specific to South Africa’s Winter Rainfall Zone

The 2015-2017 'Day Zero' drought in South Africa's winter rainfall zone (WRZ) highlighted the need for robust and reliable climate projections for water supply and agricultural planning in the region. The large uncertainties within these projections are, however, a significant impediment to decision making. This study has investigated the potential to sub-sample a multi-model ensemble of climate projections with the aim of reducing uncertainty as well as identifying a realistic subset of plausible future climate pathways in the region. A model selection of this nature requires evaluation of model realism with respect to relevant regional climate dynamics, as well as consideration of model independence within the ensemble, such that the final reduced ensemble is less influenced by unrealistic models while model-based uncertainties remain sufficiently sampled.

Models are assessed against synoptic-scale circulation features and associated statistics, rather than simulated rainfall in the region, in order to evaluate the physical realism of each model rather than the realism of regional rainfall, which is strongly subject to model parameterization. To navigate the subjectivity in choosing evaluation metrics, the 'Day Zero' drought has been used as an episodic reference: relevant metrics should be able to capture the anomalous conditions during this event, consistent with the amplification of the event as a result of climate change. The extensive literature produced subsequent to the 'Day Zero' drought has been leveraged to identify metrics that capture relevant regional climate features.

Three regional climate features were selected: the South Atlantic Jet Stream, the South Atlantic Subtropical High (SASH) and South Atlantic Cold Fronts. These are the primary dynamics behind moisture supply to the WRZ, and anomalies therein were central to the 'Day Zero' drought. Various methods of quantifying these features have been developed, and the ability of each method to capture the anomalous conditions during the 'Day Zero' period was compared and contrasted before a final method was selected for scoring CMIP5 models against NOAA20CR and ERA5 reanalyses. Multiple methods were tested to ensure that the metrics are robust and not overly sensitive to the method formulation. Importantly, in most cases the different methods produced very similar results, so while the optimal method was chosen in each case, the metrics are not very sensitive to the particular details of each method and are likely to be robust.
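
A jet-based metric of the kind described above can be illustrated with a minimal sketch. The function below locates the latitude and strength of the jet as the (parabolically refined) maximum of a zonal-mean zonal wind profile; the definition, variable names, and synthetic profile are illustrative assumptions, not the study's actual formulation.

```python
import numpy as np

def jet_diagnostics(u, lats):
    """Return (jet latitude, jet strength) from a zonal-mean zonal wind
    profile u(lat). A 3-point parabolic fit around the grid maximum
    refines the latitude estimate beyond the grid spacing."""
    i = int(np.argmax(u))
    if 0 < i < len(u) - 1:
        y0, y1, y2 = u[i - 1], u[i], u[i + 1]
        denom = y0 - 2.0 * y1 + y2
        offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
        lat = lats[i] + offset * (lats[1] - lats[0])
    else:
        lat = lats[i]
    return float(lat), float(np.max(u))

# Synthetic low-level wind profile peaking near 50°S (illustrative only)
lats = np.arange(-70.0, -20.0, 1.0)
u850 = 25.0 * np.exp(-((lats + 50.0) ** 2) / (2.0 * 5.0 ** 2))
jet_lat, jet_speed = jet_diagnostics(u850, lats)
```

Applied to reanalysis and model fields over the same period, anomalies in `jet_lat` and `jet_speed` during the 'Day Zero' years can then be compared between methods and products.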

Following the method described by McSweeney et al. (2015), each model was assigned a score of 'realistic', 'biased', 'significantly biased' or 'unrealistic' as a function of its performance across the three metrics. Unrealistic models were removed from the ensemble, and significantly biased models were also excluded as their absence did not significantly reduce the range of future projections. The same metrics were then used to create a genealogy of models, demonstrating that even with only three simple metrics it is possible to identify models developed by the same institutions, here recovering the same groupings as Knutti, Masson \& Gettelman (2013). This highlights the importance of model selection in the first place: weighting each model within the ensemble equally amounts to an unconscious weighting towards institutions that have developed more models (Sanderson, Knutti \& Caldwell, 2015a). The model groupings were then used to select only the best performing model from each group, further reducing ensemble size whilst increasing independence within the ensemble. An important result is that performance was consistent across metrics: models designated as unrealistic or significantly biased performed poorly across all metrics, while the best performing models performed well across all metrics. This suggests that the evaluation metrics are not spurious, are indeed evaluating the fundamental realism of the models, and can therefore be considered robust.
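
The two steps above, categorical scoring and sibling detection from the same metric errors, can be sketched as follows. The model names, error values, and thresholds are placeholders for illustration; they are not those used in the study, and the McSweeney-style categories are assigned here simply from the worst metric error.

```python
import numpy as np

# Hypothetical normalised metric errors (model -> error per metric)
models = {
    "ModelA":  np.array([0.20, 0.30, 0.10]),
    "ModelA2": np.array([0.25, 0.28, 0.12]),  # hypothetical sibling of ModelA
    "ModelB":  np.array([1.50, 1.80, 2.10]),
}

def categorise(errors, biased=0.5, significant=1.0, unrealistic=2.0):
    """Assign a McSweeney-style category from the worst metric error.
    Threshold values are illustrative placeholders."""
    worst = float(np.max(errors))
    if worst < biased:
        return "realistic"
    if worst < significant:
        return "biased"
    if worst < unrealistic:
        return "significantly biased"
    return "unrealistic"

def sibling_pairs(errors_by_model, tol=0.2):
    """Flag model pairs with very similar error patterns, a crude proxy
    for shared development heritage."""
    names = list(errors_by_model)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if np.linalg.norm(errors_by_model[a] - errors_by_model[b]) < tol:
                pairs.append((a, b))
    return pairs

categories = {name: categorise(e) for name, e in models.items()}
pairs = sibling_pairs(models)
```

From each flagged pair or group, only the best performing member would be retained in the final ensemble.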

Thus, after considering 16 CMIP5 models across all metrics, a set of 6 CMIP5 models is selected, namely: 'MIROC-ESM-CHEM', 'BNU-ESM', 'CNRM-CM5', 'ACCESS1-0', 'GFDL-CM3' and 'bcc-csm1-1-m'. All of these models are shown to capture suitably well the dynamics that may result in prolonged drought in the WRZ, while performing better than their sibling models, therefore presenting an ensemble of more independent and historically more realistic models than the full ensemble. Further, this final ensemble has been selected to ensure the range of temperature and precipitation projections is similar to that of a larger ensemble of suitably well performing models. Despite the emphasis on preserving a wide range of future climate scenarios, eliminating poorly performing models has significantly reduced the range of projected outcomes. The most extreme temperature projection within the final ensemble, under the RCP8.5 scenario, is an increase of approximately 3.5 °C by 2080 relative to a 1980-2005 baseline, compared to approximately 4.5 °C from the full ensemble.

A key challenge in implementing the South Atlantic cold front method was the availability of high temporal resolution fields. Not all models in the ensemble have archived data for all fields, which resulted in an artificial subsampling of the ensemble to models with sub-daily wind fields available. As a result, projections of absolute precipitation are somewhat reduced in range, with the worst-case anomaly half that of the full ensemble; however, data availability has contributed significantly to this reduction. Conversely, the probability of a 2-year drought is increased in the final ensemble compared to the full ensemble. While these constrained future projections may be a welcome consequence of eliminating unrealistic models, the primary utility of the final ensemble lies in its reduced size: with only 6 models presented, a future researcher may consider each model's projected climate pathway individually before selecting the model, or models, that best informs their use case, assured that the model performs suitably well in the region. This is of particular value for impacts modellers looking to consider multiple models, who can use a subset of future climate scenarios that are sufficiently independent yet still represent model uncertainty, without strong similarity between two or more models unduly biasing results.
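
One simple way to estimate a 2-year drought probability of the kind mentioned above is to count consecutive-year pairs in which annual rainfall falls below a low-rainfall threshold. This is a minimal sketch on synthetic data; the threshold quantile and the study's actual drought definition are assumptions here.

```python
import numpy as np

def two_year_drought_prob(annual_rain, threshold_quantile=0.25):
    """Fraction of consecutive-year pairs in which both years fall below
    a low-rainfall threshold. The 25th-percentile threshold is an
    illustrative choice, not necessarily the study's definition."""
    thresh = np.quantile(annual_rain, threshold_quantile)
    dry = annual_rain < thresh
    both_dry = dry[:-1] & dry[1:]
    return float(both_dry.mean())

# Synthetic annual rainfall totals (mm), illustrative only
rng = np.random.default_rng(0)
series = rng.gamma(shape=4.0, scale=100.0, size=200)
p = two_year_drought_prob(series)
```

Comparing this statistic between the full and final ensembles, per model and per scenario, would reproduce the kind of comparison described in the text.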

This study has thus demonstrated the use of relevant, robust, regional model realism metrics to successfully sub-sample the CMIP5 ensemble in such a way that unrealistic models are removed while model independence is maximised. The resulting sub-sample has particular value for downstream analysis, while to some extent reducing uncertainties in future projections. Using model realism metrics to sub-sample a multi-model ensemble will always involve some degree of subjectivity. Here the selection of realism metrics, though still subjective, is strongly guided by the literature. Thresholds for discriminating between model scoring categories are also somewhat arbitrary and could be chosen differently, which would ultimately affect the final sub-selection. However, if these subjective choices are made transparently and multiple measures and metrics are compared, the results may be openly evaluated by others.

Further work could consider different regional circulation features, or different approaches to evaluating realism, perhaps considering historical trends or the prevalence of extreme events. For example, the ability of models to reproduce historical trends in jet stream or SASH statistics could add further rigour to the assessment. Assessing to what extent different evaluation metrics would impact the final sub-selection would also be of interest, in particular whether a similar ensemble and range of projections is preserved.