Methods of Estimating Uncertainty of Climate Prediction and Climate Change Projection

A critical issue in climate prediction and climate change projection is to estimate their uncertainty. The estimation of uncertainty has been an intensive research field in recent years, which is also called the potential predictability study. The terminology of the uncertainty of prediction and the potential predictability are often alternatively used in literature due to their inherent linkage, although they have some difference in a rigorous framework of predictability theory. For example, when a system has a high potential predictability, we may think the uncertainty of its predictions to be small, and vice versa. In this chapter, unless otherwise indicated, the uncertainty of prediction and potential predictability have the similar meaning in describing and measuring the prediction utility, and are thus used alternatively. For simplicity, we also often use the term of predictability to denote the potential predictability.


Introduction
A critical issue in climate prediction and climate change projection is to estimate their uncertainty. The estimation of uncertainty has been an intensive research field in recent years, which is also called the potential predictability study. The terminology of the uncertainty of prediction and the potential predictability are often alternatively used in literature due to their inherent linkage, although they have some difference in a rigorous framework of predictability theory. For example, when a system has a high potential predictability, we may think the uncertainty of its predictions to be small, and vice versa. In this chapter, unless otherwise indicated, the uncertainty of prediction and potential predictability have the similar meaning in describing and measuring the prediction utility, and are thus used alternatively. For simplicity, we also often use the term of predictability to denote the potential predictability.
The uncertainty of prediction or predictability study is usually conducted using the strategy of ensemble prediction, from which there are a couple of metrics to quantify the potential predictability. Among them are variance-based measure and information-based measure, both quantifying the predictability or prediction uncertainty from different perspectives. In this chapter, we will introduce the two kinds of metrics. Emphasis will be placed on the similarity and disparity of these measures, and the realistic applications of the measures in studying the uncertainty of climate prediction and climate change projection. It should be noted that these potential predictability metrics do not make use of observation, which is essentially different from the actual prediction skills measured against observations like correlation skill or root mean square of errors (RMSE).

Signal-to-Noise Ratio (SNR) and potential predictability
The SNR has been a widely used measure of potential predictability [1,2]. At seasonal time scale, the signal is usually regarded as the atmospheric responses to the slowly varying external forcing such as sea surface temperature (SST), sea ice, snow cover, etc., whereas the noise is induced by the relatively high frequency atmospheric variability such as weather processes. In an ensemble seasonal climate prediction, the amplitude of signal and noise can be approximately quantified by the variance of ensemble mean and the averaged ensemble spread over all initial conditions [3][4][5], namely, Two common measures of potentially predictability are the signal-to-noise ratio (SNR) and the signal-to-total variance ratio (STR), i.e., It can be derived that the square root of STR is equivalent to the correlation of the signal component (S) to the prediction target itself. Thus, the STR is often defined as potential correlation (PCORR).
It is easy to derive that the STR is actually a perfect correlation skill, which assumes that the observation is an arbitrary ensemble member.  [ 8]. In Bayesian terminology, the climatological distribution is a prior distribution which can be usually derived from the long term historical observations. An ensemble prediction augments this prior information, and the additional information measured by RE is a natural measure of the utility or usefulness of this prediction and thus implies the potential predictability. In practice, ( | ) p i  and ( ) p  can be estimated directly from samples or approximated alternatively using kernel density estimation.
Another natural measure of predictability is the predictive information (PI), defined as the difference between the entropy of the climatological and forecast distributions: Considering (7), then The first term on the right hand side of Eq. (8) denotes the entropy of the prior distribution p(v) (climatological distribution), measuring the uncertainty of a prior time when no extra information is provided from the observed initial condition and forecast model; whereas the second term represents the entropy of the posterior distribution ( | ) p v i (forecast distribution), measuring the uncertainty after the observed initial condition and subsequent prediction becomes available (An elaborated illustration can be found in [9]). Thus a large PI indicates that the posterior uncertainty will decrease because of useful information being provided by a prediction (e.g., the larger ( | ) p v i the smaller uncertainty) that is, the prediction is to be more reliable in a "perfect model" context.
The predictive power (PP) was defined by [10] 1 exp( In the case where the PDFs are Gaussian distributions, which is a good approximation in many practical cases (including ENSO prediction, e.g., [11]). The predictive and climatological variances, and the difference between their means. The resulting analytical expression for the relative entropy RE, PI and PI are given as follows [1]: where, q and P are the climatological and predictive covariance matrices respectively;det is the determinant operator and tr is the trace operator; q  and p  are the climatological and predictive mean state vectors of the system, and n is the number of degree of freedom;. RE is composed of two components: (i) a reduction in climatological uncertainty by the prediction [the first two terms plus the last term on the right-hand side of (10)] and (ii) a difference in the predictive and climatological means [the third term on the rhs of (10)]. These components can be interpreted respectively as the dispersion and signal components of the utility of a prediction [12]. A large value of RE indicates that more information that is different from the climatological distribution is being supplied by the prediction, which could be interpreted as making it more reliable [1]. A key difference between relative entropy (RE) and predictive information (PI) is that RE vanishes if and only if the forecast and climatological distributions are identical (i.e., same mean and spread), while PI is zero as long as the two distributions have the same spread [9]. Remarkably, predictive information and relative entropy are invariant with respect to linear invertible transformations of the state [9][10].
For a scalar variable (e.g., an index), RE, PI, and PP can be simplified as 1/ 2 2

Mutual information
RE or PI is a predictability measure for individual predictions. The average of REs or PIs over all initial conditions reflects the average predictability and was proved to be equal to mutual information (MI), another quantity from information theory [9]. In the context of predictability, MI is defined as [9] ( , ) (16) where ( , ) p i  is the joint probability distribution between  and i . MI measures the statistical dependence between  and i , and vanishes when  and i are independent ( ).The equality of MI and average RE indicates that predictability can be measured in two equivalent ways: by the difference between forecast and climatological distributions or by the degree of statistical dependence between the initial condition i and the future state [ 13]. If the future state is on average unpredictable, individual forecasts should have the probability distribution identical to the climatological distribution, i.e., ( | ) ( ) p i p    and RE =0 for all predictions. This is equivalent to independence between i and  . Therefore, independence indicates unpredictability and dependence implies predictability.
MI is invariant with respect to nonlinear, invertible (nonsingular) transformations of state [9].
Thus, the MI between  and i equals to the MI between  and ensemble mean |i When forecast and climatological distributions are Gaussian, MI can be expressed, using (13), by [13]   Eq. (17) is the formula often used to calculate MI. Joe [14] and DelSole [15] showed that the transformations 1 exp( 2 ) MI   and 1 exp( 2 ) MI   produce "potential" skill scores which exhibit proper limiting behavior: they have values between 0 and 1, and the minimum (maximum) value 0 (1) occurs when MI vanishes (approaches infinite). Here "potential" indicates that they are perfect model measures. In this study, we will use the two "potential" skills to represent MI. Furthermore, if the forecast and climatological distributions are Gaussian, and forecast variance is constant, the above two "potential" skills respectively reduce to another two conventional "potential" skills: "potential" anomaly correlation ( p AC ) and "potential" mean square skill score ( p MSSS ) [9,13].

Relationship between SNR-based metrics and MI-based metrics
The averaged RE and PI ( RE and PI ) over all predictions (initial conditions) are identical to MI, as mentioned before. For seasonal climate prediction, the total variance (i.e., climate variance) can be decomposed into signal (S) variance and noise (N) variance, if the signal and noise are assumed to be independent of each other ( [16][17]), namely, X is the j-th member of the ensemble prediction starting from the i-th initial condition. The K is the ensemble size and M is the total number of initial conditions (predictions); and Without the loss of generality, the climatological mean is assumed to be zero, thus (20) can be expressed by where the overbar denotes the expectation over all predictions (initial conditions).Eq (14) and Eq. (21) can easily verify the property of MI, for example, Using (21), the information-based potential predictability measures MI , ( RE or PI ) and PP can be rewritten as the function of the mean signal and noise, or their ratio SNR. The q  and (17), thus we have [18]     The inequality in (23) is due to the fact that arithmetic mean is larger than or equal to geometric mean, or more strictly is a result of Jensen's inequality from information theory. Therefore, we have 1 exp ( 2 ), The equalities in (24), (25) and (23) hold if and only if 2 |i   is constant, as addressed in (18) and (19). The conditions that the forecast and climatological distribution are both Gaussian and the forecast variance 2 |i   is constant are equivalent to the condition that i and  are joint normally distributed [13,18]. If i and  are joint normally distributed, the probability  are all Gaussian distributions and there are [13,[18][19] where (26) is obtained using a linear regression with 0  being the linear correlation between the initial state i and the future state . As mentioned earlier, MI measures the statistical dependence between i and  . As can be seen from (26) One interesting question arises here, namely that, how we understand the MI-SNR discrepancy when there is significant variability of prediction variance, as expressed in (24) and (25) It should be noted that the above conclusion should not be challenged by a possible fact that SNR-based skill might have a better relationship to actual skill than MI-based skill, simply because the actual skill is often measured by the linear correlation (or related quantity), which is inherent to the SNR-based skill. Thus, a more challenging issue is how to design new metrics to measure actual forecast skill which could appreciate the MI-based extra predictive information beyond SNR. In principle, the MI between ensemble mean prediction |i   and actual observation O could have the potential capability to quantify the MI-based potential predictability [15]. However, how to effectively estimate MI in this context is not an easy issue.
In summary, there are connections between information-based potential predictability and SNR-based potential predictability, as built by the above equations. In other words, all the averaged information-based potential predictability measures are better than SNR-based predictability in characterizing 'true' potential predictability. When the climatology and prediction distribution are both Gaussian and the prediction variances are constant, the information-based measure is equivalent to the SNR-based potential measure.

Maximum SNR and PrCA
The signal and noise are theoretically statistically irrelevant when the ensemble size is infinite. However, the ensemble size is always finite in reality, thus the estimation of the signal is often contaminated by the noise. An optimal estimate for the largest potential predictability should be to maximize the SNR, from which the resultant signal component is the most predictable.
We denote by S and N signal and noise of variable X, where S and N are matrixes of a twodimension field describing temporal and spatial variation of the signal and noise of one variable of interest, namely, this section is at the framework of the multivariate statistics. where, i j X is the j-th member of the ensemble prediction starting from the i-th initial condition. The K is the ensemble size and M is the total number of initial conditions (predictions); and , 1 1 Our goal is to look for a vector q, which can maximize the ratio of the variance of signal and noise that are projected onto the vector, namely, Where, S  is the covariance matrix of signal, N  is the covariance matrix of noise. The solution of (30) can be obtained by solving the below eigenvalue equation Thus, the analysis of the largest potential predictability is also called the maximum signalto-noise EOF (MSN EOF) analysis, first introduced by Allen and Smith [21] to estimate the signal optimally by suppressing the influences of noise, and widely used already in climate predictability study [20,[22][23].
Practically, the number of grid points is always much larger than the number of total samples in climate studies, thus usually N  doesn't have full-rank, leading to a solution of ill-conditioned inversions. There are two common methods to solve this issue, as introduced below. T T T T     , respectively. If k truncated modes remain, where k is usually much smaller than the number of spatial grids, the signal and noise covariance matrix is a full-rank of k matrix. Thus, eq. (31) can be easily solved and the vector q (denoted as qeof ) is called filter pattern, which is a kelement vector, the filter pattern on the truncated EOF space. The leading predictable component is 1 i e is a matrix of m*k, where m is the number of spatial grids and k is the number of the truncated modes. EOF could be employed using signal, or noise matrix or corresponding data matrix. The q is the filter pattern, rather than the most predictable pattern. The most predictable pattern v can be obtained using the regression method, i.e.,

Solving eq (29) using whitening approach
The approach is to whiten the noise variance (i.e., the denominator), making N  an identity matrix and whitening the covariance matrix of signal( S  ) simultaneously. Thus, eq. (29) Based on the matrix theory, the SNR in eq. (37) reaches maximum when q' is the eigenvector of WS  , the whitened signal covariance associated with the whitening noise N  . The q' is a modified q by a whitening factor. The algorithm is briefly summarized as follows: i. Make the covariance matrix of noise ( N  ) identity, namely, ii. Whiten the signal covariance matrix by the transformation matrix 1/ 2 ED  , using the k leading modes iii. The SNR of (37) reaches maximum when q' is the leading eigenvector of the whitened signal covariance matrix WS  (in descent order). It is easy to see the relationship between q and q' Thus, 1/ 2 ' q ED q   iv. After the filter pattern q is known, the most predictable component is easy to derive as shown in method 1, namely projecting the signal (ensemble mean) on the filter patterns The most predictable component is the one corresponding to the largest signal-to-noise ratio. All PrCs are temporally orthogonal (uncorrelated) with each other. It is noted that (41) is a little different from (33) or (34) where the PCs of truncated EOF spaces are used. It is due to a different truncation procedure in the two methods. In this first method, the truncation is applied before optimization whereas in the second method, the truncation is implicitly integrated into the whitening process. However both should be equivalent, which can be seen by another expression of (41) Where the  is the estimated variance using PrCA modes. Thus, the variance explained by a specific mode measured in the original space, and the truncated space, is respectively as below:

Maximizing PI and PrCA
Another interpretation to MSN EOF is its connection with information-based measure PI or PP defined in (11) - (15). For example, as argued in [10], the predictive power PP is a positively orientated predictive index, defined by the difference between posterior (prediction) entropy and prior (climatology) entropy, thus measuring the decrease of uncertainties due to prediction. which is equivalent to the maximization of SNR, i.e., MSN EOF. In some literature, the term of MSN EOF and PrCA are alternatively used due to their complete equivalence. Actually, both the MSN EOF and PrCA methods belong to the discriminant analyses because the two methods, though from different perspectives, can be understood to seek a best linear combination of variables that separates the signal and the noise as much as possible [13]. The both methods identify the "filter pattern", or weight matrix, providing an optimized filter to discriminate the signal and noise, where the time series reflects the temporal evolution of the dominant mode of the signal, and the spatial pattern characterizes the spatial distribution of the dominant mode of signal, which are respectively referred to as spatial pattern, or the most predictable pattern.
It should be noted that the equivalence of SNR-based and information-based PrCA approach is based on the condition that the climatology and forecast distribution are both Gaussian.It is apparent since the PI and PP cannot be only expressed by the form of prediction and climatology variance as (11) -(15) under non-Gaussian assumption. It is difficult to derive the optimization solution for PI or PP from their general definitions of (8) and (9).
A remark to the algorithm of Schneider and Griffies [10] is a technical issue. In Schneider and Griffies [10], the PrCA is proposed to derive by minimizing PP, i.e., minimizing 2 leading to the below eigenvalue equation: where T  is the total variance. The optimal filter resulting in the most predictable is the eigenvector f with the smallest eigenvalue  . Comparing (44) with (31) reveals that eigenvalue  and q are reciprocal, indicating the equivalence of PrCA using maximization of (31) and minimization of (44). Usually, the eigenvector with the smallest eigenvalue often lacks of a stable, large scale-like pattern, making the approach of (44) impractical in real application. The truncated EOF space, which is used in solving (31) and (44), can greatly reduce this concern but still the most predictable pattern contains some noise. Thus, the MSN EOF approach, introduced above, is a better option.

A practical application -Potential predictability of climate change projection in AR5
In this section, we will explore the uncertainty of climate change projection using the above theoretical framework. The estimation of uncertainty is based on the Coupled Model Intercomparison Project Phase 5 (CMIP5), a new set of climate model experiments involved in the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). The CMIP5 is promoted to address some crucial issues on climate modeling and future climate state. More than 20 climate models were employed in this project with main focus on: 1) evaluate model predictability of future climate on different time scales (near term (out to about 2035) and long term (out to 2100 and beyond)), 2) understanding key mechanisms responsible for differences in model projections, and 3) quantify some important feedbacks of climate system like clouds and carbon cycle.
One of experiments used in CMIP5 is the Representative Concentration Pathways (RCPs) scenario. All model experiments involved in this scenario are forced by four kinds of mixing greenhouse gases (GHGs) boundary conditions which will finally lead increasing of radiation by 2.6, 4.5, 6.0 and 8.0 watt per square at the end of 21 century.
In this chapter, we will use the sea surface temperature (SST) projection of scenario R60 (the increasing of 6.0 watt per square experiment) to evaluate the potential predictability of climate projection of the scenario R60. At present, only nine models collected in R60 are available to download (From ESG-PCMDI Gataway), as summarized in table 1.  The SST outputs from these models are all monthly averaged data. For the purpose of the study of the climate change, we use annual mean in the following discussions. Because the lack of uniformity of ensemble member, only one member is used for each model here. In this study, we confine the domain to the Pacific across 60S to 60N. Fig. 1 and Fig 2a are the spatial pattern and time series of the first EOF (Empirical Orthogonal Function) for the Pacific Ocean from 2051-2100. As can be seen in Fig.2a, the Pacific SST has a striking increase, with the strongest response to the forcing of GHGs in the tropical Pacific along the equator as shown in Fig.1. In the extra-tropical beyond the 30S and 30N, the increase in SST is relatively weaker. On average, the mean temperature of the Pacific ocean of 60S to 60N increases around 0.5 to 1C from 2051-2010 in these models, as shown in Fig. 2b, the evolution of the mean temperature over the Pacific ocean. The mean of multiple models has the increase rate of around 0.75C as shown by the red line in Fig. 2b.  It is of great interest to explore the uncertainty of the above projections. As introduced aforementioned, one can use the above information-based framework to measure the uncertainty of climate prediction, given the multiple ensembles available. Apparently, there are several challenges here: 1) there is only one-member projection for each model, lacking sufficient ensembles; 2) the projection is not dependent on initial condition, thus any measures based on multiple initial conditions are invalid here; 3) the climatological distribution used in estimating the uncertainty may be uncertain under the background of global warming. For the first issue, we propose to solve it using multiple model strategy, i.e. pool all model projections to construct a 9-member ensemble. Under the framework of potential predictability, the model is assumed to be perfect. Thus the disparities among these model projections can be viewed as the ensembles of a perfect model, perturbed by initial conditions or other parameters. For the second issue, we assume that the projection is a long-term prediction at a given initial condition. The distribution for the average of multiple model projections is used as climatological distribution here. Fig. 3 are the variations of projection utility RE during the projection time from 2051 to 2010.The climatological mean and variance are estimated from all 'ensemble' members and years (sample size is 50*9) as in [25]. The projection mean and variance are estimated each year from the 9-member ensemble. As can be seen, it is apparent that the utility R continues to decrease until around 2070 and then bounce after 2080. For the projection during 2070 -2080, RE is small. When projection (prediction) and climatology distributions are identical, the relative entropy R is zero from (14). In theory, a nonzero value of R indicates predictability. However, in practice, a finite sample size introduces sampling errors that lead to a nonzero R even though there is no extra information supplied by the prediction. Therefore the statistical significance level should exceed the extent of uncertainty due to the finite sample size. We quantify the extent of uncertainty using a Monte Carlo method as in [26]. A sample with 9 members is randomly drawn from the climatology distribution and its relative entropy R is computed with respect to the climatology distribution. This process is repeated 10 000 times, and the value above 95% of 10 000 RE is considered to be the significant level as shown in Fig. 3 (dashed line). As can be seen, the projections between around 2070 and 2080 have statistically 'zero' relative entropy, and the other projections beyond this period have significant relative entropy.

Displayed in
A striking feature of RE in Fig. 3 is its U-shape variation with the projection time (i.e., the time step of integration), which is quite different from actual ensemble forecast at time scales from days to seasons. Typically, the RE monotonously decreases with the lead time of prediction at the time scales from days to seasons (e.g., [1][2]11]), i.e., the predictability decreases with lead time. The monotonous variation of RE with lead time of predictions well characterizes the nature and attributes of realistic atmospheric and oceanic system, which is chaotic and stochastic, leading to the information at initial conditions gradually dissipated with lead time. Apparently it is not this case here, since the projection is not an initial value problem, and mainly is a response to external forcing (e.g., CO2).
One possible explanation for this U-shape is related to the climatological distribution used here. We used the average of multiple model projections that have an apparent trend as the climatology distribution. If the RE is dominated by the ensemble mean (ensemble mean square) and the contribution of ensemble spread is relatively much smaller, the RE can show such a U-shape structure. Another plausible explanation is based on a hypothesis, namely, the climatology from multiple models is close to the true value. Under this assumption, the projection with small RE in figure 3 has high fidelity and vice versa. Here, we use the RE to measure the difference between the distribution of projection and the designed distribution, which has been also used in previous studies [17]. However such a hypothesis may cause concerns. One may argue to use present climatological distribution as a reference distribution in the above discussions. However, it can be expected that the climatology of the scenario of R60 should be quite different from the present one. Thus, a further study on the reference distribution is highly demanded in estimating uncertainty of climate change projection.

Conclusion
In this chapter, the SNR-based and information-based measures of potential predictability were introduced. They include the signal to noise ratio (SNR) and two measures of information-based predictability. One is relative entropy (RE) that measures individual potential predictability whereas the other is mutual information (MI), the average of RE over all initial conditions, which measures the average potential predictability. From statistical derivation and theoretical analysis, we have below conclusions: i. The SNR usually measures the average predictability with the assumption that signal inherent to slowly varying external forcing is predictable and the noise is unpredictable; ii. A new measure of prediction utility that is derived from information theory is introduced. It measures the additional information provided by a prediction (p) over that already available from the climatological or reference distribution (q). One natural measure is their relative entropy RE defined as the relative difference of entropy between p and q. For the case of Gaussian distributed p and q, the RE can be expressed in terms of the prediction and reference means and covariance. iii. Averaged RE over all initial conditions, called the mutual information (MI), a measure of the statistical dependence of the forecast state and the initial (boundary) conditions, measure the averaged predictability. The MI-based metrics can measure more potential prediction utility than the SNR-based counterpart. The MI-based predictability measures the statistical dependence, linear or nonlinear, between ensemble mean (prediction) and an ensemble member (hypothetical observation), whereas the SNRbased predictability only measures a linear relationship between prediction and hypothetical observation. iv. When the prediction and climatological distribution are Gaussian and the ensemble spread is constant with predictions, both measures are identical to each other. When the ensemble spread is not constant, the SNR-based predictability often underestimates the potential predictability. v. The predictable component analysis (PrCA), a method that assesses the most predictable patterns, is introduced. The PrCA decomposes the predictability into patterns accounting for different fractions of the total predictability. Distinguishing spatial structures that are unpredictable from those that are predictable is important for practical prediction problems, particularly when the predictable patterns are few in number.
As an example, the uncertainty of the climate change projection from scenario R60 of AR5 was evaluated, with the Pacific SST as the target. Nine models from different countries were participated in this evaluation. It was found that the most striking warming occurs at the tropical Pacific along the equator. In the extra-tropics beyond 30S to 30N, the increase in SST is relatively weaker. On average, the mean temperature of the Pacific ocean of 60S to 60N increases around 0.5 to 1C from 2051-2010 in these models. The relative entropy RE, measuring the utility of climate projection, continues to decrease until around 2070 and then bounce after 2080. For the projection during 2070 -2080, RE is small. Under the assumption that the climatology from multiple models is close to the true value, the projection during the period with small RE suggests high fidelity and vice versa.

Author details
Youmin Tang