Complexity Concepts and Non-Integer Dimensions in Climate and Paleoclimate Research

The ongoing global climate change has severe effects on the entire biosphere of the Earth. According to the most recent IPCC report [1], it is very likely that anthropogenic influences (like the increased discharge of greenhouse gases and a gradually intensifying land-use) are important driving factors of the observed changes in both the mean state and variability of the climate system. However, anthropogenic climate change competes with the natural variability on very different time-scales, ranging from decades up to millions of years, which is known from paleoclimate reconstructions. Consequently, in order to understand the crucial role of man-made influences on the climate system, an overall understanding of the recent system-internal variations is necessary.


Introduction
The ongoing global climate change has severe effects on the entire biosphere of the Earth. According to the most recent IPCC report [1], it is very likely that anthropogenic influences (like the increased discharge of greenhouse gases and a gradually intensifying land-use) are important driving factors of the observed changes in both the mean state and variability of the climate system. However, anthropogenic climate change competes with the natural variability on very different time-scales, ranging from decades up to millions of years, which is known from paleoclimate reconstructions. Consequently, in order to understand the crucial role of man-made influences on the climate system, an overall understanding of the recent system-internal variations is necessary.
The climate during the Anthropocene, i.e. the most recent period of time in climate history that is characterized by industrialization and mechanization of the human society, is well recorded in direct instrumental measurements from numerous meteorological stations. In contrast to this, there is no such direct information available on the climate variability before this epoch. Besides enormous efforts regarding climate modeling, conclusions about climate dynamics during time intervals before the age of industrial revolution can only be derived from suitable secondary archives like tree rings, sedimentary sequences, or ice cores. The corresponding paleoclimate proxy data are given in terms of variations of physical, chemical, biological, or sedimentological observables that can be measured in these archives. While classical climate research mainly deals with understanding the functioning of the climate system based on statistical analyses of observational data and sophisticated climate models, paleoclimate studies aim to relate variations of such proxies to those of observables with a direct climatological meaning. Classical methods of time series analysis used for characterizing climate dynamics often neglect the associated multiplicity of processes and spatio-temporal scales, which result in a very high number of relevant, nonlinearly interacting variables that are necessary for fully describing the past, current, or future state of the climate system. As an alternative, during the last decades concepts for the analysis of complex data have been developed, which are mainly motivated by findings originated within the theory of nonlinear deterministic dynamical systems. Nowadays, a large variety of methods is available for the quantification of the nonlinear dynamics recorded in time series [2,3,4,5,6,7,8,9], including measures of predictability, dynamical complexity, or short-as well as long-term scaling properties, which characterize the dynamical properties of the underlying deterministic attractor. Among others, fractal dimensions and associated measures of structural as well as dynamical complexity are some of the most prominent nonlinear characteristics that have already found wide use for time series analysis in various fields of research.
This chapter reviews and discusses the potentials and problems of fractal dimensions and related concepts when applied to climate and paleoclimate data. Available approaches based on the general idea of characterizing the complexity of nonlinear dynamical systems in terms of dimensionality concepts can be classified according to various criteria. Firstly, one can distinguish between methods based on dynamical characteristics estimated directly from a given univariate record and those based on a (low-dimensional) multivariate projection of the system reconstructed from the univariate signal. Secondly, one can classify existing concepts related to non-integer or fractal dimensions into self-similarity approaches, complexity measures based on the auto-covariance structure of time series, and complex network approaches. Finally, an alternative classification takes into account whether or not the respective approach utilizes information on the temporal order of observations or just their mutual similarity or proximity. In the latter case, one can differentiate between correlative and geometric dimension or complexity measures [10]. Table 1 provides a tentative assignment of the specific approaches that will be further discussed in the following. It shall be noted that this chapter neither gives an exhaustive classification, nor provides a discussion of all existing or possible approaches. In turn, the development of new concepts for complexity and dimensionality analysis of observational data is still an active field of research.
In order to illustrate the specific properties of the different approaches discussed in this chapter, the behavior of surface air temperature data is studied. Specifically, the data utilized in the following are validated and homogenized daily mean temperatures for 2342 meteorological stations distributed over Germany ( Figure 1) and covering the time period from 1951 top 2006. The raw data have been originally obtained by the German Weather Service for a somewhat lower number of stations before being interpolated and postprocessed by the Potsdam Institute for Climate Impact Research for the purpose of validating regional climate simulations ("German baseline scenario"). Before any further analysis, the annual cycle has been removed by means of phase averaging (i.e. subtracting the long-term climatological mean for each calendar day of the year and dividing the residuals by the corresponding empirical standard deviation estimated from the same respective day of all years in the record). This pre-processing step is necessary since the annual cycle gives the main contribution to the intra-annual variability of surface air temperatures in the mid-latitudes and would thus lead to artificially strong correlations on short to intermediate time-scales (i.e. days to weeks) [11]. In addition, since some of the methods to be discussed can exhibit a considerable sensitivity to non-stationarity, linear trends for the residual mean temperatures are estimated by a classical ordinary least-squares approach and subtracted from the de-seasoned record. The remainder of this chapter will follow the path from established self-similarity concepts and fractal dimensions (Section 2) over complexity measures based on the auto-covariance structure of time series (Section 3) to modern complex network based approaches of time series analysis (Section 4). Mutual similarities and differences between the individual approaches are addressed. The performance of the different approaches is illustrated using the aforementioned surface air temperature records. Subsequently, the problem of adapting the considered methods to time series with non-uniform (and possibly unknown) sampling as common in paleoclimatology is briefly discussed (Section 5).

Self-similarity approach to fractal dimensions
The notion of fractal dimensions has originally emerged in connection with self-similar sets such as Cantor sets or self-similar curves or objects embedded in a metric space [9]. The most classical approach to quantifying the associated scaling properties is counting the number of boxes needed to cover the fractal object under study in dependence on the associated length scale, which behaves like a power-law for fractal systems. More formally, studying the asymptotic behavior of the double-logarithmic dependence between number and size of hypercubes necessary to cover a geometric object with ever decreasing box size defines the box-counting dimension (often also simply called "the" fractal dimension) Given a trajectory of a complex system in a d-dimensional space that is supposed to correspond to an attractive set, covering the volume captured by this trajectory by hypercubes in the way described above allows estimating the fractal dimension of the associated attractor. More general, considering the probability mass of the individual boxes, pi, one can easily generalize the concept of box-counting dimensions to so-called Renyi dimensions [12,13] 0 log lim , (1 )log which give different weights to parts of phase space with high and low density (in fact, the box coverage probabilities pi serve as naïve estimators of the coarse-grained invariant density p(x) of the dynamical system under study). The special cases q=0,1,2 are referred to as the box-counting (or capacity), information, and correlation dimension.
In typical situations, only a univariate time series is given, which can be understood as a low-dimensional projection of the dynamics in the true higher-dimensional phase space. In such cases, it is possible to reconstruct the unobserved components in a topologically equivalent way by means of so-called time-delay embedding [14], i.e. by considering vectors ( 1) ( , ,..., ), where the unknown parameters N and  (embedding dimension and delay, respectively) need to be appropriately determined. The basic idea is that the components of the thus reconstructed state vectors are considered to be independent of each other in some feasible sense, thus representing the dynamics of different observables of the studied system. There are some standard approaches for estimating proper values for the two embedding parameters. On the one hand, the delay can be inferred by considering the time after which the serial correlations have vanished (first root of the auto-correlation function) or become statistically insignificant (de-correlation time) -in these cases, the resulting components of the reconstructed state space are considered linearly independent. Alternatively, a measure for general statistical dependence such as mutual information can be considered to estimate the time after which all relevant statistical auto-dependences have vanished [15]. On the other hand, the embedding dimension is traditionally estimated by means of the false nearest-neighbor method, which considers the changes in neighborhood relationships among state vectors if the dimension of the reconstructed phase space is increased by one. Since such changes indicate the presence of projective effects occurring when considering a too low embedding dimension, looking for a value of N for which the neighborhood relationships between the sampled state vectors do not change anymore provides a feasible estimate of the embedding dimension [16]. An alternative approach is considering the socalled singular system analysis (SSA), which allows determining the number of statistically relevant eigenvalues of the correlation matrix of the high-dimensionally embedded original record as an estimate of the true topological dimension of the system under study [17,18].
Having reconstructed the attractor by finding a reasonable approximation of its original phase space as described above, one may proceed with estimating the fractal dimensions by means of box-counting. However, since this approach requires studying the limit of many data, is may become unfeasible for analyzing real-world observational time series of a given length. As alternatives, other approaches have been proposed for estimating some of the generalized fractal dimensions Dq, with the Grassberger-Procaccia algorithm for the correlation dimension [19,20] as the probably most remarkable example. Details on corresponding approaches can be found in any contemporary textbook on nonlinear time series analysis.
A noteworthy alternative to considering fractal dimension estimates based on phase space reconstruction has been introduced by Higuchi [21,22], who studied the behavior of the curve length associated with a univariate time series in dependence on the level of coarse-graining, (where [.] denotes the integer part), which scales with a characteristic exponent corresponding to the fractal dimension D0, Figure 2 shows the actual behavior of the thus computed curve length with varying coarsegraining level k (equivalent to the embedding delay in Equation (3)) for the daily mean temperature record from Potsdam. One can see that there are two distinct scaling regimes corresponding to time scales up to about one week and above about ten days. For the shorter time-scales, the slope of the linear fit in the double-logarithmic plot yields values between 1.6 and 1.7, which are of the order of magnitude that is to be expected for lowdimensional chaotic systems with two topological dimensions (note that the drawing of the curve underlying the definition of the curve length L(k) corresponds to a two-dimensional space). In turn, for larger time scales, the slope of the considered function takes values around 2, implying that the dynamics on these time-scales is less structured and resembles a random walk without the distinct presence of an attractive set in phase space with a lower (fractal) dimension. It should be emphasized that the shorter time-scale appears to be coincident with typical durations of large-scale weather regimes, whereas the second range of time-scales exceeds the predictability limit of atmospheric dynamics. The difference between both scaling regimes becomes even more remarkable when studying the corresponding spatial pattern displayed by all 2342 meteorological stations in Germany ( Figure 3). On the shorter time-scales, the fractal dimension is significantly enhanced in the easternmost part of the study area, whereas the same region shows the lowest values of D0 on the longer time-scales. The presence of two different ranges of time-scales with distinctively different spatial pattern is actually not unique to the fractal dimension, but can also be observed by other complexity measures (see Section 3.5 of this chapter). The probable reason for this finding is the presence of atmospheric processes (related to more marine and continental climates as well as low-and highlands) affecting the different parts of the study area in different ways on short and long time-scales. A more detailed climatological interpretation of this finding is beyond the scope of the present work.

Complexity measures based on serial correlations
As an alternative to concepts based on classical fractal theory, scaling properties based on the linear auto-covariance structure of time series data also contain valuable information.
Corresponding approaches utilizing basic methods from multivariate statistics have been referred to as multivariate dimension estimates [11,23,24,25] and provide meaningful characteristics that can be reliably estimated even from rather short time series, which still constitute a fundamental limit for classical fractal dimension analysis.
The original motivation for the introduction of multivariate dimension estimates to climate research has been that the ''complete'' information about the climate of the past requires considering a set of complementary variables, which form a multivariate time series. The fraction of dynamically relevant observables, which is interpreted as a measure for the average information content of a given variable, can vary itself with time due to the nonstationarity of the climate system. Temporal changes of this information content, i.e. of the effective ''dimension'' of the record, can therefore serve as an indicator for changes in environmental conditions and the corresponding response of the climate system. Moreover, widely applicable ideas from the theory of nonlinear deterministic processes can be used to adapt this approach to univariate time series. In the following, the mathematical background of the corresponding approach will be detailed.

Dimensionality reduction of multivariate time series
Quantifying the number of dynamically relevant components in multivariate data sets commonly requires an appropriate statistical decomposition of the data into univariate components with a well-defined variance. In the most common case, these components are required to be orthogonal in the vector space spanned by the original observables, i.e. linearly independent. A corresponding decomposition (typically with the scope of achieving a suitable dimensionality reduction of a given high-dimensional data set) is commonly realized by means of principal component analysis (PCA) [26,27], which is also referred to as empirical orthogonal function (EOF) analysis or Karhunen-Loève decomposition (KLD) depending on the particular scientific context and application. The basic idea beyond this technique is that a proper basis adjusted to the directions of strongest (co-)variation in a multivariate data set can be identified using a principal axis transform of the corresponding correlation matrix. In this case, the associated eigenvectors of the correlation matrix contain weights for linear superpositions of the original observables that result in the largest possible variance. The actual amplitude of this variance is characterized by the associated non-negative eigenvalues.
Technically, consider simultaneous records Xij of different observables X (j) at times ti combined in a TxN-dimensional data set X=(Xij) with column vectors representing T successive observations of the same quantity and row vectors containing the simultaneous measurements of N different observables. Here, the columns of X may represent different variables measured at the same location or object, or spatially distributed records of the same observable or different variables. The associated correlation matrix is given as the covariance (or scatter) matrix S=Y T Y where the matrix Y is derived from X by subtracting the column means from all columns of X and then dividing the residual column vectors by their standard deviations. It should be emphasized that column mean and standard deviation represent here estimates of the expectation value and expected standard deviation of the respective observable. The elements of S are the linear (Pearson) correlation coefficients between all pairs of variables, which provide reasonable insights into mutual linear interrelationships between the different variables if the observations are normally distributed or the sample size is sufficiently large to neglect the former requirement according to the central limit theorem. By definition, S is symmetric and positive semidefinite, i.e. has only non-negative eigenvalues i 2 . Without loss of generality, one may arrange these N eigenvalues in descending order and interpret them as the variances of the principal components of X given by the corresponding eigenvectors.
It shall be noted that there are various generalizations of linear PCA, involving decompositions of multivariate data sets into projections onto curved manifolds that take the place of the orthogonal eigenvectors describing the classical linear principal components. Due to the considerably higher computational efforts for identifying these objects in the underlying vector space and correctly attributing the associated component variances, corresponding methods like nonlinear PCA [28], isometric feature mapping (Isomap) [29], or independent component analysis (ICA) [30], to mention only a few examples, will not be further discussed here, but provide possibilities for generalizing the approach detailed in the following.

KLD dimension density
The idea of utilizing PCA for quantifying the number of dynamically relevant components, i.e. transferring this traditional multivariate statistical technique into a dynamical systems context, is not entirely new. In fact, it has been used as early as in the 1980s for identifying the proper embedding dimension for univariate records based on SSA (see Section 2 of this chapter). Ciliberti and Nicolaenko [31] used PCA for quantifying the number of degrees of freedom in spatially extended systems. Since these degrees of freedom can be directly associated with the fractal dimension or Lyapunov exponents of the underlying dynamical system [32,33,34], it is justified to interpret the number of dynamically relevant components in a multivariate record as a proxy for the effective dimensionality of the corresponding dynamical system.
More formally, Zoldi and Greenside [35,36,37,38] suggested using PCA for determining the number of degrees of freedom in spatially extended systems by considering the minimum number of principal components required to describe a fraction f (0<f<1) of a multivariate record. Let i 2 , i=1,…,N, again be the non-negative eigenvalues of the associated correlation matrix S given in descending order. The aforementioned number of degrees of freedom, which is referred to as the KLD dimension, can then be defined as follows [23]: For spatially extended chaotic systems, it has been shown that the KLD dimension increases linearly with the system size N, i.e. the number of simultaneously recorded variables [37]. This motivates the study of a normalized measure, the KLD dimension density / , instead of DKLD itself.

LVD dimension density
While the KLD dimension density can be widely applied for characterizing complex spatiotemporal dynamics based on large data sets (i.e. both N and T are typically large), it reaches its conceptual limits when being applied to multivariate data sets with a small number of simultaneously measured variables (small N), or used for studying non-stationary dynamics in a moving-window framework (small T). On the one hand, small N implies that KLD can only have very few distinct values (i.e. multiples of 1/N), so that small changes in the covariance structure of the considered data set may lead to considerably large changes of the value of this measure. On the other hand, short data sets (small T) imply problems associated with the statistical estimation of correlation coefficients between individual variables (particularly large standard errors and the questionable reliability of the Pearson correlation coefficient as a measure for linear interrelationships in the presence of non-Gaussian distributions). However, both cases can have a considerable relevance in the field of geoscientific data analysis.
As an alternative, Donner and Witt [11,23,24,25] suggested studying the characteristic functional behavior of KLD in dependence on the explained variance fraction f. Specifically, if the residual variances decayed exponentially, i.e.
the KLD dimension density would scale as in the limit of large N. The resulting coefficient (f) can be understood as characterizing the effective dimensionality of the system. The derived quantity (the dependence on f will be omitted for brevity from now on) has been termed the linear variance decay (LVD) dimension density of the underlying data set. Its estimation by means of linear regression according to Equation (9) has been discussed in detail elsewhere [11,25].
It should be mentioned that * does not yet give a properly normalized dimension density with values in the range between 0 and 1, which can already be observed for simple stochastic model systems [23,25]. However, using the limiting cases of identical (lowest possible value   min) and completely uncorrelated (highest possible value   max) component time series, one can derive analytical boundaries and properly renormalize the LVD dimension density to values within the desired range [39] as min max min It shall be noted that using the LVD dimension density instead of the KLD dimension density solves the problem of discrete values in the limit of small N, but still shares the conceptual limitations with respect to the limit of small T. As another positive feature, LVD has a continuous range and a much smaller variability with f than KLD. This variability is mainly originates from insufficiencies of the regression model (Equation (8)) and would vanish in case of large N and an exactly exponential decay of the residual variances, which is a situation that is, however, hardly ever met in practice [27].
Possible modifications of the LVD dimension density approach include the consideration of alternative measures of pair-wise statistical association, such as Spearman's rank-order correlation or phase synchronization indices [40,41], which may be of interest in specific applications. Although the formalism described above can applied in exactly the same way to such matrices of similarity measures, the statistical meaning of the corresponding decomposition is not necessarily clear.

Dimension densities from univariate time series
The previously discussed approach can be easily modified for applications to univariate time series [42]. For this purpose, the correlation matrix S of the multivariate record is replaced by the Toeplitz matrix of auto-correlations estimated from a univariate data set. In other words, the PCA commonly utilized for defining the KLD and LVD dimension densities is replaced by an SSA step (i.e. a "PCA for univariate data").
As a particular characteristic of the resulting "univariate dimension densities", it should be emphasized that the obtained results crucially depend on the particularly chosen "embedding" parameters, i.e. the "embedding dimension" N and time delay . In case of SSA-based methods, it is common to use an "over-embedding", i.e. a number of time-shifted replications of the original record that is much larger than the actual supposed dimensionality of the studied data. Since serial correlations usually decay with increasing time delay, increasing N beyond a certain value (i.e. adding more and more dimensions to the embedded time series) will not change the number of relevant components in the record anymore. As a consequence, LVD asymptotically takes stationary values. In turn, selecting the "embedding delay"  allows studying the dynamical complexity of time series on various time-scales (i.e. from the minimum temporal resolution of the record to larger scales limited only by the available amount of data). Consequently, LVD can change considerably as  is varied.

Application: Surface air temperatures
For the purpose of discussing measures of dimensionality based on the auto-covariance structure of an observational record, it is useful to first examine the auto-correlation function itself. As a first example, let us consider again the daily mean temperature record from Potsdam, Germany (Figure 4a). For this time series, the auto-correlations decay within only about 7-10 days to values below 0.2 ( Figure 4b). Consequently, using short time delays (below about one week) for embedding temperature records leads to components with considerable mutual correlations. In this case, one can expect a low LVD dimension density, since the information contained in one of the embedded components is already largely determined by the other components. In turn, for larger delays, the embedded components become approximately linearly independent of each other, implying that since correlations are generally weaker, more components need to be taken into account for explaining a given fraction of variance from the multivariate embedded record. Hence, the LVD dimension density should considerably increase with the delay. Indeed, this expectation is confirmed by Figure 4c, which displays a sharp increase of LVD with increasing embedding delay  especially at the scales below one week, whereas there is a saturation for larger delays at values rather close to one. Another interesting feature can be observed in the behavior of the LVD dimension density with increasing embedding dimension N (Figure 4c). For small delays (i.e. time scales with considerable serial correlations within the observational record), LVD increases with increasing N towards an asymptotic value that can be well approximated by estimating this measure for large, but fixed N. In contrast, for large delays, we find a decrease of the estimated LVD dimension density with increasing N without a marked saturation in the considered range of embedding dimensions. A probable reason for this is the insufficiency of the underlying exponential decay model. In fact, the exact functional form of the residual variances for random matrices clearly differs from an exponential behavior, but displays a much more complicated shape [27]. Furthermore, it should be noted that as both delay and embedding dimension increase, the number of available data decreases as Teff=T-(-1)N, which can contribute to stronger statistical fluctuations (however, the latter effect is most likely not relevant in the considered example). For intermediate delays, one can thus expect a certain crossover time scale between both types of behavior, which is related to the typical time scale of serial correlations. In order to further support these findings, Figure 5 shows the spatial pattern displayed by the LVD dimension density at all 2342 stations. For larger embedding delays (right panel), the components of the reconstructed multivariate record are in reasonable approximation linearly independent, resulting in high values of the LVD dimension density close to 1 (the limiting case for perfectly uncorrelated records). However, one can observe a marked West/East gradient with high values of LVD in the western and central part and much lower values in the eastern part of Germany. Referring to the interpretation of this measure, this finding could indicate that the temporal correlations decay slower in the eastern part that is subject to a more continental climate which typically varies on longer time scales than a marine climate present in the western part of the study area. It should be emphasized that the general spatial pattern closely resembles the behavior of the fractal dimension D0 (Figure 3b).
In turn, for low embedding delays (1 day), the observed spatial pattern is more complex with more fine-structure, yielding enhanced values (though still indicating considerable correlations) in the eastern and western parts of Germany and lower values in central Germany in a broad band from North to South, as well as in the southeastern part. The qualitative pattern again resembles that of the fractal dimension D0 (Figure 3a), with the exception that the enhanced values in the eastern part are less well-expressed, whereas the contrasts in the western part are considerably stronger.
In general, both characteristics display similar differences between the behavior on short and longer time scales, which are clearly related to the presence of auto-correlations with a spatially different decay behavior. Regarding the short-term dynamics, this statement is supported by the fact that a qualitatively similar spatial pattern as for the considered dimension estimates (but with opposite trend) can be obtained by coarsely approximating the temperature records by a first-order auto-regressive (AR [1]) process Xt=1Xt-1+t, where t is Gaussian white noise ( Figure 5, left panel). In fact, for an AR [1] process, the Toeplitz matrix of auto-correlations has a very simple analytical form, Sij=1 |i-j| . Even though there is no closed-form solution for its eigenvalues [43], one can easily show by means of numerical simulations that the resulting LVD dimension density for such processes depends hardly on N, but strongly on the value of the characteristic parameter 1. Since the latter is related to the time-scale of the associated exponential decay of auto-correlations as t*=-1/log 1, low values of 1 give rise to a fast decay and, hence, high values of the LVD dimension density, whereas the opposite is true for high values close to 1 (see Figure 6). This behavior is in excellent agreement with the theoretical considerations made above.

Complex network-based approaches
These days, the analysis of network structures is a common task in many fields of science such as telecommunication or sociology, where physical or social interactions (wires, friendships, etc.) can be mathematically described as a graph. When the corresponding connectivity pattern contains a certain number of interacting units (referred to as network vertices or nodes) and is neither completely random nor fully regular (e.g. a chain or lattice), but displays some less obvious type of structure, the resulting system is called a complex network. The structural features of such systems can be described using the rich toolbox of quantitative characteristics provided by the so-called complex network theory [44,45,46,47].
Besides the analysis of network structures based on a clearly "visible" substrate (such as infrastructures or communication systems), it has been demonstrated by various authors that complex network approaches can be useful for extracting and understanding the dynamical backbone of systems composed of a large number of dynamically interrelated units or variables, such as financial markets [48], the neuro-physiological activity of different regions of the brain [49], or the functioning of the climate system [50,51,52,53]. In the aforementioned cases, a network structure is identified using suitable measures of statistical association (e.g., linear Pearson correlation or nonlinear mutual information) between records of activity in different areas or of different variables or coupled units. Information on the underlying functional connectivity of the large-scale system is inferred by considering only sufficiently strong interrelationships and studying the set of such connections among the variety of subsystems.
In parallel to the development of complex network methods as a complementary tool for multivariate time series analysis, a variety of different approaches has been suggested for studying single univariate time series from a network perspective [54]. Existing approaches include methods based on transition probabilities after coarse-graining the time series' range or the associated reconstructed phase space [55], convexity relationships between different observations in a record [56], or certain notions of spatial proximity between different parts of a trajectory [57,58,59,60,61,62,63], to mention only the most prominent existing concepts in this evolving area of research (for a more detailed recent review, see [54]). For two of these approaches, the so-called visibility graphs and recurrence networks discussed below, it has been shown that some of the resulting network properties can be related to the concept of fractal dimensions or, more general, scaling analysis. In the following, the corresponding recent findings are summarized.

Visibility graph analysis
Visibility graphs have been originally introduced as a versatile tool for studying visibility relationships between objects in architecture or robot motion planning [64,65,66,67]. Lacasa and co-workers [55] suggested transferring this idea to the analysis of time series from complex systems, where local maxima and minima of the considered observable play the role of hills and valleys in a one-dimensional landscape. Specifically, in a visibility graph constructed from a univariate time series, the individual observations are taken as network vertices, and edges are established between pairs of vertices xi=x(ti) and xj=x(tj) that are "mutually visible" from each other, i.e. where for all xk=x(tk) with ti<tk<tj the following local convexity condition applies: When describing the connectivity of this network in the most common way in terms of the binary adjacency matrix Aij (here, Aij=1 implies that there exists an edge between vertices i and j), the latter can be consequently expressed as follows: where  denotes the Heaviside function defined in the usual way.
As a simplification of the standard visibility graph algorithm, it can be useful considering the so-called horizontal visibility graph, in which the connectivity is defined according to the horizontal visibility between individual vertices, i.e. there is an edge (i,j) between two observations xi and xj if for all k with ti<tk<tj, xk<min(xi,xj). Consequently, the associated adjacency matrix reads In other words, the horizontal visibility graph encodes the distribution of local maxima in a time series (i.e. short-term record-breaking events). Due to its simpler analytical form, it has the advantage that certain basic network properties can be more easily evaluated analytically than for the standard visibility graph.
As a particularly remarkable result, it has been demonstrated both analytically and numerically that for fractal as well as multifractal processes, the degree distributions p(k) of visibility graphs, i.e. the probabilities of finding vertices with a given number of connections (degree) , exhibit a power-law (commonly called "scale-free property" in complex network theory) with a characteristic scaling exponent that is directly related to the associated Hurst exponent H [68,69]. Moreover, it can be shown that for a wide class of such processes, the Hurst exponent is itself related with the fractal dimension D0 as D0=2-H, however, this relationship is not universal [70]. In this spirit, the scaling exponent obtained from visibility graphs can be considered as an alternative estimate of the fractal dimension. In turn, besides the validity of the aforementioned relationship between Hurst exponent and fractal dimension for the specific data set under study, the possible improvements with respect to computational efforts, required data volume and related issues still need to be systematically compared with those of existing estimators of the Hurst exponent.
In addition to the potentially ambiguous interdependence between Hurst exponent and fractal dimension, using visibility graph approaches for the purpose of estimating fractal dimensions from geoscientific time series may be affected by a further problem. Towards the ends of a time series, there is a systematic tendency to underestimate the actual degree of vertices just due to a lower number of possible neighbors [71]. While this feature will have negligible influence for long time series, it may considerably contribute to a bias in the degree distribution estimated from small data sets common to many geoscientific problems. In turn, a potential advantage of visibility graphs is that they do not require uniform sampling in time, which makes them applicable to typically problematic types of data such as paleoclimate records [71] or even marked point process data such as earthquake catalogues [72].

Recurrence network analysis
Recurrence networks are another well-studied approach for transforming time series into an associated complex network representation [60,61,62,63]. In contrast to visibility graphs, the basic idea is reconstructing the spatial structure of the attractor underlying the observed dynamics in the corresponding phase space. That is, given a univariate record the dynamically relevant variables need to be reconstructed by means of time-delay embedding first if necessary. Consequently, the first step of recurrence network analysis consists of identifying the appropriate embedding parameters by means of the corresponding standard techniques. Having determined these parameters, time-delay embedding is performed. For the resulting multivariate time series, the mutual distances between all resulting sampled state vectors (measured in terms of a suitable norm in phase space, such as Manhattan, Euclidean, or maximum norm) are compared with a predefined global threshold value .
Interpreting the state vectors as vertices of a recurrence network, only such pairs of vertices are connected that are mutually closer than this threshold, resulting in the following definition of the adjacency matrix: where ij denotes Kronecker's delta defined in the usual way. To put it differently, in a recurrence network only neighboring state vectors taken from the sampled trajectory of the system under study are connected. In this spirit, the recurrence network forms the structural backbone of the associated dynamical system. Moreover, since no information on temporal relationships enters the construction of the recurrence network, its study corresponds to a completely geometric analysis method.
The structural properties of recurrence networks have already been intensively studied.
Relating to the degree distributions, it has been demonstrated analytically as well as numerically that the presence of a power-law-shaped singularity of the invariant density p(x) of the studied dynamical system is a necessary condition for the emergence of scale-free degree distributions, the scaling exponent of which is, however, not necessarily associated with the system's fractal dimension, but with the characteristic behavior of the invariant density near its singularity [73]. More generally, recurrence networks are a special case of random geometric graphs aka spatial networks, where the network vertices have a distinct position in some metric space and the connectivity pattern is exclusively determined by the spatial density of vertices and their mutual distances [74]. The latter observation allows calculating expectation values of most relevant complex network characteristics given that the invariant density is exactly known or can at least be well approximated numerically [75]. Specifically, the transitivity properties of recurrence networks on both local and global scale can be computed analytically for some simple special cases [75]. A detailed inspection of these properties demonstrates that the global recurrence network transitivity , , , , can be considered as an alternative measure for the effective dimensionality of the system under study [76]. In contrast to established notions of fractal dimensions, the estimation of the associated transitivity dimension does not require considering any scaling properties of some statistical characteristics. The definition in Equation (18) is motivated by the fact that at least using the maximum norm, for random geometric graphs in integer dimensions d the expectation value of the network transitivity scales as (3/4) -d [74,76]. However, it should be noted that the proper evaluation of the transitivity dimension is challenged by the fact that it alternates between two asymptotic values, referred to as upper and lower transitivity dimension, as the recurrence threshold  is varied [76].
According to the aforementioned interpretation of the transitivity properties, it has been found that the associated local clustering coefficient providing a measure of transitivity on the level of an individual vertex is a sensitive tracer of dynamically invariant objects like supertrack functions or unstable periodic orbits [54,60,61,76]. In turn, global clustering coefficient (i.e. the arithmetic mean value of the local clustering coefficients of all vertices in the recurrence network) and network transitivity track changes in the dynamical complexity of a system under study that are related with bifurcations [10,61,77,78] or subtle changes in the dynamics not necessarily captured by traditional methods of time series analysis [10,79].
In a similar fashion, some other network measures based on the concept of shortest paths on the graph can be utilized for similar purposes. In summary, it has to be underlined that the recurrence network concept has already demonstrated its great potential for studying geoscientific time series, however, this potential has not yet been fully and systematically explored for different fields of geosciences.

Complexity and dimensionality analysis in paleoclimatology
Unlike for data obtained from meteorological observatories or climate models, the appropriate statistical analysis of paleoclimate proxy data is a challenging task. Particularly, a variety of technical problems arise due to the specific properties of this kind of data [24].
Firstly, paleoclimate data sets are usually very noisy due to significant measurement uncertainties, high-frequency variations, secondary (non-climatic) effects and the aggregation of the measurements over certain, not necessarily exactly known time intervals.
Secondly, in Earth history environmental conditions have changed both continuously and abruptly, on very long time-scales as well as on a set of different ''natural'' frequencies the influence of which has changed with time. Especially during the last million years, there has been a sequence of time intervals with cold (glacial) and moderate (interglacial) global climate conditions, which can be interpreted as disjoint states of the global climate system. Even more, these two types of states have alternated in a way that displays some complex regularity, i.e., the timing of the (rather abrupt) transitions between subsequent states (glacial terminations and inceptions) has been controlled by dominating frequencies of variations in the Earth's orbital parameters [80], which is commonly referred to as Milankovich variability. As a consequence of these multiple transitions, paleoclimate time series are intrinsically nonstationary with respect to variability on a variety of different time scales.
Finally, in the case of sedimentary and ice core sequences as the most common types of proxy records, the core depth has to be translated into an age value with usually rather coarse and uncertain age estimates [81,82]. Since the rate of material accumulation has typically varied with time as well, an equidistant sampling along the sequence does usually not imply a uniform spacing of observations along the time axis. Both unequal spacing of measurements and uncertainties in both timing and value pose additional challenges to any kind of time series analysis approach applied to paleoclimate data.

Analysis of time series with non-uniform sampling
As stated above, non-uniform sampling is an inherent feature of most paleoclimate records. Hence, the appropriate statistical analysis of such records requires a careful specific treatment, since standard estimators of even classical and conceptually simple linear characteristics are not directly applicable (or at least do not perform well) in case of unequally spaced time series data. Consequently, in the last decades there has been an increasing interest in developing alternative estimators that generalize the established ones in a sophisticated way.
Traditionally, many approaches for analyzing paleoclimate time series have implicitly assumed a linear-stochastic behavior of the underlying system, i.e. that the major features of the records can be described by ''classical'' statistical approaches like correlation or spectral analysis [83,84]. In particular, novel estimators for both time and frequency domain characteristics have been developed which do not require a uniform sampling [85,86,87,88,89]. In turn, many recent studies in the field of paleoclimatology, including such dealing with sophisticated statistical methods [84,90], have typically made use of interpolation to uniform spacing. It has to be underlined that this strategy, however, disregards important conceptual problems such as the appearance of spurious correlations in interpolated paleoclimate data [86] or the presence of time-scale uncertainty. At least the former problem can be solved by using improved more generally applicable estimators, whereas the impact of time-scale uncertainty can be estimated using resampled (Monte Carlo) age models and distributions of statistical properties estimated from ensembles of perturbed age models consistent with the original one [71].
Moreover, classical statistical methods such as correlation or spectral analysis are typically based on the assumption that the observed system is in an equilibrium state, which is reflected by the stationarity of the observed time series. However, this stationarity condition is usually violated in the case of paleoclimate data due to the variable external forcing (solar irradiation) and multiple feedback mechanisms in the climate system that drive the system towards the edge of instability. Hence, more sophisticated methods are required allowing to cope with non-stationary data as well [91]. One prominent example for such approaches is wavelet analysis [92,93,94], which allows a time-dependent characterization of the variability of a time series on different time scales. As for the classical methods of correlation and spectral analysis designed for stationary data, estimators of the wavelet spectrogram are meanwhile also available for unevenly sampled data, for example, in terms of the weighted wavelet Z transform [95,96,97,98,99,100], gapped wavelets [101,102], or a generalized multiresolution analysis [103,104]. Similar as for classical spectral analysis, such wavelet-based methods often exhibit scaling laws associated with fractal or multifractal properties of the system under study.

Fractal dimensions and complexity concepts in paleoclimate studies
The question of whether climate can be approximately described as a low-dimensional chaotic system has stimulated a considerable amount of research in the last three decades.
Notably, much of the corresponding work has been related with the study of paleoclimate records. As a prominent example, the seminal paper "Is there a climatic attractor?" by Nicolis and Nicolis [105] considered the estimation of the correlation dimension D2 of the oxygen isotope record from an equatorial Pacific deep-sea sediment core. A direct follow-up [106] presented a thorough re-analysis of the same record utilizing the information dimension D1. Both manuscripts started an intensive debate on the conceptual as well as analytical limits of fractal dimension estimates for paleoclimate time series. Grassberger [107] analyzed different data sets and could not find any clear indication for lowdimensional chaos. This absence of positive results has been at least partially triggered by the problematic properties of paleoclimate records, particularly the relatively small amount of data and their non-uniform sampling resulting in the need for interpolating the observational time series. Grassberger's results were confirmed by Maasch [108] who analyzed 14 late Pleistocene oxygen isotope records and concluded that "the dimension cannot be measured accurately enough to determine whether or not it is fractal". Fluegeman and Snow [109] used R/S analysis to estimate the fractal dimension D0 of a marine sediment record via the associated Hurst exponent H, whereas Schulz et al. [110] used the Higuchi estimator for a similar purpose. Mudelsee and Stattegger [111,112,113] estimated the correlation dimension of various oxygen isotope records using the classical Grassberger-Procaccia algorithm.
Due to the inherent properties of paleoclimate data, estimating fractal dimensions and related complexity measures is a challenging task. Instead of using the classical fractal dimension concepts, in the last years it has therefore been suggested to consider alternative methods that allow quantifying the dimensionality of such records. Donner and Witt [11,23,24] utilized the multivariate version of the LVD dimension density (see Section 3.3) for studying long-term dynamical changes in the Antarctic offshore sediment decomposition associated with the establishment of significant oceanic currents across the Drake passage at the Oligocene-Miocene boundary. In a similar way, Donges et al. used recurrence network analysis for sliding windows in time for identifying time intervals of subtle large-scale changes in the terrigenous dust flux dynamics off North Africa during the last 5 million years [10,79]. These few examples underline the potentials of the corresponding approaches for a nonlinear characterization of paleoclimate records.

Perspectives and challenges
Both classical as well as novel approaches to characterizing the dimensionality of paleoclimate records still face considerable methodological challenges. While established methods typically rely on the availability of long time series, this requirement can be relaxed when using correlation-or network-based approaches, which are in principle suited for studying nonlinear properties in a running windows framework and thus characterizing the time-varying complexity of environmental conditions encoded in the respective proxy variable under study. However, some methodological challenges persist, which have been widely neglected in the recent literature.
Most prominently, the exact timing of observations is of paramount importance for essentially all methods of time series analysis. In the presence of time-scale uncertainty inherent to most paleoclimate records, this information is missing and can only be incorporated into the statistical analysis by explicitly accounting for the multiplicity of agedepth relationships consistent with the set of available dating points. The latter can be achieved by performing the same analysis for a large set of perturbed age models generated by Monte Carlo-type algorithms, or by incorporating the associated time-scale uncertainty by means of Bayesian methods. However, an analytical theory based on the Bayesian framework can hardly be achieved for all possible methods of time series analysis, so that it is most likely that one has to rely on numerical approximations.
Even when neglecting time-scale uncertainty, the non-uniformity of sampled data points in time typically persists. Among all methods discussed in this chapter, only the visibility graph approach is able by construction to directly work with arbitrarily sampled data. However, this method is faced with the conceptual problem of how to treat values between two successive observations that have not been observed for whatever reason. Donner and Donges [71] argued that simply neglecting such "missing values" may account for a considerable amount of error in all relevant network measures, so that the meaningful interpretability of the obtained results could become questionable.
For the other mentioned approaches, time-delay embedding is a typical preparatory step for all analyses. Since interpolation can result in spurious correlations [86] or at least ambiguous results depending on the specific procedure, alternatives need to be considered for circumventing this problem. In the case of uni-and multivariate LVD dimension density, it is possible to directly utilize alternative estimators of the correlation function, e.g. based on suitable kernel estimates [86], for obtaining the correlation matrix of the record under study. For methods requiring attractor reconstruction (e.g. the Grassberger-Procaccia algorithm for the correlation dimension or recurrence network analysis), there are prospective approaches for alternative embedding techniques, e.g. based on Legendre coordinates [114], that shall be further investigated in future work.

Conclusions
Since the introduction of fractal theory to the study of nonlinear dynamical systems, this field has continuously increased its importance. Besides providing a unified view on scaling properties of various statistical characteristics in space or time that can be found in many complex systems, fractal dimensions have demonstrated their great potential to quantitatively distinguish between time series obtained under different conditions or at different locations, thus contributing to a classification of behaviors based on nonlinear dynamical properties. However, as it has been demonstrated both empirically and numerically, established concepts of fractal dimensions reach their fundamental limits when being applied to relatively short and noisy geoscientific time series, e.g. climate records. As potential alternatives providing measures with comparable meaning, but different conceptual foundations, two promising approaches based on the evaluation of serial correlations and complex network theory have been discussed. Although both concepts still need to systematically prove their capabilities and require further methodological improvements as highlighted in this chapter, they constitute promising new research avenues for future problems in climate change research, other fields of geosciences, and even complex systems sciences in general.