The Assumption of Non-gaussianity in Natural and Social Sciences and Its Influence on Detection of Causal Relationships

The concept of causality changes as human knowledge changes. As an abstract notion, causality has traditionally been studied within metaphysics, a branch of philosophy. The Greek philosophers understood causality as explanation in general (Aristotle, 350 B.C.). The search for causes was a search for "first principles", which were meant to be explanatory. In the more recent philosophy introduced by (Newton, 1687), causality was connected with determinism. Current experimental science reveals a non-deterministic notion of cause, which also has to be taken into consideration. The introduction of probability theory into all scientific disciplines makes it possible to formalize and mathematize this more broadly conceived notion of cause.


Introduction
The concept of causality changes as human knowledge changes. As an abstract notion, causality has traditionally been studied within metaphysics, a branch of philosophy. The Greek philosophers understood causality as explanation in general (Aristotle, 350 B.C.). The search for causes was a search for "first principles", which were meant to be explanatory. In the more recent philosophy introduced by (Newton, 1687), causality was connected with determinism. Current experimental science reveals a non-deterministic notion of cause, which also has to be taken into consideration. The introduction of probability theory into all scientific disciplines makes it possible to formalize and mathematize this more broadly conceived notion of cause.
This paper does not deal with the philosophical approach to causality; for this we refer the reader, for example, to the works (Mackie, 1988), (Hume, 1896), (Russo, 2009) or (Pearl, 2000).
Here we deal exclusively with formal mathematical approaches to detecting cause-effect relationships, namely Granger causality and transfer entropy, and with their application in the sciences.
The generally non-deterministic approaches to causality apply various probability distributions to model real-world phenomena. The selection of an appropriate or inappropriate model to fit real-world data obviously has an important influence on the credibility of the conclusions reached.
In the present paper we discuss how the selection of a data model influences the detection of causal relationships between two or more time series. We focus on the cases in which Gaussianity of the investigated process can be assumed and those in which it cannot. The causality detection methods considered here are Granger causality (Granger, 1969) and transfer entropy (Schreiber, 2000). We investigate time series with a wider class of probability distributions than the Gaussian, namely the generalized Gaussian probability distributions. These distributions are given parametrically. We set conditions on their parameters so that one can decide from the parameter values whether the relationship between the involved time series is a unidirectional causal one or whether no causality is present.
While we are aware of outstanding philosophical papers on causality in the sciences, for example (Illari et al., 2011), we are not aware of any similar publication on mathematically conceived causality and its application in natural and social sciences, nor of any analysis of the probabilistic assumptions about the investigated time series and their influence on causality detection by Granger causality or transfer entropy.
The paper is organized as follows. The application of various probabilistic models in natural and social sciences is discussed in Section 2. Granger causality and its application in natural and social sciences is treated in Section 3. Section 4 deals with transfer entropy, the directionality index and other information-theoretical measures, including their applications. Section 5 devotes special attention to causal relationships among Gaussian time series and generalized Gaussian time series. In Section 6 we present original and simple criteria, for phenomena having generalized Gaussian distributions given parametrically, which decide about the presence or absence of causality between these phenomena as represented by concrete time series. Section 7 discusses the criteria for deciding which causality detection method to apply and concludes with the importance of the achieved results.

Probabilistic distributions and their application in natural and social sciences
The Gaussian distribution, more frequently called the "normal" distribution, has a special position among all probability distributions used in data modeling and is the most popular. It has been known for a relatively long time, and it is simple and analytically tractable. Its symmetry about its mean value is one of the basic principles realized in nature as well as in human culture. The bell shape of its graph makes the normal distribution attractive for modeling real-world data in many scientific or social disciplines. Indeed, many common natural or social phenomena turn out to have a normal distribution. For example, such phenomena as women's height, Brownian motion of particles, milk production by cows and random deviations from target values in industrial processes fit a normal distribution (Limpert et al., 2001). However, many phenomena which fit the normal distribution have been shown to fit the log-normal distribution or the generalized normal distribution as well, or even better. What is the difference between the normal and the log-normal distribution? Both forms of variability are based on a variety of forces (causes) acting independently of one another. A major difference, however, is that the effects can be additive or multiplicative, thus leading to normal or log-normal distributions, respectively (Limpert et al., 2001).
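The additive versus multiplicative mechanism can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings (20 independent effects per observation, each drawn uniformly from [0.5, 1.5]; the function names are hypothetical): sums of the effects come out roughly symmetric, products come out strongly right-skewed, and the logarithms of the products are again roughly symmetric, as expected for a log-normal variable.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: the mean cubed z-score (close to 0 for a symmetric sample)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def simulate(n_samples=5000, n_effects=20, seed=0):
    """Combine the same independent effects additively and multiplicatively."""
    rng = random.Random(seed)
    sums, products = [], []
    for _ in range(n_samples):
        effects = [rng.uniform(0.5, 1.5) for _ in range(n_effects)]
        sums.append(sum(effects))            # additive effects
        products.append(math.prod(effects))  # multiplicative effects
    return sums, products


sums, products = simulate()
skew_additive = skewness(sums)
skew_multiplicative = skewness(products)
skew_log_multiplicative = skewness([math.log(p) for p in products])

print(f"skewness of sums:          {skew_additive:.3f}")
print(f"skewness of products:      {skew_multiplicative:.3f}")
print(f"skewness of log(products): {skew_log_multiplicative:.3f}")
```

Taking logarithms turns the product into a sum, which is why the third value returns to the neighbourhood of zero.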
The length of spoken words in phone conversations (Herdan, 1958) and the length of sentences (Williams, 1940) have been shown to have a log-normal distribution, as have the age at marriage (Preston, 1981) and income (Statistical yearbook in Switzerland, 1997). Prices, incomes or populations, i.e. phenomena which grow exponentially, are often skewed to the right, and hence may be better modeled by distributions other than the normal one, such as the log-normal distribution, the Pareto distribution or the skewed generalized normal distribution. Statistical inference using a normal distribution is not robust to the presence of outliers. When outliers are expected, data may be better described using a heavy-tailed distribution such as Student's t-distribution.
The generalized normal distribution (generalized Gaussian distribution), first defined mathematically in (Nadarajah, 1995), can model, for example, Brownian motion of particles or fractional Brownian motion with better precision than the normal distribution (Zinde-Walsh & Phillips, 2003). Other experiments have shown that generalized Gaussian distributions approximate the data more precisely than Gaussian distributions, for example (Sharifi & Leon-Garcia, 1995) and (Moulin & Liu, 1999) in image processing and video analysis, and (Bicego et al., 2008) in EEG time series modeling.
Modeling in linguistics mostly applies Gaussian mixtures. Mixtures of generalized Gaussian distributions have recently been used in text-independent speaker identification (Sailaja et al., 2010), where they were shown to outperform the earlier text-independent speaker identification models. This model was applied to speaker identification tasks such as voice dialing, banking by telephone, telephone shopping, information services, etc.
The exponential distribution has frequently been used in modeling in astrophysics; for example, the Weinman exponential distribution has been shown to be a good model for dusty galactic discs (Misiriotis et al., 2000).
To summarize, probability distributions other than the normal one play an important role in modeling in both natural and social sciences. In the following we will call them non-Gaussian distributions. The selection of a correct distribution for modeling natural or social phenomena is of great importance, especially when mutual interactions among these phenomena are investigated. A crucial question is whether there are causal relationships among the studied phenomena. This leads to a formal definition of causality and causal measures.
In the following sections we formally define two causality detection measures, namely Granger causality and transfer entropy.

Granger causality
The concept of causality was introduced into experimental science, namely into the analysis of data observed at consecutive time instants (time series), by C.W.J. Granger, the 2003 Nobel prize winner in economics, in (Granger, 1969). In his Nobel lecture (Granger, 2003) he recalled the inspiration of Wiener's work and identified two components of the statement about causality: 1. The cause occurs before the effect; and 2. The cause contains information about the effect that is unique, and is in no other variable.
As Granger put it, a consequence of these statements is that the causal variable can help to forecast the effect variable after other data has been first used (Granger, 2003). This restricted sense of causality, referred to as Granger causality (G-causality hereafter), characterizes the extent to which a process $X_t$ leads another process $Y_t$, and builds upon the notion of incremental predictability. The process $X_t$ is said to Granger cause another process $Y_t$ if future values of $Y_t$ can be better predicted using the past values of $X_t$ and $Y_t$ rather than only the past values of $Y_t$. The standard test of G-causality developed in (Granger, 1969) is based on a linear regression model.
In the following we define Granger causality using the notation of (Barnett, 2009). Let $\oplus$ denote concatenation of vectors, so that for $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_m)$, $x \oplus y$ is the $1 \times (d + m)$ vector $(x_1, \dots, x_d, y_1, \dots, y_m)$. Given jointly distributed multivariate random variables $X$ and $Y$, i.e. random vectors in $\mathbb{R}^d$ and $\mathbb{R}^m$ respectively, we denote by $\Sigma(X)$ the $d \times d$ matrix of covariances $\mathrm{cov}(X_i, X_j)$ and by $\Sigma(X, Y)$ the $d \times m$ matrix of cross-covariances $\mathrm{cov}(X_i, Y_\alpha)$. Let $\Sigma(X|Y)$ denote the $d \times d$ matrix
$$\Sigma(X|Y) \equiv \Sigma(X) - \Sigma(X, Y)\,\Sigma(Y)^{-1}\,\Sigma(X, Y)^{\top} \qquad (1)$$
defined when $\Sigma(Y)$ is invertible.
Suppose we have a stationary multivariate stochastic process $X_t$ in discrete time (i.e. its component random variables are jointly distributed). Denote
$$X_t^{(p)} \equiv X_t \oplus X_{t-1} \oplus \dots \oplus X_{t-p+1},$$
which is a $1 \times pd$ random vector for each $t$. Given the lag $p$, we use the shorthand notation $X^- \equiv X_{t-1}^{(p)}$ for the lagged variable. Suppose we have three jointly distributed stationary multivariate stochastic processes $X_t$, $Y_t$, $Z_t$. Consider the regression models
$$X_t = \alpha_t + \left(X_{t-1}^{(p)} \oplus Z_{t-1}^{(r)}\right) A + \varepsilon_t \qquad (2)$$
$$X_t = \alpha'_t + \left(X_{t-1}^{(p)} \oplus Y_{t-1}^{(q)} \oplus Z_{t-1}^{(r)}\right) A' + \varepsilon'_t \qquad (3)$$
where $A$ and $A'$ are the matrices of regression coefficients, $\alpha_t$ and $\alpha'_t$ are the constant terms and the random vectors $\varepsilon_t$ and $\varepsilon'_t$ comprise the residuals, so that the predictee variable $X$ is regressed firstly on the previous $p$ lags of itself plus $r$ lags of the conditioning variable $Z$ and secondly, in addition, on $q$ lags of the predictor variable $Y$. By stationarity the residual covariances $\Sigma(\varepsilon_t)$ and $\Sigma(\varepsilon'_t)$ do not depend on time $t$, so we omit $t$ from the notation. The G-causality of $Y$ to $X$ given $Z$ is a measure of the extent to which inclusion of $Y$ in the second model (3) reduces the prediction error of the first model (2). The standard measure of G-causality in the literature is defined for univariate predictor and predictee variables $Y$ and $X$, and is given by the natural logarithm of the ratio of the residual variance in the restricted regression (2) to that of the unrestricted regression (3). (Barnett, 2009) have shown that G-causality can be expressed as
$$F_{Y \to X | Z} \equiv \ln \frac{|\Sigma(\varepsilon)|}{|\Sigma(\varepsilon')|} = \ln \frac{|\Sigma(X \mid X^- \oplus Z^-)|}{|\Sigma(X \mid X^- \oplus Y^- \oplus Z^-)|} \qquad (4)$$
where $\ln$ denotes the natural logarithm, $|\cdot|$ the determinant, and $Y^-$, $Z^-$ are the lagged variables of $Y$ and $Z$ with lags $q$ and $r$, respectively.
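The comparison of the restricted regression (2) with the unrestricted regression (3) can be sketched numerically. The following is a minimal illustration and not the authors' implementation: it assumes univariate X and Y, no conditioning variable Z, equal lag orders for both variables, and ordinary least squares solved through the normal equations. The helper names and the toy data (Y drives X with a one-step delay) are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residual_variance(rows, targets):
    """Least-squares fit via the normal equations; returns the mean squared residual."""
    k = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    moment = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    coef = solve_linear(gram, moment)
    residuals = [t - sum(c * v for c, v in zip(coef, r)) for r, t in zip(rows, targets)]
    return sum(e * e for e in residuals) / len(residuals)


def granger_causality(x, y, lags=1):
    """ln(restricted / unrestricted residual variance) for the direction Y -> X."""
    targets = x[lags:]
    restricted, unrestricted = [], []
    for t in range(lags, len(x)):
        past_x = [x[t - k] for k in range(1, lags + 1)]
        past_y = [y[t - k] for k in range(1, lags + 1)]
        restricted.append([1.0] + past_x)             # constant + own past
        unrestricted.append([1.0] + past_x + past_y)  # ... plus the past of Y
    return math.log(residual_variance(restricted, targets)
                    / residual_variance(unrestricted, targets))


# Toy data (illustrative only): y influences x with a one-step delay, not vice versa.
rng = random.Random(1)
n = 2000
y = [rng.gauss(0.0, 1.0)]
x = [rng.gauss(0.0, 1.0)]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + rng.gauss(0.0, 1.0))
    x.append(0.3 * x[t - 1] + 0.8 * y[t - 1] + rng.gauss(0.0, 1.0))

f_y_to_x = granger_causality(x, y)
f_x_to_y = granger_causality(y, x)
print(f"F(Y -> X) = {f_y_to_x:.4f}, F(X -> Y) = {f_x_to_y:.4f}")
```

Because the restricted model is nested in the unrestricted one, the measure is non-negative up to rounding; on this toy data it should be clearly positive for Y to X and close to zero in the reverse direction.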

Extensions of Granger causality
The linear framework of Granger causality given by equations (2) and (3) has been widely applied not only in economics and finance (for a comprehensive survey of the literature see e.g. (Geweke, 1984)), but also in diverse fields of natural sciences, e.g. climatology (see (Triacca, 2005) and references therein) or neurophysiology, where specific problems of multichannel electroencephalogram recordings were solved by generalizing the Granger causality concept to the multivariate case (Blinowska et al., 2004; Kamiński et al., 2001). Nevertheless, the limitation of this concept to linear relations required further generalizations.
Recent developments in nonlinear dynamics (Abarbanel, 1993) evoked lively interactions between statistics and economics (econometrics) on one side, and physics and other natural sciences on the other. In the field of economics, (Baek & Brock, 1992) and (Hiemstra & Jones, 1994) proposed a nonlinear extension of the Granger causality concept. Their non-parametric dependence estimator is based on the so-called correlation integral, a probability distribution and entropy estimator developed by the physicists Grassberger and Procaccia in the field of nonlinear dynamics and deterministic chaos as a tool for characterizing chaotic attractors (Grassberger & Procaccia, 1983).
Another non-linear extension of Granger causality is the so-called correntropy (Park & Principe, 2008).
A non-parametric approach to non-linear causality testing, based on non-parametric regression, was proposed in (Bell et al., 1996). Following (Hiemstra & Jones, 1994), (Aparicio & Escribano, 1998) succinctly suggested an information-theoretic definition of causality which includes both linear and nonlinear dependence.
Another nonlinear extension of the Granger causality approach was proposed in (Chen et al., 2004) using local linear predictors. An important class of nonlinear predictors is based on so-called radial basis functions (Broomhead & Lowe, 1988), which were used for a nonlinear parametric extension of the Granger causality concept (Ancona et al., 2004; Marinazzo, 2006).
In physics and nonlinear dynamics, considerable interest has recently emerged in studying the cooperative behavior of coupled complex systems (Boccaletti et al., 2002; Pikovsky et al., 2001). Synchronization and related phenomena were observed not only in physical, but also in many biological systems, e.g. (Schäfer et al., 1998; 1999) or (Paluš et al., 2001a;b; Quyen et al., 1999; Schiff et al., 1996; Tass et al., 1998). In such physiological systems it is important not only to detect synchronized states, but also to identify drive-response relationships and thus the causality in the evolution of the interacting (sub)systems. (Schiff et al., 1996) and (Quyen et al., 1999) used ideas similar to those of Granger; however, their cross-prediction models utilize zero-order nonlinear predictors based on mutual nearest neighbors. A careful comparison of these two papers (Quyen et al., 1999; Schiff et al., 1996) reveals how complex the problem of inferring causality in nonlinear systems is. While the latter two papers use the method of mutual nearest neighbors for mutual prediction, (Arnhold, 1999) proposed asymmetric dependence measures based on averaged relative distances of the (mutual) nearest neighbors. (Ge et al., 2009) presented a novel approach which is an extension of the Granger causal model and also shares the features of the bilinear approximation of the dynamic causal model (David et al., 2006). The authors demonstrated changes, induced by face discrimination learning, in inter- and intra-hemispheric connectivity and in the hemispheric predominance of theta and gamma frequency oscillations in sheep infero-temporal cortex. The results provide the first evidence for connectivity changes between and within the left and right infero-temporal cortices as a result of face recognition learning.

Application of Granger causality in natural and social sciences
As already mentioned, Granger causality was introduced by its author in econometrics and has been applied by him and his followers mainly in econometrics, finance and market analysis, for example in (Granger, 1969) and (Poon & Granger, 2003). Other applications in the humanities and social sciences are in linguistics and psychology (Gilbert & Karahalios, 2009) or demography (Feridun, 2007).
Granger causality has also been extensively applied in natural sciences, for example in medicine and especially in neuroscience: (Smith et al., 2011) for functional magnetic resonance imaging, (Hesse et al., 2003) for the analysis of EEG signals, (Seth & Edelman, 2007) for causal interaction in neural populations, and many other papers. Granger causality was also applied in climatology (Kufmann & Stern, 1997) and in cognitive and systems biology (Kim et al., 2011), (Fujita et al., 2010), etc.
The main drawback of Granger causality and its extensions, as model-dependent methods, is their instability, which can cause a high variability in the final estimates of the errors in (2) and (3). As an alternative, we present in the following model-free methods whose formal definitions apply information-theoretic functionals.

Information-theoretical causality measures
By using the distributions of random processes in their definitions, the information-theoretic causality measures introduce non-determinism into the notion of causality. (Paluš et al., 2001b) proposed to study synchronization phenomena in experimental time series by using the tools of information theory. Mutual information, an information-theoretic functional of probability distribution functions, is a measure of general statistical dependence.
For inferring causal relations, conditional mutual information or the so-called transfer entropy can be used.

Transfer entropy
Transfer entropy as a non-linear causality measure was introduced in (Schreiber, 2000). It is an information-theoretic measure of time-directed information transfer between jointly dependent processes.
Let us first recall some basic definitions. The differential entropy of a (continuous) random vector $X$ taking its values in $\mathbb{R}^d$ with the probability density function $p(x)$ is defined by
$$H(X) \equiv -\int_{\mathbb{R}^d} p(x) \log p(x)\,dx.$$
If $X$ is a discrete (multivariate) random variable given by a set of possible values $\{x_1, \dots, x_n\}$, then the entropy can be written explicitly as
$$H(X) \equiv -\sum_{i=1}^{n} p(x_i) \log p(x_i),$$
where $p$ denotes the probability mass function of $X$. With $X_t$, $Y_t$, $Z_t$ defined as before and $X^-$, $Y^-$, $Z^-$ denoting their lagged (past) variables, the transfer entropy of $Y$ to $X$ given $Z$ is defined as the difference between the entropy of $X$ conditioned on its own past and the past of $Z$, and its entropy conditioned, in addition, on the past of $Y$:
$$T_{Y \to X | Z} \equiv H(X \mid X^- \oplus Z^-) - H(X \mid X^- \oplus Y^- \oplus Z^-),$$
where $H(\cdot|\cdot)$ is the conditional entropy. For stationary variables, as for Granger causality, the transfer entropy does not depend on $t$, so we omit it from the notation. Transfer entropy is a Kullback-Leibler distance of transition probabilities.
It was shown in (Hlaváčková-Schindler et al., 2007) that, with proper conditioning, the transfer entropy is equivalent to the conditional mutual information (Paluš et al., 2001b). The latter, however, is a standard measure of information theory (Cover & Thomas, 1991). More details on the information-theoretic methods for causality detection can be found in our review paper (Hlaváčková-Schindler et al., 2007). Marschinski and Kantz suggested in 2002 the so-called effective transfer entropy to reduce the bias of transfer entropy on small data sets (Marschinski & Kantz, 2002).
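For discrete-valued series the definition of transfer entropy can be evaluated directly from relative frequencies. The sketch below is a naive plug-in estimator under simplifying assumptions that are not taken from the text: history length 1 for both series, no conditioning variable Z, and natural logarithms. It is known to be biased on short series, which is the issue the effective transfer entropy addresses. The function name and the toy binary data are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of T(source -> target) in nats, using one step of history."""
    triples = Counter(zip(target[1:], target[:-1], source[:-1]))  # (x_next, x_prev, y_prev)
    pairs_next_prev = Counter(zip(target[1:], target[:-1]))       # (x_next, x_prev)
    pairs_prev_src = Counter(zip(target[:-1], source[:-1]))       # (x_prev, y_prev)
    singles_prev = Counter(target[:-1])                           # x_prev
    n = len(target) - 1
    te = 0.0
    for (x_next, x_prev, y_prev), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_prev_src[(x_prev, y_prev)]
        p_given_own = pairs_next_prev[(x_next, x_prev)] / singles_prev[x_prev]
        te += p_joint * math.log(p_given_both / p_given_own)
    return te


# Toy data (illustrative only): x copies the previous value of y 90% of the time.
rng = random.Random(0)
n = 5000
y = [rng.randint(0, 1) for _ in range(n)]
x = [0] + [y[t - 1] if rng.random() < 0.9 else 1 - y[t - 1] for t in range(1, n)]

te_y_to_x = transfer_entropy(source=y, target=x)
te_x_to_y = transfer_entropy(source=x, target=y)
print(f"T(Y -> X) = {te_y_to_x:.4f} nats, T(X -> Y) = {te_x_to_y:.4f} nats")
```

On this toy data the estimate for Y to X should lie near ln 2 minus the binary entropy of 0.1 (about 0.37 nats), while the estimate for X to Y should be close to zero.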

Transfer entropy, other information-theoretical measures and their application in natural and social sciences
Turning our attention back to econometrics, we can follow further development due to (Diks & DeGoede, 2001). They applied a nonparametric approach to nonlinear Granger causality using the concept of correlation integrals (Grassberger & Procaccia, 1983) and pointed out the connection between the correlation integrals and information theory. (Diks & Panchenko, 2005) critically discussed the previous tests of (Hiemstra & Jones, 1994). As the most recent development in economics, (Baghli, 2006) proposes information-theoretic statistics for a model-free characterization of causality, based on an evaluation of conditional entropy.
The information-theoretical approaches to causality detection are model-free and can detect non-linear causal relationships, which are their advantages over linear Granger causality.
The nonlinear extension of Granger causality based on the information-theoretic formulation has found numerous applications in various fields of natural and social sciences. Let us mention just a few examples.
Applications of the conditional mutual information in neurophysiology are due to (Hinrichs et al., 2006) and (Pflieger & Greenblatt, 2005).
Causality or coupling directions in multimode laser dynamics is another field where the conditional mutual information was applied (Otsuka et al., 2004). (Paluš & Stefanovska, 2003) adapted the conditional mutual information approach (Paluš et al., 2001b) to the analysis of instantaneous phases of interacting oscillators and demonstrated the suitability of this approach for analyzing causality in cardio-respiratory interaction (Paluš et al., 2001b). The latter approach has also been applied in neurophysiology (Brea et al., 2006).
More recent applications of information-theoretical functionals in natural sciences (medicine) are, for example, in (Van Dijck et al., 2007), in inferring and quantifying causality in neuronal networks (Chicharro et al., 2011), (Vicente et al., 2010), in the computer simulation of human-robot interaction (Sumioka et al., 1997) or in the predator-prey relationship in ethology (Bochmann, 2007).
The information-theoretical functionals applied in social sciences are mostly found in financial applications, e.g. the application of transfer entropy to the information flow between various financial time series (Dimpfl et al., 2011) or the analysis of the Korean stock market by transfer entropy (Baek et al., 2006). Applications of transfer entropy in linguistics can be found in the book (Baeyer, 2005). Social media (for example Twitter or Facebook) serve researchers as an important source for studying social interactions. One important problem is the characterization and identification of influentials, which can be defined as users who influence the behavior of a large number of other users. To characterize influence (in other words, a causal relationship) in Twitter, researchers have suggested the number of followers, mentions and retweets (Cha et al., 2010), and the PageRank of the follower network (Kwak et al., 2010). (Ver Steeg et al., 2011), however, argue that purely structural measures of influence (causality) can be misleading (Ghosh & Lerman, 2010) and that high popularity does not necessarily imply high influence (Romero et al., 2010; Ver Steeg et al., 2011).
More recent work has used the size of the cascade trees (Bakshy et al., 2011) and the influence-passivity score (Romero et al., 2010). One serious drawback of existing methods is that they are based on explicit causal knowledge (i.e., A responds to B), whereas for many data sets such knowledge is not available. (Ver Steeg et al., 2011) suggest a model-free transfer entropy approach to detecting causal relationships and identifying influential users based on their capacity to predict the behavior of other users.
Having reviewed the relevant literature, and also after extensive practical experience, we can state that the information-theoretic approach to Granger causality plays an important, if not a dominant, role in analyses of causal relationships in nonlinear systems.
In the following we define a practical criterion for the detection of causal relationships among time series by means of transfer entropy.

Directionality index
To measure causal structure on small data sets and to allow conclusions about the dominant direction of the information flow, the (causal) directionality index was defined for transfer entropy or conditional mutual information in (Paluš & Stefanovska, 2003) and analogously in (Rosenblum & Pikovsky, 2001). It is given by
$$D_{X,Y|Z} = \frac{T_{Y \to X | Z} - T_{X \to Y | Z}}{T_{X \to Y | Z} + T_{Y \to X | Z}},$$
where $X$, $Y$, $Z$ are time series. (Paluš & Stefanovska, 2003) consider special cases for $Z$, the so-called phase increments of $X$ and $Y$. The index varies between −1 and 1, where negative values imply that the information flow from X to Y dominates and positive values indicate that the information flow from Y to X dominates.
The definitions of the concrete subintervals in the literature are unfortunately not unified, and other modifications of the causal directionality index exist in the literature as well. We will use the following definitions. When the index equals −1 or 1, we speak of explicit unidirectional causality from X to Y or from Y to X, respectively. When the directionality index lies in the interval [−1, 0), we speak of prevailing unidirectional causality from X to Y (and analogously, when it lies in the interval (0, 1], of prevailing unidirectional causality from Y to X). When the directionality index equals 0, we speak of an absence of unidirectional causality.
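The classification by subintervals can be written down as a short routine. The sketch below is illustrative only: it assumes the index is the normalized difference of the two transfer entropies and follows the sign convention stated for the index (negative values: the flow from X to Y dominates; positive values: the flow from Y to X dominates). The function names and the tolerance parameter are hypothetical.

```python
import math


def directionality_index(te_y_to_x, te_x_to_y):
    """Normalized difference of the two transfer entropies, a value in [-1, 1]."""
    total = te_y_to_x + te_x_to_y
    if total <= 0.0:
        raise ValueError("the index is undefined when both transfer entropies vanish")
    return (te_y_to_x - te_x_to_y) / total


def classify(index, tol=1e-12):
    """Map an index value to the terminology used for its subinterval."""
    if abs(index) <= tol:
        return "absence of unidirectional causality"
    source, sink = ("Y", "X") if index > 0 else ("X", "Y")
    kind = "explicit" if abs(abs(index) - 1.0) <= tol else "prevailing"
    return f"{kind} unidirectional causality from {source} to {sink}"


index = directionality_index(te_y_to_x=0.3, te_x_to_y=0.1)
label = classify(index)
print(index, label)
```

The tolerance guards against rounding when the two transfer entropies are estimated numerically; with exact inputs it can be set to zero.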

Gaussianity and causal relationships
To avoid any misunderstanding, we assume that the investigated random processes are given by time series with a finite number of data points (the discrete case). We assume that the probability (density) distribution of the processes represented by the time series is known, and we define these distributions for the continuous case by explicit analytical formulas. By the terms probability density function or probability distribution used in the text we mean a probability density function. (Barnett, 2009) recently proved that if all processes (time series) X, Y, Z defined by (4) are jointly Gaussian, then Granger causality and transfer entropy are equivalent (up to a multiplicative constant of 2). This result provides for the first time a unified framework for data-driven causal inference that bridges information-theoretic and autoregressive methods.
In practice this means that the computationally cheaper linear test can be applied to detect causality when the time series are known to be Gaussian.
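The origin of the factor 2 can be sketched with the standard entropy formula for Gaussian vectors. The derivation below is an outline under the assumption of jointly Gaussian variables, with $d$ the dimension of $X$, $W$ any conditioning vector, $\Sigma(\cdot|\cdot)$ the partial covariance, $X^-$, $Y^-$, $Z^-$ the lagged variables, and $F_{Y \to X | Z}$ the G-causality written as the logarithm of the ratio of residual covariance determinants.

```latex
\begin{align*}
H(X \mid W) &= \tfrac{1}{2} \ln \lvert \Sigma(X \mid W) \rvert + \tfrac{d}{2} \ln (2 \pi e), \\
T_{Y \to X \mid Z} &= H(X \mid X^- \oplus Z^-) - H(X \mid X^- \oplus Y^- \oplus Z^-) \\
  &= \tfrac{1}{2} \ln \frac{\lvert \Sigma(X \mid X^- \oplus Z^-) \rvert}
                           {\lvert \Sigma(X \mid X^- \oplus Y^- \oplus Z^-) \rvert}
   = \tfrac{1}{2} F_{Y \to X \mid Z}.
\end{align*}
```

The additive constant $\tfrac{d}{2}\ln(2\pi e)$ cancels in the difference, so only the ratio of the determinants remains.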
In our paper (Hlaváčková-Schindler, 2011) we investigated the question of which other multivariate probability distributions of the time series the equivalence (up to a multiplicative constant) of the two causality measures can be extended to. In the same paper we extended the equivalence of these two measures to the generalized normal distribution, the log-normal distribution and the Weinman exponential distribution. Since many phenomena in nature and society have these distributions, our results have practical implications for studying causal relationships among those phenomena.
In the following we further investigate generalized Gaussian distributions given parametrically and the causal relationships among them. Let us recall their definition.

Generalized Gaussian distributions
Generalized Gaussian distributions, defined in 1995 by Nadarajah in (Nadarajah, 1995), form a parametric class of distributions containing the Gaussian and Laplacian distributions as special cases.
The generalized Gaussian density (GGD) is defined as
$$p(x, \alpha, \beta) = \frac{\beta}{2 \alpha\, \Gamma(1/\beta)}\, e^{-(|x|/\alpha)^{\beta}},$$
where $\Gamma(\cdot)$ is the Gamma function $\Gamma(z) = \int_0^{\infty} e^{-t} t^{z-1}\,dt$, $z > 0$. The scale parameter $\alpha$ models the width of the pdf peak (standard deviation), and the shape parameter $\beta$ is inversely proportional to the decreasing rate of the peak.
For $\beta = 2$ the generalized Gaussian distribution is the Gaussian distribution, and for $\beta = 1$ it is the Laplacian distribution.
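The two special cases can be checked numerically. The sketch below assumes the common zero-mean parameterization of the density, p(x, α, β) = β / (2αΓ(1/β)) · exp(−(|x|/α)^β); under this assumption the Gaussian case β = 2 corresponds to α = σ√2 for standard deviation σ, and the Laplacian case β = 1 to a Laplace density with scale α. The function names are hypothetical.

```python
import math


def ggd_pdf(x, alpha, beta):
    """Zero-mean generalized Gaussian density with scale alpha and shape beta."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


def gaussian_pdf(x, sigma):
    """Zero-mean normal density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, scale):
    """Zero-mean Laplace density with the given scale."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


x, sigma, scale = 0.7, 1.5, 0.8
gauss_from_ggd = ggd_pdf(x, alpha=sigma * math.sqrt(2.0), beta=2.0)
laplace_from_ggd = ggd_pdf(x, alpha=scale, beta=1.0)
print(gauss_from_ggd, gaussian_pdf(x, sigma))
print(laplace_from_ggd, laplace_pdf(x, scale))
```

Each printed pair should agree to floating-point precision.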


Special subclasses of generalized Gaussians and their Kullback-Leibler divergence
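Let us recall that the Kullback-Leibler distance (KLD) of two parametric (general) probability density functions $p(x, \theta_p)$ and $q(x, \theta_q)$ is defined as
$$D(p(\cdot, \theta_p) \,\|\, q(\cdot, \theta_q)) = \int_{\mathbb{R}} p(x, \theta_p) \log \frac{p(x, \theta_p)}{q(x, \theta_q)}\,dx,$$
where $\theta_p$ and $\theta_q$ are the parameters of the probability distributions $p$ and $q$, respectively, and $x \in \mathbb{R}$.
It can easily be shown that for two probability density functions which are generalized Gaussians, $p(x, \alpha_1, \beta_1)$ and $p(x, \alpha_2, \beta_2)$, the Kullback-Leibler divergence can be expressed as
$$D(p(\cdot, \alpha_1, \beta_1) \,\|\, p(\cdot, \alpha_2, \beta_2)) = \log\left(\frac{\beta_1 \alpha_2 \Gamma(1/\beta_2)}{\beta_2 \alpha_1 \Gamma(1/\beta_1)}\right) + \left(\frac{\alpha_1}{\alpha_2}\right)^{\beta_2} \frac{\Gamma((\beta_2 + 1)/\beta_1)}{\Gamma(1/\beta_1)} - \frac{1}{\beta_1}.$$
A subclass of generalized Gaussians where the shape parameter $\beta$ is fixed is defined as
$$P_\beta(\alpha) = \{p(x, \alpha, \beta) : \alpha > 0\}.$$
The Kullback-Leibler divergence between two pdfs from $P_\beta(\alpha)$ is
$$D(p(\cdot, \alpha_1, \beta) \,\|\, p(\cdot, \alpha_2, \beta)) = \log\frac{\alpha_2}{\alpha_1} + \frac{1}{\beta}\left[\left(\frac{\alpha_1}{\alpha_2}\right)^{\beta} - 1\right].$$
For $\beta = 1$ it is a Laplacian distribution. In this case the Kullback-Leibler distance between two Laplacian distributions is
$$D(p(\cdot, \alpha_1, 1) \,\|\, p(\cdot, \alpha_2, 1)) = \log\frac{\alpha_2}{\alpha_1} + \frac{\alpha_1}{\alpha_2} - 1.$$
For two Gaussian functions with parameters $\alpha_1$ and $\alpha_2$ and $\beta = 2$,
$$D(p(\cdot, \alpha_1, 2) \,\|\, p(\cdot, \alpha_2, 2)) = \log\frac{\alpha_2}{\alpha_1} + \frac{1}{2}\left[\left(\frac{\alpha_1}{\alpha_2}\right)^{2} - 1\right].$$
Transfer entropy can be rewritten in terms of the Kullback-Leibler divergence as
$$T_{Y \to X} = \sum p\left(x_{n+1}, x_n^{(k)}, y_n^{(l)}\right) \log \frac{p\left(x_{n+1} \mid x_n^{(k)}, y_n^{(l)}\right)}{p\left(x_{n+1} \mid x_n^{(k)}\right)},$$
where $x_n^{(k)} = (x_n^1, \dots, x_n^k)$ and $y_n^{(l)} = (y_n^1, \dots, y_n^l)$. We will use the statements from Sections 5.1 and 5.2 in the proofs of our results in Section 6.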
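A closed form for the divergence between two generalized Gaussians with a common shape parameter can be cross-checked against direct numerical integration of the definition. The sketch below assumes the zero-mean density p(x, α, β) = β / (2αΓ(1/β)) · exp(−(|x|/α)^β) and the closed form log(α2/α1) + ((α1/α2)^β − 1)/β, which is derived here from that density rather than quoted from the source. The function names, the integration window and the step count are hypothetical choices.

```python
import math


def ggd_pdf(x, alpha, beta):
    """Zero-mean generalized Gaussian density with scale alpha and shape beta."""
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


def kld_closed_form(alpha1, alpha2, beta):
    """Closed-form D(p1 || p2) for two GGDs sharing the shape parameter beta."""
    return math.log(alpha2 / alpha1) + ((alpha1 / alpha2) ** beta - 1.0) / beta


def kld_numeric(alpha1, alpha2, beta, half_width=50.0, steps=100_000):
    """Midpoint-rule integration of p1 * log(p1 / p2) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        # log(p1/p2) written analytically so that underflow of p2 cannot produce log(0)
        log_ratio = (math.log(alpha2 / alpha1)
                     - (abs(x) / alpha1) ** beta
                     + (abs(x) / alpha2) ** beta)
        total += ggd_pdf(x, alpha1, beta) * log_ratio * h
    return total


laplace_closed = kld_closed_form(1.0, 2.0, beta=1.0)
laplace_numeric = kld_numeric(1.0, 2.0, beta=1.0)
gauss_closed = kld_closed_form(1.0, 1.5, beta=2.0)
gauss_numeric = kld_numeric(1.0, 1.5, beta=2.0)
print(laplace_closed, laplace_numeric)
print(gauss_closed, gauss_numeric)
```

Note that the divergence is not symmetric in its two arguments, so swapping α1 and α2 generally changes the value.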

Causality in time series with generalized Gaussian distributions: our results
Time series having generalized Gaussian distributions given parametrically make it possible to study and express the relationships between them by means of their parameters. Within the framework of the directionality index introduced in Section 4.3, we investigated the cases of prevailing unidirectional causality, explicit unidirectional causality and absence of causality.
In the following we consider generalized Gaussian distributions given parametrically with the same shape parameter β. We give conditions on the relationships between the parameters of the involved probability distributions so that causal relationships between them can be detected by the directionality index.
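Conditions formulated in terms of the parameters presuppose that the scale α and the shape β are available for each observed series. One common way to obtain them is moment matching; the sketch below is an illustration under stated assumptions and not the estimation procedure of the authors. It assumes zero-mean data, the density p(x, α, β) = β / (2αΓ(1/β)) · exp(−(|x|/α)^β), and the identities E|x| = αΓ(2/β)/Γ(1/β) and E[x²] = α²Γ(3/β)/Γ(1/β). The function names, the search bounds and the synthetic test data are hypothetical.

```python
import math
import random


def log_moment_ratio(beta):
    """ln of (E|x|)^2 / E[x^2] for a zero-mean GGD with shape beta."""
    return (2.0 * math.lgamma(2.0 / beta)
            - math.lgamma(1.0 / beta)
            - math.lgamma(3.0 / beta))


def fit_ggd(samples, beta_low=0.2, beta_high=10.0, iterations=60):
    """Moment-matching estimate (alpha, beta) for zero-mean data."""
    mean_abs = sum(abs(v) for v in samples) / len(samples)
    mean_sq = sum(v * v for v in samples) / len(samples)
    target = math.log(mean_abs * mean_abs / mean_sq)
    low, high = beta_low, beta_high
    for _ in range(iterations):  # bisection; the moment ratio increases with beta
        mid = 0.5 * (low + high)
        if log_moment_ratio(mid) < target:
            low = mid
        else:
            high = mid
    beta = 0.5 * (low + high)
    alpha = mean_abs * math.exp(math.lgamma(1.0 / beta) - math.lgamma(2.0 / beta))
    return alpha, beta


# Synthetic check (illustrative only): Gaussian and Laplacian samples.
rng = random.Random(0)
n = 50_000
gaussian_samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
laplacian_samples = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / 1.5) for _ in range(n)]

alpha_g, beta_g = fit_ggd(gaussian_samples)
alpha_l, beta_l = fit_ggd(laplacian_samples)
print(f"Gaussian sample:  alpha = {alpha_g:.3f}, beta = {beta_g:.3f}")
print(f"Laplacian sample: alpha = {alpha_l:.3f}, beta = {beta_l:.3f}")
```

For the Gaussian sample the estimates should be near α = √2 and β = 2, and for the Laplacian sample near α = 1.5 and β = 1. If two series yield clearly different shape estimates, the equal-β assumption made here does not apply to them.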
Theorem 1. Assume that the time series X, Y have generalized Gaussian distributions given parametrically by $p_X(x, \alpha_x, \beta) = \frac{\beta}{2 \alpha_x \Gamma(1/\beta)} e^{-(|x|/\alpha_x)^{\beta}}$ and $p_Y(y, \alpha_y, \beta) = \frac{\beta}{2 \alpha_y \Gamma(1/\beta)} e^{-(|y|/\alpha_y)^{\beta}}$, $\alpha_x, \alpha_y > 0$. Then
(i) there is a prevailing unidirectional causality from Y to X, if for the parameters of both distributions $p_X(x, \alpha_x, \beta)$ and $p_Y(y, \alpha_y, \beta)$ holds
(ii) there is a prevailing unidirectional causality from X to Y, if for the parameters of both distributions $p_X(x, \alpha_x, \beta)$ and $p_Y(y, \alpha_y, \beta)$ holds
(iii) there is an explicit unidirectional causality from Y to X, if for the parameters of both distributions $p_X(x, \alpha_x, \beta)$ and $p_Y(y, \alpha_y, \beta)$ holds β log(
(iv) there is an explicit unidirectional causality from X to Y, if for the parameters of both distributions $p_X(x, \alpha_x, \beta)$ and $p_Y(y, \alpha_y, \beta)$ holds β log(
(v) there is an absence of unidirectional causality between the time series X and Y, if for the parameters of both distributions $p_X(x, \alpha_x, \beta)$ and $p_Y(y, \alpha_y, \beta)$ holds β log(

Causality detection between two Laplacian distributions
For Laplacian distributions β = 1, which simplifies the conditions set in Theorem 1 for the remaining parameters. Theorem 1 is then reformulated for the remaining parameters as follows.
Corollary 1. Assume that the time series X, Y have Laplacian distributions given parametrically by $p_X(x, \alpha_x) = \frac{1}{2 \alpha_x} e^{-|x|/\alpha_x}$ and $p_Y(y, \alpha_y) = \frac{1}{2 \alpha_y} e^{-|y|/\alpha_y}$, $\alpha_x, \alpha_y > 0$. Then
(i) there is a prevailing unidirectional causality from Y to X, if for the parameters of both distributions holds
(ii) there is a prevailing unidirectional causality from X to Y, if for the parameters of both distributions holds
(iii) there is an explicit unidirectional causality from Y to X, if for the parameters of both distributions holds log α x α α α y − 1.
(iv) there is an explicit unidirectional causality from X to Y, if for the parameters of both distributions holds log
(v) there is an absence of unidirectional causality between the time series X and Y, if for the parameters of both distributions $p_X(x, \alpha_x)$ and $p_Y(y, \alpha_y)$ holds log α
Proof: The items (i)-(v) can be proven by a direct application of the directionality index and the expression of transfer entropy by means of the Kullback-Leibler divergence for Laplacian distributions.

Causality detection between two Gaussian distributions
For Gaussian distributions β = 2, and Theorem 1 simplifies for the remaining parameters into the following corollary.
Corollary 2. Assume that the time series X, Y have Gaussian distributions given parametrically by $p_X(x, \alpha_x) = \frac{1}{\alpha_x \sqrt{\pi}} e^{-(x/\alpha_x)^2}$ and $p_Y(y, \alpha_y) = \frac{1}{\alpha_y \sqrt{\pi}} e^{-(y/\alpha_y)^2}$, $\alpha_x, \alpha_y > 0$. Then
(i) there is a prevailing unidirectional causality from Y to X, if for the parameters of both distributions $p_X(x, \alpha_x)$ and $p_Y(y, \alpha_y)$ holds
(ii) there is a prevailing unidirectional causality from X to Y, if for the parameters of both distributions holds
(iii) there is an explicit unidirectional causality from Y to X, if for the parameters of both distributions holds log(
(iv) there is an explicit unidirectional causality from X to Y, if for the parameters of both distributions holds log(
(v) there is an absence of unidirectional causality between the time series X and Y, if for the parameters of both distributions $p_X(x, \alpha_x)$ and $p_Y(y, \alpha_y)$ holds log(α_x/α_y) = − x)^2 − (α_x/α_y)^2] and (α_y/α_x)^2 + (α_x/α_y)^2 = 2√π.
Proof: The items (i)-(v) can be proven by a direct application of the directionality index and the expression of transfer entropy by means of the Kullback-Leibler divergence for Gaussian distributions.
In practice, for time series given by a finite number of observations, one can apply methods for estimating the parameters of the generalized Gaussian distributions that fit these data sets. Knowing these parameters, the statements of our theorem and corollaries provide simple decision criteria for the presence or absence of causality between two time series.

Conclusion
In this paper we presented linear and non-linear methods for causality detection among time series, namely Granger causality and transfer entropy, respectively. We discussed their applications in both natural and social sciences. When selecting a method for causality detection among time series, the approach based on statistical hypothesis testing, as applied in G-causality, has several difficulties. Multiple testing carried out sequentially does not in general provide an optimal solution. In addition, there is no objective guideline for choosing the input of the individual tests (e.g. the size of the time series interval), and it is unclear how such a choice would influence the detection of causality. The main drawback of model-dependent methods for causality detection is their instability. As expected, with a small or moderate number of observations, models close to each other are often hard to distinguish, and the values of the model selection criterion are usually close to each other. A small change in the data may result in the choice of a different hypothesis. As a consequence, causality detection based on the selected hypothesis may have a high variability (ill-posedness of the problem). The main conceptual advantage of the Granger causality model over the information-theoretical approaches to causality detection is its linearity and its straightforward generalization to multivariate time series testing. The main conceptual drawback of Granger causality is its inability to detect possible non-linear causal relationships.
For the reasons stated above, in this paper we focused on causality detection by transfer entropy. We dealt with Gaussian and some non-Gaussian distributions of natural phenomena as well as of phenomena occurring in social sciences. We presented original and simple criteria, for finite time series with generalized Gaussian distributions given parametrically and representing natural or social phenomena, which decide about the presence or absence of causality between them.
Our results can considerably simplify the more computationally demanding data processing methods currently applied in natural and social sciences for the detection of causal relationships.

References
Abarbanel, H.D.I. (1993) Introduction to Nonlinear Dynamics for Physicists. Lecture Notes in Physics, World Scientific, Singapore.