Stochastic Differential Equations and Related Inverse Problems

This research monograph presents a mathematical approach, based on stochastic calculus, to a central open problem in porous media science and engineering: the prediction of dispersivity from the covariance of hydraulic conductivity (velocity). The problem is of great importance for tracer analysis, for enhanced recovery by injection of miscible gases, and for other applications. The book develops a generalised mathematical model and effective numerical methods that may substantially advance stochastic porous media hydrodynamics. It starts with a general overview of the problem of scale dependence of the dispersion coefficient in porous media. A succinct review of the topics from stochastic calculus needed for the modelling in the subsequent chapters follows. The development of a generalised stochastic solute transport model for any given velocity covariance, without resorting to Fickian assumptions, from laboratory scale to field scale is then discussed in detail. The mathematical approaches presented here may be useful for many other problems related to chemical dispersion in porous media.

Computational Modelling of Multi-Scale Non-Fickian Dispersion in Porous Media - An Approach Based on Stochastic Calculus

parameter t (for example, the probability may change with time, i.e. P(y, t)). The collection of stochastic variables, Y_t, is termed a stochastic process. The word 'process' suggests temporal development and is particularly appropriate when the parameter t has the meaning of time, but mathematically it is equally valid for any other parameter, usually assumed to be a real number in the interval [0, ∞).
The label ω is often explicitly included by writing Y_t(ω) for an individual value obtained from the set of Y-values at a fixed t. Conversely, we might keep ω fixed and let t vary; a natural notation would be Y_ω(t). In physical terms, one may think of this as the set of values obtained from a single experiment observing the time development of the stochastic variable Y. When the experiment is repeated, a different set of observations is obtained; these may be labelled by a different value of ω. Each such sequence of observed Y-values is called a realization (or sometimes a path) of the stochastic process, and from this perspective ω may be considered as labelling the realizations of the process. It is somewhat arbitrary which of ω and t is considered a label and which an independent variable; this is sometimes expressed by writing the stochastic process as Y(t, ω).
In standard calculus, we deal with differentiable functions which are continuous except perhaps at certain locations of the domain under consideration. To understand the continuity of functions better we make use of the definition of limits. We call a function f continuous at the point t_0 if

f(t_0) = lim_{t → t_0} f(t),

regardless of the direction from which t approaches t_0. A right-continuous function at t_0 attains its limiting value when t approaches t_0 from the right, i.e. t > t_0 in the vicinity of t_0; we denote this limit by

f(t_0+) = lim_{t ↓ t_0} f(t),

and similarly the left limit by f(t_0−) = lim_{t ↑ t_0} f(t). These statements imply that a continuous function is both right-continuous and left-continuous at a given point t. Often we encounter functions having discontinuities; hence the need for the above definitions.

To measure the size of a discontinuity, we define the "jump" at any point t to be a discontinuity where both f(t+) and f(t−) exist, with the size of the jump given by

Δf(t) = f(t+) − f(t−).

Jumps are the discontinuities of the first kind, and any other discontinuity is called a discontinuity of the second kind. A function can have only a countable number of jumps in a given range. A function cannot be differentiated at a point where it is discontinuous, and in particular not where it has a discontinuity of the second kind. Stochastic calculus is the calculus dealing with functions that are typically non-differentiable and may have jumps, but no discontinuities of the second kind. One example of such a function is the Wiener process (Brownian motion); one realization of the standard Wiener process is given in Figure 2.1. The increments of the function shown in Figure 2.1 are so irregular that a derivative cannot be defined via the mean value theorem: the function changes erratically within small intervals, however small those intervals may be. Therefore we have to devise new mathematical tools for dealing with these irregular, non-differentiable functions.
The variation of a function f on [a,b] is defined as

V_f([a,b]) = sup Σ_{i=1}^{n} |f(t_i) − f(t_{i−1})|,

where the supremum is taken over all partitions a = t_0 < t_1 < … < t_n = b. If V_f([a,b]) is finite, as for continuously differentiable functions, then f is called a function of finite variation on [a,b]. The variation of a function is a measure of the total change in the function value within the interval considered. An important result (Theorem 1.7, Klebaner (1998)) is that a function is of finite variation if and only if it can be expressed as the difference of two increasing functions; since monotone functions are differentiable almost everywhere, a function of finite variation on [a,b] is differentiable almost everywhere on [a,b]. A corollary is that a nowhere differentiable function must be of infinite variation. Another mathematical construct that plays a major role in stochastic calculus is the quadratic variation. In stochastic calculus, the quadratic variation of a function f over the interval [0,t] is given by

[f, f](t) = lim Σ_{i=0}^{n−1} (f(t_{i+1}) − f(t_i))²,

where the limit is taken over partitions 0 = t_0 < t_1 < … < t_n = t whose mesh max_i (t_{i+1} − t_i) tends to zero. It can be proved that the quadratic variation of a continuous function of finite variation is zero. However, functions having zero quadratic variation may have infinite variation, such as zero energy processes (Klebaner, 1998). If a function or process has finite positive quadratic variation on an interval, then its variation is infinite; such a function may be continuous but is not differentiable.
Variation and quadratic variation of a function are very important tools in the development of stochastic calculus, even though we do not use quadratic variation in standard calculus.
We also define the quadratic covariation of functions f and g on [0,t] by extending the definition of quadratic variation:

[f, g](t) = lim Σ_{i=0}^{n−1} (f(t_{i+1}) − f(t_i)) (g(t_{i+1}) − g(t_i)),

with the limit again taken as the mesh of the partition tends to zero. It can be shown that if both functions are continuous and one is of finite variation, the quadratic covariation is zero.
Quadratic covariation of two functions, f and g, has the following properties:

Polarization identity
The polarization identity expresses the quadratic covariation, [f, g](t), in terms of the quadratic variations of the individual functions:

[f, g](t) = ¼ ( [f + g, f + g](t) − [f − g, f − g](t) ).

In many situations where stochastic processes are involved, we would like to know the limiting values of the random variables of interest, i.e. whether they approach some sort of "steady state" or asymptotic behaviour. It is natural to define the steady state of a random variable within a probabilistic context. Therefore, for stochastic processes, we define the convergence of random variables using the following four criteria:

Almost sure convergence

{X_n} converges to X almost surely (with probability one) if P( lim_{n→∞} X_n = X ) = 1.

Convergence in probability

{X_n} converges to X in probability if, for every ε > 0, lim_{n→∞} P( |X_n − X| > ε ) = 0, i.e. the probability of any given difference between the two vanishes in the limit. Convergence in probability is also called stochastic convergence.

Convergence in mean square

{X_n} converges to X in mean square if lim_{n→∞} E( (X_n − X)² ) = 0.
Note that we adopt the notation E(·) or E[·] to denote the expected value (mean value) of a stochastic variable. In the physics literature this is often denoted by ⟨·⟩.

Convergence in distribution
The distribution function of {X_n} converges to that of X at every point of continuity of the limiting distribution (the distribution function of X): lim_{n→∞} F_{X_n}(x) = F_X(x) at each x where F_X is continuous.
These four criteria add another dimension to our discussion of the asymptotic behaviour of a process, and the same arguments can be extended to compare stochastic processes with each other. Unlike deterministic variables, where asymptotic behaviour can be identified either graphically or numerically, stochastic variables require adherence to one of the convergence criteria mentioned above, which are called the "criteria for strong convergence". There are also weakly converging stochastic processes, but we do not discuss weak convergence criteria as they are not relevant to the development of the material in this book.
In standard calculus we have continuous functions, possibly with discontinuities at finitely many points, and we integrate them using the definition of the Riemann integral of a function f(t) over the interval [a,b]:

∫_a^b f(t) dt = lim Σ_{i=0}^{n−1} f(ξ_i) (t_{i+1} − t_i),

where a = t_0 < t_1 < … < t_n = b, ξ_i ∈ [t_i, t_{i+1}], and the limit is taken as the mesh of the partition tends to zero. The Stieltjes integral generalizes this to integration with respect to another function g(t):

∫_a^b f(t) dg(t) = lim Σ_{i=0}^{n−1} f(ξ_i) (g(t_{i+1}) − g(t_i)),

with the same definitions as above for ξ_i and the t_i's. It can be shown that for the Stieltjes integral to exist for every continuous function f(t), g(t) must be a function of finite variation on [a,b]. This means that if g(t) has infinite variation on [a,b], integration with respect to it has to be defined differently. This is precisely the case for continuous stochastic processes, which therefore cannot be integrated against using the Stieltjes integral. Before we discuss alternative forms of integration that can be applied to functions of positive quadratic variation, i.e. functions of infinite variation, we introduce a fundamentally important stochastic process: the Wiener process and its properties.

Wiener Process
The botanist Robert Brown first observed that pollen grains suspended in liquid undergo irregular motion. Nearly a century later, it was realised that the physical explanation of this is that the pollen grain is continually bombarded by molecules of the liquid travelling with different speeds in different directions. Over a time scale that is large compared with the intervals between molecular impacts, these average out and no net force is exerted on the grain. However, this does not happen over a small time interval; and if the mass of the grain is small enough for it to undergo appreciable displacement in that small time interval as the result of molecular impacts, an observable erratic motion results. The crucial point to notice in the present context is that while the impacts, and therefore the individual displacements suffered by the grain, can be considered independent at different times, the actual position of the grain can only change continuously.
In the physical Brownian motion, there are small but nevertheless finite intervals between the impulses of molecules colliding with the pollen grain. Consequently, the path that the grain follows consists of a sequence of straight segments forming an irregular but continuous line - a so-called random walk. Each straight segment can be considered an increment of the momentary position of the grain.
The mathematical idealisation of Brownian motion lets the interval between increments approach zero. The resulting process - called the Wiener process after N. Wiener - is difficult to conceptualise: for example, careful consideration shows that the resulting position is everywhere continuous but nowhere differentiable. This means that while the particle has a position at any moment, and since this position is changing it is moving, no velocity can be defined. Nevertheless, as discussed by Stroock and Varadhan (1979), a consistent mathematical description is obtained by defining the position as a stochastic process B(t, ω) with the following properties, suggested as a mathematical model for Brownian motion - a Wiener process:

P1: B(0, ω) = 0, i.e. we choose the position of the particle at the arbitrarily chosen initial time t = 0 as the coordinate origin.

P2-P6 specify that the process has continuous realizations and stationary, independent, Gaussian increments with zero mean; in particular E(B(t, ω)) = 0, increments over disjoint time intervals are independent, and Var(B(t, ω) − B(s, ω)) = |t − s|.

P7: The covariance of Brownian motion is determined by the correlation between the values of B(t, ω) at times t_i and t_j (for fixed ω), given by

E(B(t_i, ω) B(t_j, ω)) = min(t_i, t_j).

When applied to t_i = t_j = t, P7 reduces to the statement that

Var(B(t, ω)) = t,

where 'Var' means statistical variance. For Brownian motion this can be interpreted as the statement that the radius within which the particle can be found grows in proportion to time.
Because the Wiener process is defined by the independence of its increments, it is for some purposes convenient to reformulate the variance of the Wiener process in terms of the variance of the increments:

Var(B(t + Δt, ω) − B(t, ω)) = Δt,

where Δt is the time increment for a fixed realization ω. The connection between the two formulations is established by similarly rewriting the variance of B(t, ω) as a sum of variances of independent increments over [0, t].
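These two formulations can be checked by direct simulation. The following Python sketch (the language and the sample times s = 0.4 and t = 1.0 are our illustrative choices, not from the text) estimates the covariance E(B(s)B(t)) by Monte Carlo, exploiting the independence of the increment B(t) − B(s) from B(s):

```python
import math
import random

random.seed(1)

s, t = 0.4, 1.0        # two fixed observation times, s < t (illustrative values)
n_paths = 200_000

acc = 0.0
for _ in range(n_paths):
    # B(s) ~ N(0, s); the increment B(t) - B(s) ~ N(0, t - s) is independent of B(s).
    b_s = random.gauss(0.0, math.sqrt(s))
    b_t = b_s + random.gauss(0.0, math.sqrt(t - s))
    acc += b_s * b_t

cov_estimate = acc / n_paths
print(cov_estimate)    # P7 predicts E(B(s)B(t)) = min(s, t) = 0.4
```

The estimate agrees with min(s, t) to within Monte Carlo error, consistent with property P7.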

Further Properties of Wiener Process and their Relationships
Consider a stochastic process X(t, ω) having a stationary joint probability distribution and E(X(t, ω)) = 0, i.e. the mean value of the process is zero. The spectral density S(ν) of the process X(t, ω), a function of the angular frequency ν, is the Fourier transform of its autocovariance function R(τ) = E(X(t, ω) X(t + τ, ω)):

S(ν) = ∫ R(τ) e^{−iντ} dτ.

The inverse Fourier transform is

R(τ) = (1/2π) ∫ S(ν) e^{iντ} dν,

so the variance of the process, which equals R(0), is proportional to the area under the graph of the spectral density.

Spectral density

S(ν) can be considered as the "average power" per unit frequency at ν contributing to the variance of X(t, ω) at τ = 0. If the average power is constant, so that power is distributed uniformly across the frequency spectrum, as is the case for white light, then X(t, ω) is called white noise. White noise is often used to model independent random disturbances in engineering systems, and the increments of the Wiener process have the same characteristics as white noise. Therefore white noise ξ(t) may formally be defined through

ξ(t) = dB(t, ω)/dt, with E(ξ(t)) = 0 and E(ξ(t) ξ(s)) = δ(t − s).

We will use this relationship to formulate stochastic differential equations.
As shown before, the relationships among the properties mentioned above can be derived starting from P1 to P7. For example, let us evaluate the covariance of the Wiener process values B(t_i, ω) and B(t_j, ω). Assuming t_i > t_j, we can express

E(B(t_i, ω) B(t_j, ω)) = E( B(t_j, ω) [ B(t_j, ω) + (B(t_i, ω) − B(t_j, ω)) ] ). (2.3.5)

Therefore,

E(B(t_i, ω) B(t_j, ω)) = E(B(t_j, ω)²) + E( B(t_j, ω) (B(t_i, ω) − B(t_j, ω)) ). (2.3.6)

Since B(t_j, ω) and the increment B(t_i, ω) − B(t_j, ω) are independent processes, we can write

E( B(t_j, ω) (B(t_i, ω) − B(t_j, ω)) ) = E(B(t_j, ω)) E(B(t_i, ω) − B(t_j, ω)). (2.3.7)

According to P3 and P5, E(B(t_j, ω)) = 0 and E(B(t_i, ω) − B(t_j, ω)) = 0; therefore the right-hand side of equation (2.3.7) vanishes. This reduces equation (2.3.6) to

E(B(t_i, ω) B(t_j, ω)) = E(B(t_j, ω)²) = t_j, for t_i > t_j. (2.3.8)

Using a similar approach it can be shown that if t_i < t_j,

E(B(t_i, ω) B(t_j, ω)) = t_i. (2.3.9)

This leads to P7: E(B(t_i, ω) B(t_j, ω)) = min(t_i, t_j).

The above derivations show how the variance of an independent increment can be used to construct Wiener process paths on a computer. If we divide the time interval [0, t] into n equidistant parts of length Δt, then at the end of each segment we can randomly generate a Brownian increment from the normal distribution with mean 0 and variance Δt. This increment is simply added to the value of the Wiener process at the point considered before moving on to the next point. Repeating this procedure from Δt up to t, and taking the fact that B(0, ω) = 0 into account, we generate a realization of the Wiener process. We can expect these Wiener process realizations to have properties quite distinct from other continuous functions of t. We will briefly discuss some important characteristics of Wiener process realizations next, as these results enable us to utilise this very useful stochastic process effectively.
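The path-construction recipe just described translates directly into code. A minimal Python sketch (the time horizon and step count are arbitrary choices for illustration):

```python
import math
import random

random.seed(7)

t_end, n = 1.0, 1000
dt = t_end / n

# Build one realization: B(0) = 0, then cumulatively add independent
# Gaussian increments with mean 0 and variance dt.
path = [0.0]
for _ in range(n):
    path.append(path[-1] + random.gauss(0.0, math.sqrt(dt)))

print(path[-1])    # one sample of B(1), distributed N(0, 1)
```

Each run with a different seed yields a different realization; plotting path against the time grid reproduces the qualitative behaviour of Figure 2.1.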
Some useful characteristics of Wiener process realizations

Using the definition of quadratic variation, consider partitions 0 = t_0 < t_1 < … < t_n = t with mesh δ_n = max_i (t_{i+1} − t_i) → 0, and the sums

Σ_{i=0}^{n−1} (B(t_{i+1}) − B(t_i))².

Taking the expectation of the summation,

E( Σ_{i=0}^{n−1} (B(t_{i+1}) − B(t_i))² ) = Σ_{i=0}^{n−1} E( (B(t_{i+1}) − B(t_i))² ) = Σ_{i=0}^{n−1} (t_{i+1} − t_i) = t,

since, as seen before, the variance of an increment equals the length of the time interval. Let us take the variance of the sum; by the independence of the increments,

Var( Σ_{i=0}^{n−1} (B(t_{i+1}) − B(t_i))² ) = Σ_{i=0}^{n−1} 2 (t_{i+1} − t_i)² ≤ 2 δ_n t → 0.

Summarizing the results: the sums have expectation t and variance tending to zero, so they converge to t in mean square, and hence in probability. Therefore the quadratic variation of Brownian motion B(t, ω) is t; omitting t and ω,

[B, B](t) = t.

We do not give the proof of the martingale characteristics of Brownian motion here, but it is easy to show that B(t) is a martingale, and it can also be shown that B(t)² − t and exp(u B(t) − u²t/2) are also martingales.
These martingales can be used to characterize the Wiener process as well and more details can be found in Klebaner (1998).
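The quadratic variation result [B, B](t) = t, together with the earlier claim that the ordinary variation of a Brownian path is infinite, can be observed numerically. In this Python sketch (the grid resolution is an illustrative choice) we sum squared and absolute increments of one finely sampled path on [0, 1]:

```python
import math
import random

random.seed(3)

t_end, n = 1.0, 100_000
dt = t_end / n

# Independent Brownian increments on a fine grid over [0, 1].
increments = [random.gauss(0.0, math.sqrt(dt)) for _ in range(n)]

quad_var = sum(db * db for db in increments)    # approximates [B, B](1) = 1
total_var = sum(abs(db) for db in increments)   # grows like sqrt(n): no finite limit

print(quad_var, total_var)
```

Refining the grid leaves quad_var near t = 1 while total_var grows without bound - the numerical signature of finite quadratic variation combined with infinite variation.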

Wiener process has Markov property
Markov property simply states that the future of a process depends only on the present state. In other words, a stochastic process having Markov property does not "remember" the past and the present state contains all the information required to drive the process into the future states.
This can be expressed as

P(B(t + s, ω) ∈ A | F_t) = P(B(t + s, ω) ∈ A | B(t, ω)), (2.3.14)

where F_t denotes the information generated by the process up to time t. From the very definition of the increments of the Wiener process on discretized intervals, each Wiener increment behaves independently of its immediate past, so one can see intuitively why the Wiener process should behave as a Markov process. This can also be expressed as

E( f(B(t + s, ω)) | F_t ) = E( f(B(t + s, ω)) | B(t, ω) )

for suitable functions f, which is another way of expressing equation (2.3.14).

Generalized form of Wiener process
The Wiener process as defined above is sometimes called the standard Wiener process, to distinguish it from the process obtained by the following generalization, defined (in a form made precise once stochastic integration is introduced below) through an integral of the type

W(t, ω) = ∫_0^t q(t − s) dB(s, ω).

The integral kernel q(τ) is called the correlation function and determines the correlation between stochastic process values at different times. The standard Wiener process is the simple case q(τ) = 1, i.e. full correlation over any time interval; the generalised Wiener process includes, for example, the case where q decreases, so that there is progressively less correlation between the values in a given realization as the time interval between them increases.

Stochastic Integration
At this point of our discussion, we need to define the integration of a stochastic process with respect to the Wiener process B(t, ω), so that we understand the conditions under which this integral exists and what kinds of processes can be integrated in this way. The Stieltjes integral cannot be used to integrate against functions of infinite variation; therefore, there is a need to define integrals for stochastic processes such as the Wiener process. There are two choices available: the Ito definition of integration and the Stratonovich integral. These two definitions produce entirely different stochastic processes as integrals.
The Ito definition is popular among mathematicians, while physicists tend to use the Stratonovich integral. The Ito integral has the martingale property among many other useful technical properties (Keizer, 1987), and in addition, Stratonovich integrals can be reduced to Ito integrals (Klebaner, 1998). In this monograph, we confine ourselves to the Ito definition of integration. The simplest integrand is a constant c, for which the integral is c(B(T, ω) − B(S, ω)). If X(t) is a deterministic process which can be expressed as a sequence of constants over small intervals, we can define the Ito integral as follows. The time interval [S, T] is discretized into n intervals S = t_0 < t_1 < … < t_n = T, on each of which X takes a constant value, and

I[X](ω) = Σ_{i=0}^{n−1} X(t_i) (B(t_{i+1}, ω) − B(t_i, ω)).

Using the property of independent increments of Brownian motion, we can show that the mean of I[X](ω) is zero and its variance is Σ_{i=0}^{n−1} X(t_i)² (t_{i+1} − t_i). It turns out that if X(t, ω) is a continuous stochastic process whose value at time t depends only on the information about the driving process up to t, the Ito integral I[X](ω) exists. If the state of a stochastic process X(t, ω) at time t depends only on F_t, the information available up to t, then it is called an adapted process. A left-continuous adapted process X(t, ω) is called a predictable process. As we are only concerned with continuous processes driven by past events, we limit our discussion of predictable processes; the reader may refer to Øksendal (1998) and Klebaner (1998) for a more rigorous discussion of these concepts.
We can now define the Ito integral for a continuous and adapted process X(t, ω):

I[X](ω) = lim Σ_{i=0}^{n−1} X(t_i, ω) (B(t_{i+1}, ω) − B(t_i, ω)),

where the sum converges in probability as the mesh of the partition tends to zero.
Dropping ω for convenience and adhering to the same discretization of the interval [S, T] as above, we write the Ito integral as ∫_S^T X(t) dB(t). As stated earlier, I[X](ω) is a stochastic process, and it has the following properties (see, for example, Øksendal (1998) for more details):

Linearity
If X(t) and Y(t) are predictable processes and α and β are constants, then

∫_S^T (α X(t) + β Y(t)) dB(t) = α ∫_S^T X(t) dB(t) + β ∫_S^T Y(t) dB(t). (2.4.7)

Isometry

The isometry property says that the expected value of the square of the Ito integral is the integral with respect to t of the expectation of the square of the process X(t):

E( ( ∫_S^T X(t) dB(t) )² ) = ∫_S^T E( X(t)² ) dt.
From the zero mean property, the left-hand side above is also the variance of the integral. Therefore the variance of the Ito integral process is ∫_S^T E(X(t)²) dt, a result that will be useful in understanding the behaviour of the Ito integral process; we say that the Ito integral is square integrable. The proof makes use of Fubini's theorem, which states that, for a stochastic process X(t) with continuous realizations, the order of expectation and time integration can be interchanged:

E( ∫_S^T X(t) dt ) = ∫_S^T E( X(t) ) dt. (2.4.10)
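The isometry can be illustrated by Monte Carlo with a concrete integrand, here X(t) = B(t) (our choice for illustration). Then E[(∫_0^T B dB)²] = ∫_0^T E(B(t)²) dt = ∫_0^T t dt = T²/2, which the following Python sketch reproduces:

```python
import math
import random

random.seed(11)

T, n_steps, n_paths = 1.0, 200, 20_000
dt = T / n_steps
sqrt_dt = math.sqrt(dt)

total_sq = 0.0
for _ in range(n_paths):
    b, ito_sum = 0.0, 0.0
    for _ in range(n_steps):
        db = random.gauss(0.0, sqrt_dt)
        ito_sum += b * db    # integrand evaluated at the left endpoint (Ito convention)
        b += db
    total_sq += ito_sum ** 2

mean_sq = total_sq / n_paths
print(mean_sq)    # isometry predicts T**2 / 2 = 0.5
```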

Ito integral is a martingale
It can be shown that E( ∫_S^t X(u) dB(u) | F_s ) = ∫_S^s X(u) dB(u) for S ≤ s < t, provided that E( ∫_S^T X(t)² dt ) < ∞, which is required for the martingale property to hold. Therefore, Ito integrals are square integrable martingales.

5. The Ito integral of a deterministic function X(t) is a Gaussian process with zero mean and covariance function

Cov( ∫_0^t X(s) dB(s), ∫_0^u X(s) dB(s) ) = ∫_0^{min(t,u)} X(s)² ds.

6. The quadratic variation of the Ito integral is

[ I[X], I[X] ](t) = ∫_0^t X(s)² ds.

We see that the Ito integral has positive quadratic variation, making it a process of infinite variation, i.e. a non-differentiable continuous function of t.
7. The quadratic covariation of Ito integrals with respect to processes X_1(t) and X_2(t) is given by

[ I[X_1], I[X_2] ](t) = ∫_0^t X_1(s) X_2(s) ds.

Armed with these properties we can proceed to discuss the machinery of stochastic calculus, such as the stochastic chain rule, also known as the Ito formula.

Stochastic Chain Rule (Ito Formula)
As we have seen previously, the quadratic variation of Brownian motion, [B(t, ω), B(t, ω)](t), is the limit in probability, over partitions of the interval [0, t] with vanishing mesh, of the sums Σ (B(t_{i+1}) − B(t_i))². Using differential notation, we write the increment as dB(t) and the summation as an integral, so that

[B, B](t) = ∫_0^t (dB(t))².

We have shown that [B, B](t) = t, and therefore

∫_0^t (dB(t))² = t, or, in differential notation, (dB(t))² = dt. (2.5.7)

Similarly, for any continuous and differentiable function g(t), the quadratic variation is zero. In particular, for g(t) = t, this reads in integral notation

∫_0^t (dt)² = 0, and in differential notation (dt)² = 0. (2.5.9)

Similarly, the quadratic covariation of t (a continuous and differentiable function) and Brownian motion vanishes; this can be proved by expressing the quadratic covariation as the limit of Σ (t_{i+1} − t_i)(B(t_{i+1}) − B(t_i)). Hence [t, B](t) = 0, in integral notation ∫_0^t dt dB = 0, and in differential notation

dt dB = 0. (2.5.11)

Therefore, we can summarize the following rules in differential notation:

dt · dB = 0; dB · dt = 0; dB · dB = dt; dt · dt = 0. (2.5.12)

In order to come to grips with the interpretation of the differential properties of dB(t), it is useful to consider the chain rule of differentiation. This will also lead us to formulas that are often more useful in calculating Ito integrals than the basic definition as the limit of a sum. Consider first, in ordinary calculus, a function g(x, t), where x is also a function of t.
We can write the change in g as t changes as

Δg = (∂g/∂t) Δt + (∂g/∂x) Δx + ½ (∂²g/∂x²) (Δx)² + … .

From this, an expression for dg/dt is obtained by taking the limit Δt → 0 of the ratio Δg/Δt. Since Δx = (dx/dt) Δt, when Δt → 0 the second derivative term shown is of order (Δt)² and falls away, together with all higher derivatives, and the well-known chain rule formula for the total derivative dg/dt is obtained. However, if instead of x we have a Wiener process B(t), then E((ΔB)²) = Δt, and the second derivative term is now only of order Δt and cannot be ignored. Since this holds for the expectation value, for consistency we also cannot neglect this term when the limit Δt → 0 is taken without considering the expectation value. Unlike ordinary calculus, where all expressions containing products of differentials of order higher than one are neglected, in Ito calculus we therefore have different rules.
Recall that in standard calculus the chain rule is applied to composite functions: if g = g(Y) and Y = Y(t), then

dg/dt = (dg/dY)(dY/dt),

or, in differential notation, dg = (dg/dY) dY.

Suppose f(t) = B(t) (Brownian motion) and g(x) is a twice continuously differentiable function. Then, using a stochastic Taylor series expansion, the Ito formula for a function of Brownian motion is

g(B(t)) − g(B(0)) = ∫_0^t g'(B(s)) dB(s) + ½ ∫_0^t g''(B(s)) ds. (2.5.13)

Comparing equation (2.5.13) with the corresponding chain rule of standard calculus, we see that the second derivative term of the Taylor series plays the significant role in changing the standard chain rule into the stochastic one. For example, with g(x) = e^x,

e^{B(t)} − 1 = ∫_0^t e^{B(s)} dB(s) + ½ ∫_0^t e^{B(s)} ds, (2.5.14)

or, in differential notation (which is only a convention),

d e^{B(t)} = e^{B(t)} dB(t) + ½ e^{B(t)} dt. (2.5.15)

As another example, let g(x) = x². Then, from the chain rule,

B(t)² − B(0)² = 2 ∫_0^t B(s) dB(s) + t, (2.5.16)

so that

∫_0^t B(s) dB(s) = ½ ( B(t)² − t ).

This is quite a different result from standard integration. In other words, the stochastic process ∫_0^t B(s) dB(s) can be calculated by evaluating ½(B(t)² − t). We will show how this process behaves using computer simulations in section 2.6.
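The identity just derived for ∫_0^t B dB makes a pathwise prediction that is easy to test: along any single realization, the left-endpoint Riemann sums should approach ½(B(t)² − t). A Python sketch (the step count is an illustrative choice):

```python
import math
import random

random.seed(5)

T, n = 1.0, 200_000
dt = T / n
sqrt_dt = math.sqrt(dt)

b, ito_sum = 0.0, 0.0
for _ in range(n):
    db = random.gauss(0.0, sqrt_dt)
    ito_sum += b * db    # left-endpoint (Ito) Riemann sum for the integral of B dB
    b += db

closed_form = 0.5 * (b * b - T)    # (B(T)**2 - T) / 2 from the Ito formula
print(ito_sum, closed_form)
```

The discrepancy between the two numbers equals ½(Σ(ΔB)² − T), which vanishes as the grid is refined - exactly the quadratic variation effect responsible for the extra −t/2 term.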
We can write the Ito integral as a process

Y(t) = Y(0) + ∫_0^t σ(s) dB(s). (2.5.18)

We can then add a "drift term" to the "diffusion term" given by equation (2.5.18):

Y(t) = Y(0) + ∫_0^t μ(s) ds + ∫_0^t σ(s) dB(s). (2.5.19)

We recall that σ(s) should be a predictable process subject to the condition ∫_0^t E(σ(s)²) ds < ∞, while μ(s) describes the drifting of the process. Y(t) is called an Ito process, and in differential notation we can write

dY(t) = μ(t) dt + σ(t) dB(t). (2.5.20)

Equation (2.5.20) can be quite useful in practical applications where the main driving force is perturbed by an irregular noise. A particle moving through a porous medium is such an example: advection gives rise to the drift term, while hydrodynamic dispersion and microdiffusion give rise to the "diffusive" term. In population dynamics, the diffusive term is a direct result of noise in the proportionality constant. It is therefore important to study the Ito process further in order to apply it in modelling situations. Since ∫ μ(s) ds is a continuous function of finite variation, the quadratic variation of Y comes entirely from the Ito integral; in differential notation, (dY)² = σ² dt.

The rules given in equation (2.5.12) give us a way to compute the behaviour of a function of Brownian motion. It is also useful to have a chain rule for a function of a given Ito process. Suppose an Ito process is given in the general form

dX(t) = μ(t, X(t)) dt + σ(t, X(t)) dB(t),

where μ is the drift coefficient and σ is the diffusion coefficient, and let g(t, x) be a twice continuously differentiable function. Let Y(t) = g(t, X(t)). Here Y(t) is a function of t and the Ito process X(t), and is also a stochastic process; Y(t) can itself be expressed as an Ito process. The Ito formula states

dY(t) = ( ∂g/∂t + μ ∂g/∂x + ½ σ² ∂²g/∂x² ) dt + σ (∂g/∂x) dB(t),

where products of differentials are evaluated according to the rules given by equation (2.5.12).
As an example, consider the Ito process given by equation (2.5.29) and substitute it into the Ito formula above. As seen there, X(t)² is also an Ito process, with drift and diffusion coefficients that are functions of X(t) and B(t); in particular the diffusion coefficient takes the form v = 4X(t)B(t). (2.5.32) We can derive this from the chain rule for a function of B(t) as well.

By convention, the above stochastic differential stands for the corresponding integral equation, and we can show that a process satisfying that integral equation is a solution of the stochastic differential. This idea of a solution to a stochastic differential is similar to that of a solution to a differential equation in standard calculus.
Suppose X_1(t) and X_2(t) are Ito processes given by the differentials

dX_i(t) = μ_i dt + σ_i dB(t), i = 1, 2. (2.5.40)

Their quadratic covariation is given by

[X_1, X_2](t) = ∫_0^t σ_1(s) σ_2(s) ds. (2.5.41)

The stochastic product rule is given by

X_1(t) X_2(t) − X_1(0) X_2(0) = ∫_0^t X_1(s) dX_2(s) + ∫_0^t X_2(s) dX_1(s) + [X_1, X_2](t),

and can be expressed in differential form:

d( X_1(t) X_2(t) ) = X_1(t) dX_2(t) + X_2(t) dX_1(t) + d[X_1, X_2](t).

As an example, consider Y(t) = t B(t), where X_1(t) = t is a continuous function of finite variation with σ_1 = 0, and X_2(t) = B(t) is Brownian motion, of infinite variation, with σ_2 = 1. Applying the product rule gives the same result as the standard product rule. The reason for this is that the quadratic covariation of a continuous function of finite variation and a function of infinite variation is zero, as we have mentioned previously.

d( t B(t) ) = t dB(t) + B(t) dt.
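Because the covariation term [t, B](t) vanishes, the discrete sums for ∫ t dB and ∫ B dt should reassemble t B(t) along a single path. A short Python check (the grid size is an illustrative choice):

```python
import math
import random

random.seed(9)

T, n = 1.0, 10_000
dt = T / n
sqrt_dt = math.sqrt(dt)

b, t = 0.0, 0.0
int_t_db, int_b_dt = 0.0, 0.0
for _ in range(n):
    db = random.gauss(0.0, sqrt_dt)
    int_t_db += t * db    # left-endpoint sum for the integral of t dB
    int_b_dt += b * dt    # left-endpoint sum for the integral of B dt
    t += dt
    b += db

lhs = T * b                   # t B(t) evaluated at t = T
rhs = int_t_db + int_b_dt     # product rule without a covariation term
print(lhs, rhs)
```

The two numbers agree up to a discretization error of order Σ Δt ΔB, which tends to zero with the mesh.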
As an integral equation, this determines the process t B(t) explicitly. More generally, g(X_1(t), X_2(t)) is also an Ito process and is given by the following differential form:

dg(X_1, X_2) = (∂g/∂x_1) dX_1(t) + (∂g/∂x_2) dX_2(t) + ½ (∂²g/∂x_1²) d[X_1, X_1](t) + ½ (∂²g/∂x_2²) d[X_2, X_2](t) + (∂²g/∂x_1∂x_2) d[X_1, X_2](t),

using the quadratic variations and covariation of the Ito processes. These relations can be considered a generalization of the rules on differentials given by equation (2.5.12). We use this generalized Ito formula for a function of two Ito processes in the following example.
We will now show that the stochastic process X(t) = e^{B(t)} can be considered as a solution process of the stochastic differential

dX(t) = ½ e^{B(t)} dt + e^{B(t)} dB(t).
As we can see in the above solution, the solution process contains the characteristics of both the drift and the diffusion phenomena. In this case, the diffusion phenomenon dominates as t increases, because the expected value of the exponential of Brownian motion increases at a faster rate in general. If we examine the drift term of the stochastic differential above, we see that the drift term is also affected by the Brownian motion, so the final solution is always the result of complex interactions between the drift term and the diffusion term.
(2.5.52) The X(t) process therefore satisfies the definition of an Ito process, and equation (2.5.52) can be considered a solution of the stochastic differential equation. As discussed earlier, this solution is significantly different from its deterministic counterpart.
In this section we have reviewed the essentials of stochastic calculus and presented results that will be useful in developing models and solving stochastic differential equations. While analytical expressions are quite helpful for understanding stochastic processes, computer simulation provides an intuitive "feel" for the simulated phenomena. It is often revealing to simulate a number of realizations of a process and visualize them on a computer to understand the behaviour of the process.

Computer Simulation of Brownian Motion and Ito Processes
In the previous section, we introduced Brownian motion (the Wiener process) as a continuous stochastic process with stationary, independent increments. This process is uniquely suited to modelling irregular noise such as Gaussian white noise in systems, and once such a process is incorporated in differential equations, the process of obtaining solutions involves stochastic calculus. Only a limited number of stochastic differential equations have analytical solutions, and some of these are given by Kloeden and Platen (1992); in many instances we have to resort to numerical methods. We illustrate the behaviour of the Wiener process and of Ito processes through computer simulations so that the reader can appreciate the variable nature of individual realizations.
For the numerical implementation, it is most convenient to use the variance specification of the increments of the Wiener process B(t). The time span of the simulation, [0,1], is discretised into small equal time increments delt, and the corresponding independent Wiener increments are selected randomly from a normal distribution with zero mean and variance delt. The Wiener process is very irregular (Figure 2.1), and the only discernible pattern is that, as time progresses, the position tends to wander away from the starting position at the origin.
In other words, if the statistical variance over realisations for a fixed time is evaluated, this increases gradually - a property referred to as time varying variance. The use of the Wiener process in a modelling situation to represent the noise in the system should be carefully thought through: if the noise can be represented as white noise, then the Wiener process enters the equation because of the relationship between white noise and the Wiener process. It is also important to realize that the Ito integral is a stochastic process dependent on the Wiener process; this is analogous to integration in standard calculus, where an indefinite integral is a function of the independent, deterministic variable. Given the Wiener process realization depicted in Figure 2.1, we compute the Ito integral of the Wiener process, ∫_0^t B(t, ω) dB(t, ω).
As we have previously seen, this integral can be evaluated by using the following stochastic relationship, converging in probability:

    ∫₀ᵗ B(s) dB(s) = ½ B(t)² − ½ t,

and it is shown in Figure 2.4. Let us consider the Ito process which we derived in section 2.5, written in differential notation in equation (2.6.1). The Ito process given in equation (2.6.1) is simulated in Figure 2.6 for the Wiener realization depicted in Figure 2.5.
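The convergence of the left-endpoint (Ito) Riemann sums to ½B(t)² − ½t can be checked numerically; the step count below is an illustrative choice.

```python
import random

def ito_integral_b_db(n_steps=5000, t_end=1.0, seed=1):
    """Left-endpoint Riemann sum of the Ito integral of B dB for one path."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    b, integral = 0.0, 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, dt ** 0.5)
        integral += b * db            # integrand evaluated at the LEFT endpoint
        b += db
    return b, integral

b_end, ito = ito_integral_b_db()
closed_form = 0.5 * b_end ** 2 - 0.5 * 1.0   # (1/2)B(t)^2 - (1/2)t at t = 1
```

The discrete sum and the closed form agree up to the error in the quadratic variation of the discretised path, which shrinks as the step size decreases.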

Solving Stochastic Differential Equations
Let us consider an ordinary differential equation which relates the derivative of the dependent variable y(t) to the independent variable t through a function f(t, y(t)). When a noise term driven by the Wiener process is added, y becomes a stochastic process Y(t, ω), and in differential notation the SDE is written as

    dY(t) = a(t, Y(t)) dt + b(t, Y(t)) dB(t).

This actually means

    Y(t) = Y(0) + ∫₀ᵗ a(s, Y(s)) ds + ∫₀ᵗ b(s, Y(s)) dB(s).        (2.7.8)

If we can find a function of the Wiener process, in the form of an Ito process, that satisfies the above integral equation (2.7.8), we call that function a strong solution of the SDE.
Strong solutions do not depend on individual realizations of Brownian motion; in other words, all possible realizations of an Ito process which is a strong solution of an SDE satisfy the SDE under consideration. Not all SDEs have strong solutions. The other class of solutions, called weak solutions, are those for which the solution for each individual realization differs from the others. In this section we focus only on strong solutions. In many situations finding analytical solutions to SDEs is impossible, and therefore we review a minimum number of SDEs and their solutions in order to facilitate the discussion in the subsequent chapters.
If X(t) is a stochastic process and another stochastic process Y(t) is related to X(t) through the stochastic differential

    dY(t) = Y(t) dX(t),                                  (2.7.9)

then Y(t) is called the stochastic exponential of X(t). If X(t) is a stochastic process of finite variation, the solution to equation (2.7.9) is

    Y(t) = Y(0) exp(X(t) − X(0)),                        (2.7.10)

and, for any process X(t),

    Y(t) = Y(0) exp(X(t) − X(0) − ½ [X, X](t))           (2.7.11)

satisfies the stochastic differential given above. Here [X, X](t) is the quadratic variation of X(t); for a continuous function with finite variation, [X, X](t) = 0.
For example, consider the following stochastic differential equation in differential form:

    dX(t) = X(t) dB(t).                                  (2.7.12)

This SDE does not have a drift term, and the diffusion term is an Ito integral.

We know, [B, B](t) = t.
Therefore, from the above result, the solution to the SDE is

    X(t) = X(0) exp(B(t) − ½ t).                         (2.7.14)

Now let us consider a similar SDE with a drift term:

    dX(t) = μ X(t) dt + σ X(t) dB(t),                    (2.7.15)

where μ and σ are constants.
Dividing it by X(t),

    dX(t)/X(t) = μ dt + σ dB(t).                         (2.7.16)

This differential represents

    ∫₀ᵗ dX(s)/X(s) = μ t + σ B(t),

where the second term on the right-hand side comes from Ito integration.

Now let us assume

    U(t) = μ t + σ B(t).                                 (2.7.18)

Then the SDE becomes dX(t) = X(t) dU(t), and, since [U, U](t) = σ² t, the solution to the SDE is the stochastic exponential

    X(t) = X(0) exp(U(t) − ½ [U, U](t)) = X(0) exp((μ − σ²/2) t + σ B(t)).

Let us examine whether the stochastic process

    X(t) = X(0) exp((μ − σ²/2) t + σ B(t))

is a strong solution to the differential equation dX(t) = μ X(t) dt + σ X(t) dB(t). To do so, we define a function of the two Ito processes X1(t) and X2(t) appearing in its construction and apply the Ito formula for two Ito processes.
The integral in the solution given above is an Ito integral and should be calculated according to Ito integration. For non-linear stochastic differential equations, appropriate substitutions may be found to reduce them to linear ones.
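As a numerical check, the strong solution above can be compared with an Euler-Maruyama approximation driven by the same Wiener increments; the coefficient values below are illustrative assumptions.

```python
import math
import random

def gbm_numeric_vs_exact(mu=0.06, sigma=0.2, x0=1.0, n_steps=1000,
                         t_end=1.0, seed=2):
    """Euler-Maruyama approximation of dX = mu*X dt + sigma*X dB compared
    with the strong solution X(t) = X(0) exp((mu - sigma^2/2) t + sigma B(t)),
    both driven by the SAME Wiener increments.
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    x_num, b = x0, 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, dt ** 0.5)
        x_num += mu * x_num * dt + sigma * x_num * db   # Euler-Maruyama step
        b += db                                         # accumulate B(t)
    x_exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * t_end + sigma * b)
    return x_num, x_exact

x_num, x_exact = gbm_numeric_vs_exact()
```

For a small step size the two values agree closely, illustrating that the exponential process is indeed a strong solution for this realization.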

The Estimation of Parameters for Stochastic Differential Equations Using Neural Networks
Stochastic differential equations (SDEs) offer an attractive way of modelling random system dynamics, but the estimation of the drift and diffusion coefficients remains a challenging problem in many situations. Various statistical methods are used to estimate the parameters in differential equations driven by Wiener processes. In this section we offer an alternative approach based on artificial neural networks to estimate the parameters in an SDE. Readers who are familiar with neural networks may skip this section. Artificial Neural Networks (ANNs), as discussed in chapter 1, are universal function approximators that can map any nonlinear function, and they have been used in a variety of fields, such as prediction, pattern recognition, classification and forecasting. ANNs are less sensitive to error-term assumptions, and they tolerate noise and chaotic behaviour better than most other methods. Other advantages include greater fault tolerance, robustness and adaptability, due to the large number of interconnected processing elements that can be trained to learn new patterns (Bishop, 1995). The Multilayer Perceptron (MLP) network is among the most common ANN architectures in use. It is a type of feedforward network in which connections are only allowed from the nodes in layer i to the nodes in layer i+1.
There are other, more complex neural network architectures available, such as recurrent networks and stochastic networks; however, MLP networks are sufficient for most recognition and classification problems if enough hidden layers and hidden neurons are used (Samarasinghe, 2006). We show how to use the output values from the solutions of the SDEs to train neural networks, and then use the trained networks to estimate the SDE parameters for given output data. MLP networks will be used to solve this type of mapping problem.
The general form of an SDE can be expressed as

    dy(t) = f(y, t; θ) dt + g(y, t; θ) dw(t),

where y(t) is the state variable of interest, θ is a set of parameters (known and unknown), and w(t) is a standard Wiener process. In practice, to determine the parameter θ, the system output variable y is usually observed at discrete time intervals t, where 0 ≤ t ≤ T, at M independent points. Rumelhart (1986) developed the backpropagation learning algorithm, and it is commonly used to train ANNs due to its ability to generalize across a wide variety of problems. A typical backpropagation learning algorithm is based on an architecture that contains a layer of input neurons, a layer of output neurons, and one or more hidden layers; these neurons are all interconnected with different weights. In the backpropagation training algorithm, the error information is passed backward from the output layer to the input layer (Figure 2.8).
Weights are adjusted with the gradient descent method.
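The backpropagation cycle just described (forward pass, output delta, backpropagated hidden deltas, gradient-descent weight adjustment) can be sketched in a minimal pure-Python MLP. The network size, learning rate and XOR task are illustrative assumptions, not the NeuroShell2 settings used later in this section.

```python
import math
import random

def train_mlp(data, n_hidden=8, lr=0.5, epochs=3000, seed=3):
    """Minimal one-hidden-layer MLP with logistic units, trained by plain
    online backpropagation on (input, target) pairs with targets in [0, 1]."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Each weight row carries the unit's bias as its last entry.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    for _ in range(epochs):
        for x, t in data:
            h = [sig(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in w1]
            y = sig(w2[-1] + sum(wi * hi for wi, hi in zip(w2, h)))
            dy = (y - t) * y * (1.0 - y)              # output-layer delta
            for j in range(n_hidden):
                dh = dy * w2[j] * h[j] * (1.0 - h[j])  # backpropagated delta
                for i in range(n_in):
                    w1[j][i] -= lr * dh * x[i]         # gradient-descent step
                w1[j][-1] -= lr * dh                   # hidden bias
                w2[j] -= lr * dy * h[j]
            w2[-1] -= lr * dy                          # output bias
    def predict(x):
        h = [sig(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in w1]
        return sig(w2[-1] + sum(wi * hi for wi, hi in zip(w2, h)))
    return predict

# XOR: the classic mapping that requires a hidden layer.
xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
predict = train_mlp(xor)
mse = sum((predict(x) - t) ** 2 for x, t in xor) / 4
```

After training, the mean squared error on the four XOR patterns drops well below the 0.25 obtained by always outputting 0.5, showing the error-backpropagation updates at work.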
The ANN is trained by first setting up the network with all its units and connections, and then initialising with arbitrary weights. Then the network is trained by presenting examples. During the training phase the weights on connections change enabling the network to learn. When the network performs well on all training examples it should be validated on some other examples that it has not seen before. If the network can produce reasonable output values which are similar to validation targets and contain only small errors, it is then considered to be ready to use for problem solving.
In these two equations, (2.8.1) and (2.8.2), α and β are constant coefficients to be estimated as parameters, and γ is a constant coefficient that adjusts the noise level (amplitude).
For each particular parameter α, or each combination of the parameters α and β, we generate realisations over a chosen range of values; in the nonlinear case the parameters lie in [0.5, 3].
The discrete observations X(t) of these two equations are obtained at the sampling instants. Let the number of samples be n_t; we consider the first n_t time steps starting from X(0) = 1 with a sampling interval Δt = 0.001. All the values come from the solutions of the SDEs. It has been shown using the Ito formula that Eq. (2.8.2) has an analytical solution (section 2.7). Before we describe the neural network data sets, we clarify the terminology about "training", "validation" and "test" data sets. In the machine learning and neural network literature, different meanings of the words "validation" and "test" are employed. We restrict ourselves to the definitions given by Ripley (1996): a training set is used for learning, a validation set is used to tune the network parameters, and a test set is used only to assess the performance of the network.
We generate a number of SDE realisations for a specified range of parameters with some patterns of Wiener processes to train the ANN. These data sets are called training data sets and validation sets are randomly chosen from the training sets. In order to test the prediction capability of the ANN, test data sets are generated with different patterns of Wiener processes within the same range of parameters as the training data sets.
Obviously if the test data sets were generated from SDEs which contain only a single Wiener process, the result would be biased if this Wiener process was coincidently similar to the one used to generate the training data sets. To fairly assess the performance of networks, five different patterns of Wiener processes are used to generate the test data sets.
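The data-generation protocol above can be sketched as follows. The linear SDE dX = αX dt + γX dB, its closed-form solution, and the exact parameter grid are illustrative assumptions; the book's own test equations (2.8.1)-(2.8.2) are not reproduced here.

```python
import math
import random

def sde_realisation(alpha, gamma, seed, n_t=50, dt=0.001, x0=1.0):
    """One realisation of an assumed linear SDE dX = alpha*X dt + gamma*X dB,
    sampled at n_t instants via its strong solution
    X(t) = X(0) exp((alpha - gamma^2/2) t + gamma B(t))."""
    rng = random.Random(seed)
    xs, b = [], 0.0
    for k in range(1, n_t + 1):
        b += rng.gauss(0.0, dt ** 0.5)          # advance the Wiener path
        xs.append(x0 * math.exp((alpha - 0.5 * gamma ** 2) * k * dt
                                + gamma * b))
    return xs

def make_data(alphas, gamma, seeds):
    """Inputs = sampled trajectories; target = the alpha that generated each."""
    inputs, targets = [], []
    for seed in seeds:                           # one Wiener pattern per seed
        for alpha in alphas:
            inputs.append(sde_realisation(alpha, gamma, seed))
            targets.append(alpha)
    return inputs, targets

alphas = [1.0 + 0.1 * i for i in range(11)]                      # alpha grid
train_x, train_y = make_data(alphas, 0.01, seeds=range(100))     # 100 patterns
test_x, test_y = make_data(alphas, 0.01, seeds=range(100, 105))  # 5 unseen
```

Using disjoint seed ranges for the training and test sets mirrors the requirement that the test sets be driven by Wiener patterns the network has never seen.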
To determine the number of time steps n_t, we generated training and test data sets for different n_t values, from n_t = 10 to n_t = 200 in increments of 10. We found that 50 values were sufficient to represent the pattern of the SDEs in order to train the neural networks; further increases in n_t did not improve the networks' performance in parameter estimation. Therefore 50 time steps are used in our computational experiments.
All the experiments are carried out on a personal computer running the Microsoft Windows XP operating system. We use a commercial ANN software package, NeuroShell2, for the neural network computations (Group, W.S., 2005). Among all the parameters in an MLP, the numbers of input and output neurons are the easiest to determine, because each independent variable is represented by its own input neuron. The number of inputs is determined by the number of sampling instants in the SDE's solution, and the number of outputs is determined by the number of parameters which need to be predicted. For the number of hidden layers and hidden-layer neurons, we start with a network with one layer and a few neurons, and then test different numbers of hidden layers and neurons to achieve the best ANN performance. In the following experiments, (X - Y - Z) is used to denote the networks, where X is the number of input nodes, Y is the number of hidden nodes and Z is the number of output nodes.
We found that the logistic function was superior to the other transfer functions available in NeuroShell2 (linear, hyperbolic tangent, sine and Gaussian) as the input, output and hidden-layer function, because of its nonlinear and continuously differentiable properties, which are desirable for learning complex problems. In addition, we use the NeuroShell2 default value of 0.1 for both the learning rate and the momentum, as we found it to be consistently appropriate.
The number of training epochs plays an important role in determining the performance of the ANN. An epoch is one presentation of the entire training set to the network. ANNs need sufficient training epochs to learn complex input-output relationships; however, excessive training epochs require unnecessarily long training times and cause overfitting, where the network performs very well during training but fails in testing (Caruana, 2001). To monitor overfitting, we set aside 20% of the training sets as validation sets, and the ANN monitors the errors on the validation sets during training. The profile of the error plot for the training and validation sets during the training procedure indicates whether further training epochs are needed. We can stop training when the error on the training set keeps decreasing but the error on the validation set rises or flattens.
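The stopping rule described above can be written as a generic early-stopping loop; the patience value and the toy validation-error curve below are illustrative assumptions.

```python
def train_with_early_stopping(train_step, val_error, max_epochs=500,
                              patience=20):
    """Run train_step once per epoch and stop once the validation error has
    not improved for `patience` consecutive epochs; report the best epoch."""
    best_err, best_epoch, stalled = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        err = val_error()
        if err < best_err - 1e-12:
            best_err, best_epoch, stalled = err, epoch, 0
        else:
            stalled += 1
            if stalled >= patience:    # validation error rising or flat
                break
    return best_epoch, best_err

# Toy validation curve that falls and then rises: the overfitting signature.
errs = iter([(e - 30) ** 2 / 900.0 + 0.1 for e in range(1, 501)])
epoch, err = train_with_early_stopping(lambda: None, lambda: next(errs))
```

On this synthetic curve the loop correctly stops soon after the validation-error minimum instead of running all 500 epochs.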
In order to test the robustness of the neural networks, we need to measure the level of noise in the diffusion term of an SDE with respect to its drift term. The diffusion parameter γ is used to adjust the noise level: a higher γ value indicates greater noise and increases the contribution of the diffusion term to the entire solution. As one might expect, increased noise levels raise the difficulty of estimation. To measure it, we define P_γ for the linear equation as the ratio of the contribution of the diffusion term to that of the drift term. The accuracy of the estimated parameters is assessed with the coefficient of determination,

    R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²,

where y = actual value, ŷ = predicted value of y, ȳ = mean of y, and m = number of data patterns. In the case of parameter estimation for the linear SDE, one R² value is obtained for determining the accuracy of the predicted parameter α. For the nonlinear SDE, two R² values are calculated for determining the accuracy of the predicted parameters α and β. If the ANN predicts all the values exactly, a perfect fit results in an R² value of 1; in a very poor fit, the R² value is close to 0; and if the ANN predictions are worse than the mean of the samples, the R² value is less than 0.
In addition to R², the Average Absolute Percentage Error (AAPE) is also used for evaluating the prediction performance where needed (Triola, 2004),

    AAPE = (1/m) Σᵢ |(yᵢ − ŷᵢ)/yᵢ| × 100%,

where y = target value, ŷ = predicted value of y, and m = number of data patterns.
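Both performance measures are straightforward to compute from paired target and prediction lists:

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def aape(y, y_hat):
    """Average Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)
```

A perfect prediction gives R² = 1 and AAPE = 0, while predicting the sample mean everywhere gives R² = 0, matching the interpretation given above.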
The performance of the ANN is evaluated by assessing the accuracy of the estimated parameters. Different ANN architectures, including various combinations of hidden layers, neurons and training epochs, are used to obtain the optimum neural network. Further, a range of diffusion coefficients is used to evaluate the effect of different levels of stochasticity on the performance of the ANN.
We do not have a priori knowledge of the optimal ANN architecture at first; therefore we choose the default parameters in NeuroShell2 for a one-hidden-layer MLP network. The experiments show that when the training data set is developed using only one Wiener process, the overfitting problem is obvious. The average error in the training set continues to decrease during the training process; because of the powerful mapping capability of neural networks, the average error between the targets and the network outputs approaches zero as training continues. During the first four epochs, the average error in the validation set drops significantly, reaching its lowest value at epoch 8. After that, the validation set error starts rising although the training set error keeps getting smaller. The reason for this increasingly poor generalization is that the neural network tends to track every individual point in the training set created by a particular pattern of Wiener process, rather than learning the whole character of the equation.
When the training data set is produced by more than one Wiener process, overfitting significantly decreases: the average error in the validation set continues to drop and remains stable after a certain number of epochs. We then examine the relationship between the number of Wiener processes and the ANN's prediction ability, using the same ANN architecture and SDE parameters as in the previous section; additionally, we test a set of noisier data with γ = 0.07. The results are obtained with the same numbers of training epochs. Figure 2.9 shows the influence of the number of Wiener processes used to produce the training data sets: as the number of Wiener processes in the training sets increases, the network produces higher R² values for the test sets. It should be noted that the size of the training data set expands as more Wiener processes are employed, and consequently training becomes slower. Therefore, although there is a marginal improvement in the R² value when more than 80 Wiener processes are used, we limit the number of Wiener processes to 100 in further investigations.
We use the same SDE parameters except γ = 0.07 to create the training and test data sets, and 100 Wiener processes are used to produce the training data sets. All the R² values are obtained by using early stopping. The results in Table 2.1 suggest that when there is only one hidden layer and the number of neurons in the hidden layer is very small, the performance of the network is poor, because the network does not have enough "power" to learn the input-output relationship. When the number of neurons in the hidden layer is close to half the number of input neurons, the performance reaches a higher accuracy. Further increases in the number of hidden layers and neurons do not improve the performance.
The ANN performance is investigated for different combinations of drift and diffusion terms. We use three different MLP architectures, 50-30-1, 50-15-15-1 and 50-10-10-10-1, to train and test the data sets, and record the best performance. Figure 2.10B shows that the targets and the network outputs in the test sets are in good agreement when γ = 0.01. Because the test set is created by 5 Wiener processes, there are five repetitive sub-data sets, each representing the range of α values from 1 to 2 with one pattern of Wiener process. By observing the sub-data sets separately, we can gain a better understanding of how noise influences the estimation of the parameter. As the γ value reaches 0.05 (Figure 2.10C) and the ratio of the diffusion term to the drift term reaches 0.67 (shown in Figure 2.10A), the 2nd, 3rd and 5th sub-data sets show more accurate predictions than the 1st and 4th sets. Figure 2.10D demonstrates that the network-generated outputs tend towards the average of the targets in most of the sub-data sets when γ = 0.10, where the weight of the diffusion term exceeds that of the drift term (P_γ = 1.39, as shown in Figure 2.10A).
The R² values for α are very close to zero, while the R² values for β are all more than 0.9. According to the statistical meaning of R² given previously, we consider that the neural networks fail to predict α and succeed in predicting β; we explore the reason for this difference later. We use three network architectures, 50-30-2, 50-60-2 and 50-50-50-2, to estimate parameters for different SDEs and record the best results. The results in Table 2.3 indicate that the accuracy of the network performance decreases as the strength of the diffusion term in the SDE increases, which is similar to the linear equation. Figure 2.11 shows that, compared with the results in the linear case (Figure 2.10A), the prediction capability of the networks in the nonlinear case is poorer, due to the complexity of the input-output relationship in the nonlinear SDEs. Table 2.4 shows that the bigger the contribution of the term containing a particular parameter (P_α or P_β), the smaller the error (AAPE) and the better the prediction (R²) for that parameter. Therefore, we conclude that the estimation accuracy for a parameter in a nonlinear SDE depends on the contribution of its term relative to the drift term.

In the data preparation stage, we used different numbers of time steps to solve the SDEs and found that 50 data points are sufficient to represent a realisation of the SDEs. In addition, we emphasise the effect of the number of Wiener processes used to create the training data sets. Increasing the number of Wiener processes boosts the performance of the networks considerably and eliminates the overfitting problem. When overfitting occurs, the resulting network is accurate on the training set but performs poorly on the test set. When the number of Wiener processes used to generate the training data sets is increased, the learning procedure finds common features amongst the training sets that enable the network to correctly estimate the parameter(s) in the test data sets.
In the ANN training procedure, we use early stopping to obtain the optimum test results. We also tried different MLP architectures, transfer functions, learning rates and momentum values; however, we found that these factors do not significantly increase the performance of the ANNs.
The diffusion level in an SDE has a significant impact on the network performance. In the linear SDE, when the ratio of the diffusion term to the drift term is below 0.40, the network can estimate the parameter accurately (R² > 0.93). When the ratio reaches 0.67, the network estimates the parameter accurately only when the Wiener processes in the test sets and in the training sets are similar. If the diffusion term is larger than the drift term, the network cannot predict the parameter(s) and only tends to give the average of the parameter values used for the training data sets. For nonlinear SDEs, the estimation ability of a network is generally poorer than for linear SDEs. Furthermore, the estimation accuracy for a parameter in a nonlinear SDE depends on the contribution of its term relative to the drift term.
We can conclude that the classical neural network method (an MLP with the backpropagation algorithm) provides a simple but robust parameter estimation approach for SDEs under moderately noisy conditions, but this estimation capability is limited for SDEs with a high diffusion level. When the diffusion level is high (>10%-20%), statistical methods also fail to estimate the parameters accurately.