Elements of Sequential Detection with Applications to Sensor Networks

In this contribution, the basic elements of sequential detection theory are presented, and some applications to sensor networks are then addressed. Sequential detection was essentially introduced by A. Wald in 1947. It deals with hypothesis testing problems in which the number of observations available to the detector is virtually unbounded, and hence not fixed in advance as it is in the more classical and well-known detection paradigms. From a conceptual viewpoint, a sequential decision procedure stops as soon as the specific realization of the data available to the detector is informative enough to make a decision that satisfies prescribed error probability bounds. For comparison, in the Neyman-Pearson paradigm the false alarm probability is prescribed, and the goal is to achieve the best detection probability with a fixed number of samples. In Wald's hypothesis test, instead, both the false alarm and detection probabilities are prescribed, and the goal is to make a decision compatible with those performance levels using, on average, the minimum number of samples. In this sense, sequential tests outperform the classical fixed-sample-size procedures: for the same error probabilities, they typically require fewer samples on average. Nonetheless, they are less known to non-specialists, and perhaps less used in practical implementations. In part, this is because sequential tests are harder to analyze when the data are not independent and/or not identically distributed, although recent advances in that direction are available. However, and this is relevant for the present work, there are many application scenarios where the sequential paradigm arises naturally as a suitable framework. This is the case, for instance, of certain sensor network architectures where a mobile agent sequentially queries the nodes of the network in order to retrieve data or local inferences stored at those sensors. Here the mobility of the agent implies that the data are made available to it in a sequential fashion.
In addition, given the typical size of certain sensor networks, modeling the data available for sampling as an unbounded stream is a reasonable mathematical description of the physical scenario. This work is made of two parts. The first presents the theoretical background and tools of sequential detection, while the second addresses some practical applications. In presenting the basic elements of sequential analysis, we aim neither at exhaustiveness nor at mathematical sophistication beyond the typical toolbox of an electrical engineer. Rather, we hope to collect the main results for easy reference and for a basic understanding of the application case studies. In discussing the applications to sensor networks, again, we make no claim of exhaustiveness. Rather, we focus on some recent applications of sequential sampling to sensor networks, as investigated by our research group.

Open Access Database www.intechweb.org

2. Basic elements of sequential detection
2.1 Martingales
The modern theory of martingales is due to Doob [2], whose work still remains the basic reference. Below, we only present the basic concepts and results. A random process $\{X_n\}$ is a martingale if, for $n = 1, 2, \ldots$, we have

$$E[|X_n|] < \infty, \qquad E[X_{n+1} \mid X_1, \ldots, X_n] = X_n. \tag{1}$$

The term martingale was introduced in France to denote a gambling scheme in which the gambler doubles the bet at each step of the play, until finally winning. More generally, a martingale is understood as a betting scheme designed to improve one's fortune. In effect, if we interpret $X_n$ as the fortune of the gambler at step $n$, the above definition states that the gambler's expected fortune after the next play equals the current fortune, irrespective of the previous history; therefore, the martingale models a fair game [3][4][5]. Perhaps the simplest examples of martingales are the sum $X_n = \sum_{i=1}^n W_i$ of independent zero-mean random variables $W_i$, and the product $X_n = \prod_{i=1}^n W_i$ of independent unit-mean random variables $W_i$, provided in both cases that the first condition in (1) is satisfied. A slightly more general definition is as follows. The random process $\{X_n\}$ is a martingale with respect to the random process $\{Y_n\}$ if, for $n = 1, 2, \ldots$,

$$E[|X_n|] < \infty, \qquad E[X_{n+1} \mid Y_1, \ldots, Y_n] = X_n. \tag{2}$$

With the above definitions, the so-called Doob martingale can be introduced by considering an arbitrary random process $\{Y_n\}$ and a random variable $X$ with $E[|X|] < \infty$. In fact, the process

$$X_n = E[X \mid Y_1, \ldots, Y_n] \tag{3}$$

is easily shown to be a martingale with respect to $\{Y_n\}$. Doob's martingale has important applications in various fields, including estimation theory: the optimal (in the mean-square sense) estimator of a random variable $X$, given the observations $Y_1, \ldots, Y_n$, is the conditional mean $E[X \mid Y_1, \ldots, Y_n]$, so that the sequence of such estimators is nothing but a Doob martingale.

In the sequential sampling framework that we are interested in, the concept of stopping time is key. A random variable $N$ taking values in $\{1, 2, \ldots, \infty\}$ is a random time for the process $\{X_n\}$ if the event $\{N = n\}$ is determined by $X_1, X_2, \ldots, X_n$. This means that we can decide whether $N = n$ or $N \neq n$ by observing the process only up to time $n$, while the samples $X_{n+1}, X_{n+2}, \ldots$ are irrelevant for that decision. A random time $N$ is called a stopping time for the process $\{X_n\}$ if $\Pr\{N < \infty\} = 1$. Therefore, $N$ is a stopping time for $\{X_n\}$ if it is finite with probability one and the event $\{N = n\}$, given the past of the process $\{X_n\}$, does not depend on what $\{X_n\}$ does in the future. Indeed, a more general formal definition of stopping time can be given in terms of conditional independence, which is relevant when the sequence $\{X_n\}$ is only one actor of a more general probabilistic experiment [6].
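As a quick sanity check of the two textbook martingales mentioned above, the Python sketch below estimates $E[X_n]$ by Monte Carlo and verifies that it stays at its initial value (0 for the sum, 1 for the product), as implied by taking expectations in (1). The step distributions (a fair $\pm 1$ coin, and a uniform variable on $[0.5, 1.5]$) are arbitrary illustrative choices, and the helper `mean_path` is hypothetical.

```python
import random

def mean_path(n_steps, n_paths, draw, combine, start, seed=0):
    """Monte Carlo estimate of E[X_n], n = 1..n_steps, for X_n = combine(X_{n-1}, W_n)."""
    rng = random.Random(seed)
    sums = [0.0] * n_steps
    for _ in range(n_paths):
        x = start
        for n in range(n_steps):
            x = combine(x, draw(rng))
            sums[n] += x
    return [s / n_paths for s in sums]

# Sum of independent zero-mean steps: W_i = +1 or -1 with probability 1/2 (illustrative).
sum_means = mean_path(20, 20000, lambda rng: rng.choice((-1.0, 1.0)),
                      lambda x, w: x + w, start=0.0)

# Product of independent unit-mean factors: W_i uniform on [0.5, 1.5] (illustrative).
prod_means = mean_path(20, 20000, lambda rng: rng.uniform(0.5, 1.5),
                       lambda x, w: x * w, start=1.0)

print("max |E[X_n] - 0| for the sum:    ", max(abs(m) for m in sum_means))
print("max |E[X_n] - 1| for the product:", max(abs(m - 1.0) for m in prod_means))
```

Both printed deviations should be small and shrink as `n_paths` grows; they measure Monte Carlo noise only.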

www.intechopen.com
Consider now a stopping time $N$ for the process $\{X_n\}$, and define the associated stopped process [5]: for $n = 1, 2, \ldots$,

$$\bar{X}_n = \begin{cases} X_n, & n \le N, \\ X_N, & n > N, \end{cases} \qquad \text{i.e., } \bar{X}_n = X_{\min(n, N)}. \tag{4}$$

It can be shown that, if $\{X_n\}$ is a martingale, then the associated stopped process $\{\bar{X}_n\}$ is a martingale too. By definition of martingale, taking the expectation of both sides of the second equation in (1), we have $E[X_{n+1}] = E[X_n]$, $\forall n \ge 1$, and consequently $E[X_n] = E[X_1]$ for all $n$. Since $\{\bar{X}_n\}$ is a martingale too, we also have $E[\bar{X}_n] = E[\bar{X}_1] = E[X_1]$ (note that always $\bar{X}_1 = X_1$). Since the stopping time is finite with probability one, the original process $\{X_n\}$ will eventually be stopped, that is to say, there must exist a (sufficiently large) value of $n$ such that $\bar{X}_n = X_N$. From that value of $n$ on, the stopped process remains constant, implying that $\lim_n \bar{X}_n = X_N$. Taking the expectation, $E[X_N] = E[\lim_n \bar{X}_n]$. If we could exchange the limit with the statistical expectation operator, we would get the following important result:

$$E[X_N] = \lim_n E[\bar{X}_n] = E[X_1]. \tag{5}$$

Indeed, under appropriate technical conditions the above exchange is legitimate, and this result is known as the martingale optional stopping theorem, which states the following (see, e.g., [4,5]). Equation (5) holds true, provided that at least one of the following conditions is met:
• the random variables $\{\bar{X}_n\}$ are uniformly bounded;
• the stopping time $N$ is bounded, i.e., $\Pr\{N \le k\} = 1$ for some $k \ge 1$;
• $E[N] < \infty$ and $E[|X_{n+1} - X_n| \mid X_1, \ldots, X_n] < k$ for all $n$ and some constant $k < \infty$.
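The optional stopping theorem can be checked numerically on a classical case: a fair $\pm 1$ random walk started at 0 and stopped at the first hit of $+a$ or $-b$. The stopped process is bounded, so the first condition above applies, and (5) gives $E[X_N] = 0$, whence $\Pr\{X_N = a\} = b/(a+b)$. A minimal Python sketch, with $a = 3$ and $b = 7$ chosen arbitrarily:

```python
import random

def simulate_hits(a, b, n_runs, seed=1):
    """Fair +/-1 random walk from 0, stopped at +a or -b.
    Returns (fraction of runs stopped at +a, sample mean of X_N)."""
    rng = random.Random(seed)
    hits_up, total = 0, 0
    for _ in range(n_runs):
        x = 0
        while -b < x < a:
            x += rng.choice((-1, 1))
        hits_up += x == a
        total += x
    return hits_up / n_runs, total / n_runs

a, b = 3, 7
p_up, mean_xn = simulate_hits(a, b, 20000)
# Optional stopping: E[X_N] = a*p + (-b)*(1 - p) = 0, hence p = b / (a + b).
print(f"simulated Pr{{X_N = a}} = {p_up:.4f}, theory = {b / (a + b):.4f}, E[X_N] ~ {mean_xn:.3f}")
```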

2.2 Sequential probability ratio test
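With the concepts introduced above, we can now elaborate on the likelihood ratio to derive the basic design formulas for the sequential test proposed by Wald. Let us consider the binary test between two simple hypotheses:

$$\mathcal{H}_0: Y_i \sim f_0(y), \qquad \mathcal{H}_1: Y_i \sim f_1(y), \qquad i = 1, 2, \ldots \tag{6}$$

where $f_{0,1}(y)$ are two known probability density functions of $Y_i$ under hypotheses $\mathcal{H}_0$ and $\mathcal{H}_1$, and where the data $Y_i$, $i = 1, 2, \ldots$ are iid (independent, identically distributed), for simplicity. As a distinctive feature, the number of such observations is not determined in advance and can be virtually unbounded. Wald's test, also known as SPRT (Sequential Probability Ratio Test), prescribes to proceed as follows. Let $\Lambda_n$ be the likelihood ratio pertaining to the above statistical test, computed from the first $n$ available samples $Y_1, \ldots, Y_n$. Then, with two thresholds $0 < \gamma_0 < 1 < \gamma_1$, at each step $n$:

$$\Lambda_n = \prod_{i=1}^n \frac{f_1(Y_i)}{f_0(Y_i)} \quad \begin{cases} \ge \gamma_1: & \text{stop and decide } \mathcal{H}_1, \\ \le \gamma_0: & \text{stop and decide } \mathcal{H}_0, \\ \in (\gamma_0, \gamma_1): & \text{take another sample.} \end{cases} \tag{7}$$

As is clear from this formulation, the number of samples processed in order to make a decision is not fixed; it is a random quantity, whose value depends on the specific realization of the observation process $\{Y_n\}$. The detection and false alarm probabilities of the test are defined as usual:

$$P_d = \Pr\{\text{decide } \mathcal{H}_1 \mid \mathcal{H}_1\}, \tag{8}$$
$$P_f = \Pr\{\text{decide } \mathcal{H}_1 \mid \mathcal{H}_0\}, \tag{9}$$

and there is an amazingly simple relationship linking the pair $(P_d, P_f)$ to the thresholds $(\gamma_0, \gamma_1)$ of the test, as we shall promptly see. Let us assume that hypothesis $\mathcal{H}_0$ is actually in force. Given the observations $Y_1, \ldots, Y_n$, let us consider the random process built upon the likelihood ratio

$$\Lambda_n = \prod_{i=1}^n \frac{f_1(Y_i)}{f_0(Y_i)}, \qquad n = 1, 2, \ldots \tag{10}$$

Under mild regularity conditions (i.e., assuming $E[|\Lambda_n| \mid \mathcal{H}_0] < \infty$, $\forall n$), the process $\{\Lambda_n\}$ is a martingale with respect to the observation process $\{Y_n\}$. Indeed, we have

$$E[\Lambda_{n+1} \mid Y_1, \ldots, Y_n, \mathcal{H}_0] = \Lambda_n \, E\!\left[\frac{f_1(Y_{n+1})}{f_0(Y_{n+1})} \,\Big|\, \mathcal{H}_0\right] = \Lambda_n \int f_1(y)\, dy = \Lambda_n. \tag{11}$$

Let us define the random time (with $0 < \gamma_0 < 1 < \gamma_1$, i.e., $\log\gamma_0 < 0 < \log\gamma_1$)

$$N = \min\{n \ge 1 : \Lambda_n \ge \gamma_1 \ \text{or}\ \Lambda_n \le \gamma_0\}, \tag{12}$$

and assume that $\Pr\{N < \infty\} = 1$, i.e., that $N$ is actually a stopping time for the process $\{Y_n\}$.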
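The associated stopped process can be defined as in (4), and we can invoke (under suitable regularity conditions, see above) the martingale optional stopping theorem, yielding

$$E[\Lambda_N \mid \mathcal{H}_0] = E[\Lambda_1 \mid \mathcal{H}_0] = 1, \ \ \text{i.e.,}\ \ P_f \, E[\Lambda_N \mid \Lambda_N \ge \gamma_1, \mathcal{H}_0] + (1 - P_f)\, E[\Lambda_N \mid \Lambda_N \le \gamma_0, \mathcal{H}_0] = 1. \tag{13}$$

Neglecting the excess over the boundaries (this is also known as Wald's approximation) yields

$$E[\Lambda_N \mid \Lambda_N \ge \gamma_1, \mathcal{H}_0] \approx \gamma_1, \tag{14}$$
$$E[\Lambda_N \mid \Lambda_N \le \gamma_0, \mathcal{H}_0] \approx \gamma_0, \tag{15}$$

which underlie many of the approximate design formulas for sequential detectors. We therefore get

$$P_f\, \gamma_1 + (1 - P_f)\, \gamma_0 \approx 1. \tag{16}$$

Reasoning in the same way under hypothesis $\mathcal{H}_1$, but using as martingale the inverse of the likelihood ratio, $1/\Lambda_n$, we get

$$\frac{P_d}{\gamma_1} + \frac{1 - P_d}{\gamma_0} \approx 1. \tag{17}$$

Putting together eqs. (16) and (17) immediately yields¹

$$\log \gamma_1 \approx \log \frac{P_d}{P_f}, \tag{18}$$
$$\log \gamma_0 \approx \log \frac{1 - P_d}{1 - P_f}. \tag{19}$$

We reiterate that the approximation involved follows from having neglected the excess over the boundaries². Equations (18) and (19) relate the error probabilities of the SPRT to its thresholds, and are therefore used to set the thresholds of the test, given the prescribed performance levels. Intriguingly, note that we can set the thresholds without knowing the statistics of the observations; clearly, the likelihood ratio does depend on these statistics. Once the desired error probabilities have been fixed and the thresholds accordingly set, the main performance figure of Wald's test is the average sample number (ASN) $E[N]$. This is the expected number of samples needed to make a decision, and is also referred to as the (average) decision delay.

¹ Logarithms are to the natural base.
² The approximation involved in the above relationships is often accurate for practical purposes, but one can also resort to certain bounds. Specifically, let $P_{da}$ and $P_{fa}$ be the actual detection and false alarm probabilities obtained by setting the thresholds according to eqs. (18) and (19), in which the nominal values $P_d$ and $P_f$ of those probabilities are used. Then, see, e.g., [7]: $P_{fa} \le P_f/P_d$, $\ 1 - P_{da} \le (1 - P_d)/(1 - P_f)$, and $P_{fa} + (1 - P_{da}) \le P_f + (1 - P_d)$.

To characterize $E[N]$, let us start from what is known in the literature as Wald's identity. Let $W_i$, $i = 1, 2, \ldots$ be a sequence of iid random variables, and let $X_n = \sum_{i=1}^n W_i$ be their cumulative sum. Also, for two thresholds $b < 0 < a$, let

$$N = \min\{n \ge 1 : X_n \ge a \ \text{or}\ X_n \le b\}. \tag{20}$$

Introducing the semi-invariant moment generating function of the random variable $W_i$, that is, $\kappa(r) = \log E[e^{r W_i}]$, and assuming that $\kappa(r)$ is finite in an open interval $\Omega$ around the origin $r = 0$, we have that, for $r \in \Omega$,

$$E\left[e^{r X_N - N \kappa(r)}\right] = 1. \tag{21}$$

Differentiating both sides of Wald's identity (21) and evaluating the result at $r = 0$, the so-called Wald's equality is obtained:

$$E[X_N] = E[N]\, E[W]. \tag{22}$$

(Clearly, the above equality can also be proved more directly, without resorting to Wald's identity, see, e.g., [5].) On the other hand, the second derivative evaluated at $r = 0$ gives

$$E\left[(X_N - N E[W])^2\right] = E[N]\, \mathrm{Var}[W], \tag{23}$$

which is useful in sequential analysis whenever $E[W] = 0$ (in which case it reduces to $E[X_N^2] = E[N]\,E[W^2]$), since Wald's equality is then of limited use. Suppose now that the true hypothesis is $\mathcal{H}_0$, and let us replace the generic $X_n$ with $L_n$, the log-likelihood ratio of hypothesis test (6):

$$L_n = \log \Lambda_n = \sum_{i=1}^n \log \frac{f_1(Y_i)}{f_0(Y_i)}. \tag{24}$$

With this choice (and with $a = \log\gamma_1$, $b = \log\gamma_0$), it should be clear that the stopping time $N$ in eq. (20) is exactly that defined in (12), and we are in fact faced with the earlier discussed sequential test. Clearly, $L_n$ is the cumulative sum of the sequence of iid random variables $\log[f_1(Y_i)/f_0(Y_i)]$, whence application of Wald's equality yields

$$E[N \mid \mathcal{H}_0] = \frac{E[L_N \mid \mathcal{H}_0]}{E\left[\log \frac{f_1(Y)}{f_0(Y)} \,\Big|\, \mathcal{H}_0\right]}. \tag{25}$$

The numerator can be expanded by conditioning, and then approximated by Wald's approximations:

$$E[L_N \mid \mathcal{H}_0] \approx P_f \log\gamma_1 + (1 - P_f)\log\gamma_0 \approx P_f \log\frac{P_d}{P_f} + (1 - P_f)\log\frac{1 - P_d}{1 - P_f},$$

where the last approximation follows from eqs. (18) and (19). Defining the binary divergence (measured in nats) $D_b(\alpha \| \beta)$ between two probability mass functions (pmfs) $[\alpha, 1-\alpha]$ and $[\beta, 1-\beta]$ as [8]

$$D_b(\alpha \| \beta) = \alpha \log\frac{\alpha}{\beta} + (1 - \alpha)\log\frac{1 - \alpha}{1 - \beta},$$

the numerator of (25) can be expressed (within the stated approximation) as $E[L_N \mid \mathcal{H}_0] \approx -D_b(P_f \| P_d)$. The divergence between two arbitrary probability density functions (pdfs) $f_0(y)$ and $f_1(y)$ is defined as [8]

$$D(f_0 \| f_1) = \int f_0(y) \log\frac{f_0(y)}{f_1(y)}\, dy,$$

implying that the denominator of (25) is $-D(f_0 \| f_1)$. We finally get

$$E[N \mid \mathcal{H}_0] \approx \frac{D_b(P_f \| P_d)}{D(f_0 \| f_1)}. \tag{27}$$

Elaborating exactly in the same way, assuming $\mathcal{H}_1$ true, we can compute $E[N \mid \mathcal{H}_1]$.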
Therefore, the final result is

$$E[N \mid \mathcal{H}_1] \approx \frac{D_b(P_d \| P_f)}{D(f_1 \| f_0)}. \tag{28}$$

Roughly speaking, the above approximations become tighter and tighter for small error probabilities $P_f, 1 - P_d \ll 1$, namely, when the no-decision region between the thresholds is large: $\gamma_0 \ll 1$ and $\gamma_1 \gg 1$. Otherwise stated, the approximations are fair when the average number of samples collected is large enough. It is worth noting that Wald popularized these formulas in 1947 in a form that did not involve divergences; in fact, the divergence was defined by Kullback [9] in the context of information theory, a theory born just one year later, in 1948, with the work by Shannon.
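A minimal Monte Carlo sketch of the SPRT may help fix ideas. It assumes, purely for illustration, unit-variance Gaussian densities $f_0 = \mathcal{N}(0,1)$ and $f_1 = \mathcal{N}(\theta,1)$, for which $\log[f_1(y)/f_0(y)] = \theta y - \theta^2/2$ and $D(f_0\|f_1) = D(f_1\|f_0) = \theta^2/2$. The thresholds are set through (18) and (19), and the simulated error rates and ASN are compared with the nominal values and with the ASN approximations (27) and (28). The function names and parameter values are hypothetical.

```python
import math
import random

def sprt_gaussian(theta, p_f, p_d, true_h, rng, max_n=100_000):
    """One SPRT run for H0: Y ~ N(0,1) vs H1: Y ~ N(theta,1) (illustrative model).
    Returns (decision, number of samples used)."""
    log_g1 = math.log(p_d / p_f)                  # upper threshold, eq. (18)
    log_g0 = math.log((1.0 - p_d) / (1.0 - p_f))  # lower threshold, eq. (19)
    mean = theta if true_h == 1 else 0.0
    llr = 0.0
    for n in range(1, max_n + 1):
        y = rng.gauss(mean, 1.0)
        llr += theta * y - 0.5 * theta ** 2       # log f1(y)/f0(y) for this Gaussian pair
        if llr >= log_g1:
            return 1, n
        if llr <= log_g0:
            return 0, n
    raise RuntimeError("no decision within max_n samples")

def binary_divergence(a, b):
    """D_b(a||b) in nats."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

theta, p_f, p_d, trials = 0.5, 0.01, 0.99, 4000
kl = 0.5 * theta ** 2                 # D(f0||f1) = D(f1||f0) for unit-variance Gaussians
rng = random.Random(2)
results = {}
for h in (0, 1):
    runs = [sprt_gaussian(theta, p_f, p_d, h, rng) for _ in range(trials)]
    rate_h1 = sum(d for d, _ in runs) / trials   # fraction of runs deciding H1
    asn = sum(n for _, n in runs) / trials
    results[h] = (rate_h1, asn)

asn_approx = {0: binary_divergence(p_f, p_d) / kl,   # eq. (27)
              1: binary_divergence(p_d, p_f) / kl}   # eq. (28)
print(f"P_f  sim={results[0][0]:.4f}  nominal={p_f}")
print(f"1-Pd sim={1 - results[1][0]:.4f}  nominal={1 - p_d:.2f}")
for h in (0, 1):
    print(f"E[N|H{h}] sim={results[h][1]:.1f}  approx={asn_approx[h]:.1f}")
```

Since (27) and (28) neglect the overshoot over the boundaries, the simulated ASN is expected to be somewhat larger than the approximation, and the actual error rates somewhat smaller than the nominal ones.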

2.3 Sequential detection with general test statistics
The fundamental results presented in the previous section trace the route for implementing the SPRT, as well as for computing simple approximations for performance evaluation. Of course, since the pioneering work by Wald, these results have been extended in many different directions, including, among many others, the case of dependent and non-identically distributed observations [10], sequential tests with arbitrary detection statistics [6], asymptotic results for vanishing signal-to-noise ratio [11], refined approximations for the excess over the boundaries [12], and so on. An exhaustive review of these topics is clearly beyond the scope of the present work. In this section, we limit ourselves to a setting that slightly extends the classical SPRT framework. The mathematical results, which are perhaps less intuitive than the classical Wald's formulas, turn out to be useful from an engineering perspective, as we show next in the sections devoted to sensor network applications.
Consider the statistic $T_n = \sum_{i=1}^n t(Y_i)$, with $Y_i$ iid random variables, and with $t(\cdot)$ being a certain transformation, in general different from the log-likelihood ratio. We consider the case that a sequential test is implemented, based upon the above $T_n$. More specifically, let $N$ be the smallest $n$ for which either $T_n \ge \gamma_1$ or $T_n \le \gamma_0$. For concreteness, we assume that $t(Y_i)$ has positive expectation under $\mathcal{H}_1$ and negative expectation under $\mathcal{H}_0$, and that the two thresholds accordingly obey $\gamma_0 < 0 < \gamma_1$. This problem can be cast in the more general framework of random walks with two thresholds [6,13,14].
In order to assess the test performance, we need to set the thresholds so as to meet the prescribed false alarm and detection probabilities. To fix ideas, let us consider the case that $E[t(Y_i)] < 0$, that is, for our purposes, we are under $\mathcal{H}_0$. Assuming that the semi-invariant moment generating function $\kappa(r) = \log E[e^{r\, t(Y_i)}]$ of the random variable $t(Y_i)$ is finite in an open interval $\Omega$ around the origin, and that it has a nonzero root $r^* > 0$, i.e., $\kappa(r^*) = 0$, we can use Wald's identity with $r = r^*$, that is, $E[e^{r^* T_N}] = 1$, which is equivalent to

$$1 = \Pr\{T_N \ge \gamma_1\}\, E\!\left[e^{r^* T_N} \mid T_N \ge \gamma_1\right] + \Pr\{T_N \le \gamma_0\}\, E\!\left[e^{r^* T_N} \mid T_N \le \gamma_0\right]. \tag{29}$$

Note that, if $T_n$ is the log-likelihood ratio, $r^* = 1$, and eq. (29) essentially translates into eq. (13), which, incidentally, was obtained by resorting to the theory of martingales. Now, different bounds and approximations can be derived from (29). One option is to neglect the overshoots, as previously done for the log-likelihood ratio, obtaining

$$\Pr\{T_N \ge \gamma_1\} \approx \frac{1 - e^{r^*\gamma_0}}{e^{r^*\gamma_1} - e^{r^*\gamma_0}}, \tag{30}$$

which is the counterpart of eq. (16). The above technique also gives a direct way to derive upper bounds on the threshold crossing probabilities. Indeed, since $e^{r^* T_N} \ge e^{r^*\gamma_1}$ conditional on $T_N \ge \gamma_1$, and since all terms on the right-hand side of eq. (29) are non-negative, we easily have

$$\Pr\{T_N \ge \gamma_1\} \le e^{-r^*\gamma_1}. \tag{31}$$

A positive threshold crossing can be regarded, under the assumption that $E[t(Y_i)] < 0$, as an event that becomes rare as $\gamma_1$ grows, and this is consistent with the obtained exponential bound. The same reasoning can be applied to the random walk with $E[t(Y_i)] > 0$ (that is, under $\mathcal{H}_1$), obtaining the two formulas

$$\Pr\{T_N \le \gamma_0\} \approx \frac{e^{r^{**}\gamma_1} - 1}{e^{r^{**}\gamma_1} - e^{r^{**}\gamma_0}}, \qquad \Pr\{T_N \le \gamma_0\} \le e^{-r^{**}\gamma_0}, \tag{32}$$

where the nonzero root of $\kappa(r)$ is now negative, and is denoted by $r^{**}$. We note explicitly that these exponential bounds do not work for the case that the random walk is zero-mean. Before concluding this section, we would like to report another useful tool, which extends the previous results to the characterization of the joint distribution of the stopping time $N$ and the barrier crossed.
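Assuming again that $E[t(Y_i)] < 0$ and $\kappa(r^*) = 0$ for some $r^* > 0$, with $\kappa(r)$ the semi-invariant moment generating function of $t(Y_i)$, the following two bounds can be derived [6] from Wald's identity (21):

$$\Pr\{T_N \ge \gamma_1,\ N \le n\} \le e^{-r\gamma_1 + n\kappa(r)} \quad \text{for any } r \ge r^* \text{ in } \Omega,$$
$$\Pr\{T_N \ge \gamma_1,\ N \ge n\} \le e^{-r\gamma_1 + n\kappa(r)} \quad \text{for any } 0 < r \le r^*.$$

Optimizing over $r$, the tightest exponent is obtained where $\kappa'(r) = \gamma_1/n$; for $n = n^* = \gamma_1/\kappa'(r^*)$ this gives back $r = r^*$ and the exponent of (31), while for $n$ far from $n^*$ the bounds decay faster than $e^{-r^*\gamma_1}$. In this sense, the above bounds furnish an interpretation of the value $n^* = \gamma_1/\kappa'(r^*)$ as the typical value of the stopping time $N$ conditional on the upper threshold crossing. This concludes our survey of the basic tools and results of sequential detection. In the following, we apply and extend these results by addressing several case studies.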
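The quantities of this section are easy to evaluate numerically when $t(Y_i)$ takes finitely many values. The Python sketch below assumes a hypothetical step distribution ($t \in \{2, 1, -1\}$ with probabilities $0.15, 0.20, 0.65$, hence a negative mean, as under $\mathcal{H}_0$) and hypothetical thresholds $\gamma_0 = -20$, $\gamma_1 = 8$. It finds $r^*$ by bisection, evaluates the no-overshoot approximation $\Pr\{T_N \ge \gamma_1\} \approx (1 - e^{r^*\gamma_0})/(e^{r^*\gamma_1} - e^{r^*\gamma_0})$, the bound $e^{-r^*\gamma_1}$, and $n^* = \gamma_1/\kappa'(r^*)$, and compares them with a Monte Carlo estimate.

```python
import math
import random

# Hypothetical discrete distribution of t(Y_i) under H0 (mean -0.15 < 0).
STEPS = (2, 1, -1)
PROBS = (0.15, 0.20, 0.65)
CUM = (0.15, 0.35, 1.0)

def mgf(r):
    """E[exp(r t(Y))]."""
    return sum(p * math.exp(r * s) for s, p in zip(STEPS, PROBS))

def kappa(r):
    """Semi-invariant moment generating function log E[exp(r t(Y))]."""
    return math.log(mgf(r))

def kappa_prime(r):
    """Derivative of kappa: E[t exp(r t)] / E[exp(r t)]."""
    return sum(p * s * math.exp(r * s) for s, p in zip(STEPS, PROBS)) / mgf(r)

def positive_root(lo=1e-6, hi=10.0, iters=200):
    """Bisection for r* > 0 with kappa(r*) = 0 (kappa < 0 at lo, kappa > 0 at hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kappa(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def draw_step(rng):
    """Sample one value of t(Y) from the discrete distribution above."""
    u = rng.random()
    for s, c in zip(STEPS, CUM):
        if u < c:
            return s
    return STEPS[-1]

def simulate(g0, g1, runs, seed=3):
    """Returns (fraction of upper crossings, mean N given an upper crossing)."""
    rng = random.Random(seed)
    ups, n_up_total = 0, 0
    for _ in range(runs):
        t, n = 0, 0
        while g0 < t < g1:
            t += draw_step(rng)
            n += 1
        if t >= g1:
            ups += 1
            n_up_total += n
    return ups / runs, n_up_total / max(ups, 1)

g0, g1 = -20, 8
r_star = positive_root()
approx = (1 - math.exp(r_star * g0)) / (math.exp(r_star * g1) - math.exp(r_star * g0))
bound = math.exp(-r_star * g1)
n_star = g1 / kappa_prime(r_star)
p_up, mean_n_up = simulate(g0, g1, runs=10000)
print(f"r*={r_star:.4f}  P_up sim={p_up:.4f}  approx={approx:.4f}  bound={bound:.4f}")
print(f"E[N | upper crossing] sim={mean_n_up:.1f}  n*={n_star:.1f}")
```

The overshoot at the upper barrier (steps of size 2) makes the simulated crossing probability slightly smaller than the no-overshoot approximation, while the exponential bound holds as stated.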

3. Selected applications
We now present some applications of the sequential sampling theory whose basic elements have been summarized in the previous sections; the applications are selected from the authors' recent works on the subject, which are related to sensor networks. In the first example (Sect. 3.1) the decentralized architecture of the sensor network is key, and we take a genuine cross-layer perspective of the whole system that merges the detection layer with the (many-to-one) communication layer. The last two examples (Sects. 3.2 and 3.3) focus on signal processing at the sensor level designed to improve the detection performance of the fusion center, and are therefore exploitable even in certain non-decentralized systems. Whenever appropriate, for easy reference, we try to keep the notation as close as possible to that of the original works, to which we refer for more general discussion, in-depth description, and the many technicalities that are deliberately neglected in this presentation.

3.1 SENMA detection with censoring nodes [15]
Suppose that a WSN designed for solving a binary hypothesis test is made of many tiny remote units uniformly deployed over the surveyed area, and of a Mobile Agent (MA) acting as the fusion center. The remote units sense the environment and collect data relevant to the detection task, while the MA travels across the network domain and sequentially polls the sensors. Indeed, in the SENMA (SEnsor Network with Mobile Agents) architecture proposed in [16], see also [17][18][19], at each successive MA snapshot the nodes falling within its field of view are queried to deliver their data. Unlike the remote units, the MA can be a very reliable device with large power capabilities and adequate communication and computational resources. In addition, its mobility greatly simplifies the sensor-to-MA communication tasks, making the SENMA architecture particularly suited for many practical applications, being scalable, robust, and simple to implement. Moreover, as one might expect, the most important advantage of SENMA over alternative network structures (e.g., ad-hoc systems) is in terms of energy saving for communications, a key parameter for sensor networks. The MA collects the data delivered by the sensors and, as soon as a new observation is made available to it, includes it in the detection statistic. We assume that such a statistic is the cumulative sum of the log-likelihood ratios, resulting in Wald's SPRT [1], discussed previously in this work. The specific viewpoint taken in [15] is that the remote units do not necessarily deliver their data to the MA when they fall within its field of view. In order to further reduce the energy burden, a censoring protocol is implemented [20][21][22][23].
Data are delivered only if they are sufficiently informative: the sensor transmission is inhibited if the locally computed log-likelihood of the measured data does not exceed (in modulus) a certain threshold level, the censoring level. In this way a communication session is activated, and the corresponding energy is spent by the sensor, only if the local observation is expected to contribute in a meaningful way to the final decision. Otherwise, such data are never received by the MA and play no role in building the final statistic. A trade-off clearly emerges between detection performance and energy consumption.

Network performances
Let us suppose that the data collected by each remote unit of the network form an $M$-vector of iid (independent and identically distributed) observations, and that different nodes observe iid data as well. If we label with an index $n = 1, 2, \ldots$ the (virtually infinitely many) remote units, the basic hypothesis test under study is the shift-in-mean test (33), where $\mathbf{1}$ represents a vector of all 1s. The vectors $\mathbf{w}_n = [w_{n1}, w_{n2}, \ldots, w_{nM}]$ have iid components drawn from a continuous random variable whose probability density function (pdf) $\varphi(w)$ is here assumed to be an even function supported on the whole real axis. The known parameter $\mu$ rules the amount of shift in mean that distinguishes the two alternative hypotheses. Denoting by $x_{nm}$ the $m$-th observation taken at the $n$-th sensor, let $L(\mathbf{x}_n)$ be the local log-likelihood ratio computed by sensor $n$ from $x_{n1}, \ldots, x_{nM}$; in the absence of censoring, the SPRT would be (see eq. (7)) the sequential comparison of $\sum_n L(\mathbf{x}_n)$ with the thresholds $\log\gamma_0$ and $\log\gamma_1$. With censoring, only the sensors with $|L(\mathbf{x}_n)| > \lambda$, where $\lambda \ge 0$ denotes the censoring level, deliver their data, so that, under hypothesis $\mathcal{H}_i$, a sensor transmits with probability

$$p_t(\lambda) = \Pr\{|L(\mathbf{x}_n)| > \lambda \mid \mathcal{H}_i\} = 1 - F_i(\lambda) + F_i(-\lambda),$$

where $F_i(y)$ is the CDF of the log-likelihood $L(\mathbf{x}_n)$ under hypothesis $\mathcal{H}_i$, $i = 0, 1$. Let $N_t$ be the random number of sensors that actually deliver data to the MA, as opposed to $N_v$, the number of nodes encountered by the MA in its travel across the surveyed area. Denoting by $I(\cdot)$ the indicator function, we have

$$N_t = \sum_{n=1}^{N_v} I\big(|L(\mathbf{x}_n)| > \lambda\big),$$

where $N_v$ is a valid stopping time [6], so that Wald's equality (22) yields $E[N_t] = p_t(\lambda)\, E[N_v]$. It makes sense to adopt $N_t$ as a proxy of the energy consumption, and $N_v$ as a proxy of detection performance in terms of the detection delay needed to achieve the desired error probabilities. Therefore, the above relationship emphasizes the trade-off between detection performance (more data yield better performance) and energy saving, tuned by the censoring level $\lambda$.
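The relationship $E[N_t] = p_t(\lambda)E[N_v]$, with $\lambda$ the censoring level and $p_t(\lambda)$ the probability that a sensor transmits, can be checked with a short simulation of the MA tour. The Python sketch below is not the model of [15]: it assumes, for illustration only, Gaussian $\varphi$ with unit variance and symmetric hypotheses with means $\mp\mu$, so that $L(\mathbf{x}_n) = 2\mu\sum_m x_{nm}$ and $p_t(\lambda)$ has a closed form; the function names and all numerical values are hypothetical.

```python
import math
import random

def q_func(x):
    """Gaussian tail probability Q(x) = Pr{Z > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def censored_sprt(mu, m_samples, lam, log_g0, log_g1, true_h, rng):
    """One MA tour. Hypothetical model: x_nm = +/-mu + w_nm, w_nm ~ N(0,1).
    Returns (N_v visited nodes, N_t transmitting nodes, decision)."""
    sign = 1.0 if true_h == 1 else -1.0
    stat, n_v, n_t = 0.0, 0, 0
    while log_g0 < stat < log_g1:
        n_v += 1
        x = [sign * mu + rng.gauss(0.0, 1.0) for _ in range(m_samples)]
        llr = 2.0 * mu * sum(x)          # local log-likelihood ratio for this Gaussian model
        if abs(llr) > lam:               # censoring: transmit only informative data
            n_t += 1
            stat += llr                  # the MA updates the SPRT statistic
    return n_v, n_t, int(stat >= log_g1)

mu, m_samples, lam = 0.25, 4, 1.0
log_g1 = math.log(0.99 / 0.01)           # P_d = 0.99, P_f = 0.01
log_g0 = -log_g1
rng = random.Random(4)
tours = [censored_sprt(mu, m_samples, lam, log_g0, log_g1, 1, rng) for _ in range(5000)]
mean_nv = sum(t[0] for t in tours) / len(tours)
mean_nt = sum(t[1] for t in tours) / len(tours)

# Under H1, L ~ N(2*M*mu^2, 4*M*mu^2), so p_t = Pr{|L| > lam} in closed form.
m_l, s_l = 2 * m_samples * mu ** 2, 2 * mu * math.sqrt(m_samples)
p_t = q_func((lam - m_l) / s_l) + q_func((lam + m_l) / s_l)
print(f"E[N_v]={mean_nv:.2f}  E[N_t]={mean_nt:.2f}  ratio={mean_nt / mean_nv:.3f}  p_t={p_t:.3f}")
```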
We now need to introduce an auxiliary quantity $s(\lambda)$, where $\lambda$ denotes the censoring level. Denoting by $P_d$ and $P_f$ the desired detection and false alarm probabilities set at the design stage, see eqs. (8) and (9), in [15] the framework of sequential analysis discussed earlier is exploited (via proper modification of the techniques leading to eqs. (27) and (28)) to obtain approximate expressions for $E[N_v]$ and $E[N_t]$. It can also be shown that $s(\lambda)$ is monotonically increasing in $\lambda$, while the product $p_t(\lambda)s(\lambda)$ monotonically decreases with $\lambda$. This implies that the larger $\lambda$ is, the more energy the network saves, but the larger the detection delay becomes.
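The monotonic behavior stated above can be visualized by sweeping the censoring level $\lambda$ in a simulation of the MA tour. Again, the Gaussian model with means $\mp\mu$ and unit variance, as well as the parameter values, are hypothetical choices made only for illustration; the two printed ratios (normalized to $\lambda = 0$, i.e., no censoring) play the role of delay and energy figures.

```python
import math
import random

def tour(mu, m_samples, lam, log_g, rng):
    """One MA tour under H1 of a hypothetical Gaussian model (means +/-mu, unit variance),
    symmetric thresholds +/-log_g. Returns (N_v, N_t)."""
    stat, n_v, n_t = 0.0, 0, 0
    while -log_g < stat < log_g:
        n_v += 1
        llr = 2.0 * mu * sum(mu + rng.gauss(0.0, 1.0) for _ in range(m_samples))
        if abs(llr) > lam:               # censored sensors stay silent
            n_t += 1
            stat += llr
    return n_v, n_t

def sweep(lams, mu=0.25, m_samples=4, p_e=0.01, tours=3000, seed=5):
    """Average N_v and N_t for each censoring level in lams."""
    rng = random.Random(seed)
    log_g = math.log((1 - p_e) / p_e)    # eqs. (18)-(19) with P_f = 1 - P_d = p_e
    rows = []
    for lam in lams:
        runs = [tour(mu, m_samples, lam, log_g, rng) for _ in range(tours)]
        e_nv = sum(r[0] for r in runs) / tours
        e_nt = sum(r[1] for r in runs) / tours
        rows.append((lam, e_nv, e_nt))
    return rows

rows = sweep([0.0, 1.0, 2.0])
base = rows[0][1]                        # lambda = 0: no censoring, so N_t = N_v
for lam, e_nv, e_nt in rows:
    print(f"lambda={lam:.1f}  E[N_v]/E[N]={e_nv / base:.2f}  E[N_t]/E[N]={e_nt / base:.2f}")
```

In this instance, raising $\lambda$ lengthens the tour (more visited nodes) while reducing the number of transmissions, which is the trade-off described in the text.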

Optimization
To compare our censored system with the uncensored one, let us define $E[N]$ as the average number of sensors resulting from setting the censoring level to zero (no censoring). Normalized quantities $\eta_t$ and $\eta_v$ can then be introduced, comparing the energy and delay figures $E[N_t]$ and $E[N_v]$ with this uncensored reference. An approximate analysis providing tractable formulas for the system performance can be made by using the Central Limit Theorem, see [15], yielding closed-form approximations in terms of $Q(\cdot)$, the standard Gaussian complementary CDF. By plotting $\eta_t$ and $\eta_v$ as functions of the censoring level for different values of $\rho$, we get the curves for system optimization depicted in Fig. 1, from which the desired operating point, in terms of detection delay and sensor energy consumption, can be chosen. Figure 2 provides the same information and insight as Fig. 1. An interesting behavior is also observed when the sensors, provided that their observations are informative enough, can only send to the MA the hard decisions (i.e., binary values) taken at a local level. In this case, unlike the previous one in which the censored log-likelihoods are transmitted, it is possible to prove that there exists an optimal censoring level minimizing the detection delay. Examples of applications, as well as detailed discussions of the above aspects, are addressed in [15], to which the reader is referred for details.

3.2 Pre-processing at sensor level for detection after transmission over noisy channels
Suppose that the sensors of a network are connected by dedicated channels (parallel architecture) to a fusion center, i.e., a unit devoted to the task of data fusion, and assume also that such channels are noisy. The issue is to understand whether some processing of the data measured at the sensor could increase the detection capability of the fusion center. Specifically, we are faced with a detection problem in which remotely observed data are delivered to a fusion center through a certain channel. The fusion center is designed to decide between two mutually exclusive statistical hypotheses, basing its decision upon the received data, whose statistical distribution is determined by the underlying hypothesis. Should the observations made at the remote sensor be somehow processed before delivering them over the channel? We assume that the fusion center implements a sequential test. Motivated by eqs. (27) and (28), it makes sense to choose as a measure of detection performance the divergence between the distributions under the two hypotheses, since this directly impacts the average sample number, i.e., the detection delay. Therefore, the above question can be rephrased in terms of divergence: can we increase the divergence at the output of a noisy channel by elaborating on its input? If the channel were ideal (noiseless), the answer would certainly be negative, in view of the data processing inequality. For noisy channels, on the other hand, the answer is in some cases affirmative. Let us limit the following discussion to the case where the noisy channel has binary input and output alphabets, and let us model the sought sensor processing as a further channel, whose input is the original measured data and whose output is the transformed data, to be sent over the physical noisy channel.
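Formally, we consider the statistical test

$$\mathcal{H}_0: I \sim \mathbf{p}, \qquad \mathcal{H}_1: I \sim \mathbf{q},$$

where $\mathbf{p}$ and $\mathbf{q}$ are two arbitrary pmfs (column vectors) with alphabet $\{1, 2\}$ that rule the random variable $I$ modeling the (iid) sensor observations. The sensor delivers data to the fusion center through a discrete memoryless channel $J \to K$, whose input may be different from $I$. Indeed, a sensor processing may take place, which is also modeled as a channel $I \to J$. The physical channel is

$$\mathbf{C} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}, \tag{46}$$

where $C_{kj} = \Pr\{K = k \mid J = j\}$, $k, j = 1, 2$, while the processing channel is

$$\mathbf{H} = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}, \tag{47}$$

where $H_{ji} = \Pr\{J = j \mid I = i\}$, $j, i = 1, 2$. A convenient, self-explanatory notation is as follows:

$$I \xrightarrow{\ \mathbf{H}\ } J \xrightarrow{\ \mathbf{C}\ } K, \tag{48}$$

where $\mathbf{C}$ is given, while the task is to find a matrix $\mathbf{H}$ that maximizes the detection performance at the remote site that observes $K$. Let $\mathbf{x}$ and $\mathbf{y}$ be the pmfs of $J$ under the two hypotheses, and, similarly, let $\mathbf{w}$ and $\mathbf{z}$ be the pmfs of $K$. We have:

$$\mathbf{x} = \mathbf{H}\mathbf{p}, \quad \mathbf{y} = \mathbf{H}\mathbf{q}, \quad \mathbf{w} = \mathbf{C}\mathbf{H}\mathbf{p}, \quad \mathbf{z} = \mathbf{C}\mathbf{H}\mathbf{q}. \tag{49}$$

The described problem can be cast in the form of an optimization

$$\max_{\mathbf{H}} \, D(\mathbf{w} \| \mathbf{z}) = \max_{\mathbf{H}} \, D(\mathbf{C}\mathbf{H}\mathbf{p} \,\|\, \mathbf{C}\mathbf{H}\mathbf{q}), \tag{50}$$

where the maximization is over the column-stochastic matrices $\mathbf{H}$. The following claim can be proved by elementary convex analysis tools (the divergence is convex in $\mathbf{H}$, so its maximum over this polytope is attained at a vertex): only the following four matrices are candidates for solving the posed optimization problem

$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}.$$

Clearly, the last two matrices should be ignored, since they both lead to zero divergence in terms of the variable $K$ (their output $J$ does not depend on $I$). Given that the first matrix is the identity, the only possibility for improving the detection performance based on $K$ is to try the second matrix, that is to say, to try symbol flipping: if $I = 1$ is observed, then $J = 2$ is presented at the input of the physical channel $\mathbf{C}$, and vice versa.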
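Since only four candidate matrices exist, the optimization (50) can be solved by plain enumeration. The Python sketch below does so for a hypothetical asymmetric (Z-type) channel, in which $J = 1$ always yields $K = 1$ while $J = 2$ yields either output with probability $1/2$, and for a hypothetical pair of close pmfs $\mathbf{p}$, $\mathbf{q}$. In this instance, symbol flipping yields a larger output divergence than no processing.

```python
import math
from itertools import product

def kl(a, b):
    """Divergence D(a||b) in nats between two pmfs (lists)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def mat_vec(mat, vec):
    """2x2 matrix (list of rows) times a pmf column vector."""
    return [mat[r][0] * vec[0] + mat[r][1] * vec[1] for r in range(2)]

def deterministic_matrices():
    """The four vertices of the set of 2x2 column-stochastic matrices:
    column i is the indicator of the symbol J assigned to I = i + 1."""
    for j_for_1, j_for_2 in product(range(2), repeat=2):
        mat = [[0.0, 0.0], [0.0, 0.0]]
        mat[j_for_1][0] = 1.0
        mat[j_for_2][1] = 1.0
        yield mat

def best_preprocessing(p, q, c):
    """Maximize D(CHp || CHq) over the four candidate matrices H."""
    scored = [(kl(mat_vec(c, mat_vec(h, p)), mat_vec(c, mat_vec(h, q))), h)
              for h in deterministic_matrices()]
    return max(scored, key=lambda item: item[0])

p, q = [0.10, 0.90], [0.12, 0.88]     # hypothetical, close hypotheses
c = [[1.0, 0.5],                      # hypothetical Z-type channel: C[k][j] = Pr{K=k+1 | J=j+1}
     [0.0, 0.5]]
d_best, h_best = best_preprocessing(p, q, c)
d_none = kl(mat_vec(c, p), mat_vec(c, q))
print("best H:", h_best, " divergence:", d_best, " without processing:", d_none)
```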
To elaborate, let us assume that the original detection problem is "difficult", in the sense that the hypotheses $\mathbf{p}$ and $\mathbf{q}$ are very close to each other, for instance differing by a perturbation proportional to a parameter $\varepsilon$, where $|\varepsilon|$ is small enough. Expanding the divergence in series around $\varepsilon = 0$, one can compare the divergence obtained with symbol flipping to that obtained without pre-processing; requiring the former to exceed the latter yields a condition on $\mathbf{p}$ and $\mathbf{C}$, and the conclusion is that the sensor pre-processing improves the final detection performance if and only if this condition is satisfied. We are currently working on extensions of the topic briefly described here, with a more general formulation including non-binary observations and channels. For the time being, it is important to emphasize that for binary symmetric physical channels (i.e., $C_{11} = C_{22}$) there is no way of improving the performance. We have evidence, however, that in non-binary cases the problem exhibits much more structure and provides more useful insights from a practical perspective. What remains true in more general settings, however, is that the optimal pre-processing is deterministic, in the sense that given the input $I$ the output $J$ is determined with probability one, a circumstance with a precise physical meaning.
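The last statements can be checked numerically. The Python sketch below assumes, as an illustrative parameterization (not necessarily the one used in the original analysis), $\mathbf{q} = \mathbf{p} + \varepsilon[1, -1]^T$. For such a perturbation, a standard second-order expansion of the divergence gives $D(\mathbf{w}\|\mathbf{z}) \approx \varepsilon^2 (C_{11} - C_{12})^2 / (2 w_1 w_2)$, with the same numerator with or without flipping, so flipping helps exactly when it makes the output pmf $\mathbf{w}$ more skewed. For a binary symmetric channel the flipped and unflipped divergences coincide exactly, because flipping commutes with the channel.

```python
import math

def kl(a, b):
    """Divergence D(a||b) in nats between two pmfs (lists)."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def channel_out(c, v, flip=False):
    """pmf of K: optionally flip the symbols of I (J = 3 - I), then apply C[k][j] = Pr{K=k+1 | J=j+1}."""
    x = [v[1], v[0]] if flip else list(v)
    return [c[k][0] * x[0] + c[k][1] * x[1] for k in range(2)]

def compare(p1, eps, c):
    """Exact divergence and its second-order approximation, without and with flipping."""
    p = [p1, 1.0 - p1]
    q = [p1 + eps, 1.0 - p1 - eps]   # hypothetical perturbation q = p + eps * [1, -1]
    out = {}
    for flip in (False, True):
        w, z = channel_out(c, p, flip), channel_out(c, q, flip)
        second_order = eps ** 2 * (c[0][0] - c[0][1]) ** 2 / (2.0 * w[0] * w[1])
        out[flip] = (kl(w, z), second_order)
    return out

bsc = [[0.8, 0.2], [0.2, 0.8]]       # binary symmetric channel: C11 = C22
z_ch = [[1.0, 0.5], [0.0, 0.5]]      # hypothetical asymmetric (Z-type) channel
res_bsc = compare(0.1, 0.01, bsc)
res_z = compare(0.1, 0.01, z_ch)
for name, res in (("BSC", res_bsc), ("Z", res_z)):
    print(name, "(exact, approx) no flip:", res[False], " flip:", res[True])
```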

3.3 Noise enhanced sequential detectors [24]
Let us consider a fully decentralized sensor network without a fusion center, designed for an inference task. In this typical architecture, each node senses the environment and collects data about a phenomenon to be monitored; think, for instance, of a binary detection problem where the challenge is to decide which of two possible statistical distributions actually rules the observations. The lack of a fusion center is remedied by suitable inter-node communication protocols that allow the nodes to exchange data until the final decision is made. Due to often unavoidable physical constraints, these data are here assumed to be some nonlinear transformation $t(\cdot)$ of the original observations. Specifically, the $i$-th node of the network computes $t(X_i)$, where $X_i$ is the sensed sample, and delivers this value to one of its neighbors, say node $j$. The latter computes $t(X_i) + t(X_j)$ and delivers it to node $k$, and so on. The decision process is sequential: as soon as the value computed at some node crosses one of the given thresholds, the decision is taken and the task is terminated.
With this model in mind, and motivated by recent advances in noise-enhanced and stochastic resonance detection, in [24] the question is posed whether adding a "noise" sample, say $W_i$, to the measurement made at node $i$ before computing the nonlinearity $t(\cdot)$ could provide any benefit in terms of final detection performance. This question, counterintuitive at first glance, may (surprisingly?) have a positive answer: there exist cases in which adding noise is beneficial!

Problem formalization
Let us consider first the original shift-in-mean binary hypothesis test (52), where $i = 1, 2, \ldots, \infty$ is the sensor index, $X_i$ is the observation made at node $i$ (observations are iid), and the pdf $f_X(x)$ is an even function. According to formulation (7), the SPRT for this problem would compare the cumulative log-likelihood ratio of $X_1, \ldots, X_n$ with two thresholds (53), but in many cases of interest implementing such an optimal SPRT is infeasible [25]. Therefore, we consider sub-optimal sequential detectors and, as explained before, we also contaminate the original observations with iid noise: this amounts to considering the noise-contaminated observables $Y_i = X_i + W_i$ in place of the original $X_i$. The noise density is also assumed to be even-symmetric, $f_W(w) = f_W(-w)$, and the density of the contaminated samples becomes the convolution of the two:

$$f_Y(y) = \int f_X(y - w)\, f_W(w)\, dw. \tag{54}$$

This sub-optimality of the detector amounts to working with a decision statistic of the form

$$T_n = \sum_{i=1}^n t(Y_i), \tag{55}$$

where $t(y)$ is a bounded, non-decreasing, odd function. To simplify the analysis, let the error probabilities $1 - P_d$ and $P_f$ of the sequential test, see eqs. (8) and (9), be equal, and denote this common value by $P_e$. We have

$$N = \min\{n \ge 1 : |T_n| \ge \gamma\}, \qquad \text{decide } \mathcal{H}_1 \text{ if } T_N \ge \gamma, \ \ \mathcal{H}_0 \text{ if } T_N \le -\gamma, \tag{56}$$

where the thresholds $\gamma$ and $-\gamma$ are symmetric as a consequence of the assumed problem symmetries. The above test is not a standard SPRT, since $T_n$ is not the log-likelihood. Therefore, the system performance can be obtained as discussed in Sect. 2.3. First, Wald's equality, in the form (25), under hypothesis $\mathcal{H}_1$ yields, in the regime of small $P_e$,

$$E[N \mid \mathcal{H}_1] = \frac{E[T_N \mid \mathcal{H}_1]}{E[t(Y) \mid \mathcal{H}_1]} \approx \frac{\gamma}{E[t(Y) \mid \mathcal{H}_1]}. \tag{57}$$

Furthermore, the exponential bound (31) is used:

$$P_e \le e^{-r^*\gamma}, \tag{58}$$

where $r^* > 0$ solves

$$E\left[e^{r^* t(Y)} \mid \mathcal{H}_0\right] = 1. \tag{59}$$

The two functions of the noise density appearing above, namely $E[t(Y) \mid \mathcal{H}_1]$ and $E[e^{r\,t(Y)} \mid \mathcal{H}_0]$, allow us to write compact formulas for the system analysis; the corresponding expressions, eq. (60), are derived in [24]. The optimal performance-enhancing noise density $f_W(w)$ must minimize the expected sample number without increasing the error probability. Using our approximations and bounds, this amounts to the optimization problem (61), where $r_0^*$ denotes the value of $r^*$ in the noise-free case.
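The small-$P_e$ approximation $E[N \mid \mathcal{H}_1] \approx \gamma / E[t(Y) \mid \mathcal{H}_1]$, obtained above from Wald's equality, can be checked by simulating the node-to-node relay of the partial sum. The Python sketch below assumes, for illustration only, $X_i = \theta + V_i$ under $\mathcal{H}_1$ with $V_i \sim \mathcal{N}(0,1)$, the clipper $t(y) = \max(-1, \min(1, y))$ (bounded, non-decreasing, odd), and coin-flipping noise $W_i = \pm w_0$. With this Gaussian choice the injected noise actually increases the ASN; a benefit requires suitable observation densities, such as the Gaussian mixture of the sign-detector example that follows.

```python
import random

def t_clip(y):
    """Bounded, non-decreasing, odd nonlinearity t(y) (hypothetical choice)."""
    return max(-1.0, min(1.0, y))

def sample_y(theta, w0, rng):
    """Y = X + W under H1, with X = theta + N(0,1) and coin-flipping noise W = +/-w0."""
    w = w0 if rng.random() < 0.5 else -w0
    return theta + rng.gauss(0.0, 1.0) + w

def relay_test(theta, w0, gamma, rng):
    """Each node adds t(Y_i) to the received partial sum; stop when |T_n| >= gamma.
    Returns (decision, number of nodes involved)."""
    total, n = 0.0, 0
    while abs(total) < gamma:
        n += 1
        total += t_clip(sample_y(theta, w0, rng))
    return int(total >= gamma), n

def check(theta, w0, gamma, runs=3000, mc=100_000, seed=6):
    """Simulated ASN, approximation gamma / E[t(Y)|H1], and simulated error rate."""
    rng = random.Random(seed)
    mean_t = sum(t_clip(sample_y(theta, w0, rng)) for _ in range(mc)) / mc
    sims = [relay_test(theta, w0, gamma, rng) for _ in range(runs)]
    asn_sim = sum(n for _, n in sims) / runs
    p_err = 1.0 - sum(d for d, _ in sims) / runs
    return asn_sim, gamma / mean_t, p_err

results = {w0: check(theta=0.3, w0=w0, gamma=8.0) for w0 in (0.0, 1.0)}
for w0, (asn_sim, asn_approx, p_err) in results.items():
    print(f"w0={w0}: ASN sim={asn_sim:.1f}  approx={asn_approx:.1f}  P_e~{p_err:.4f}")
```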

Example: sign detector
Assume now that the nonlinearity is the sign function, which amounts to one-bit quantized observations: $t(y) = \mathrm{sgn}(y)$. In this case the statistic $T_n$ is a $\pm 1$ random walk and, letting $p = \Pr\{Y_i > 0 \mid \mathcal{H}_1\}$ (which, by the assumed symmetries, also equals $\Pr\{Y_i < 0 \mid \mathcal{H}_0\}$), one has $E[t(Y) \mid \mathcal{H}_1] = 2p - 1$ and $E[e^{r\,t(Y)} \mid \mathcal{H}_0] = (1-p)e^{r} + p\,e^{-r}$, whose nonzero root is $r^* = \log[p/(1-p)]$. The performance functions $h_1(w)$ and $h_2(w, r)$ can be found accordingly, and the optimization becomes a constrained problem involving $r_0^*$, the value of $r^*$ corresponding to the absence of injected noise. However, this can be shown to be equivalent to its unconstrained counterpart, which amounts to simply finding the maximum achievable value of $p$. Remarkably, it is also possible to prove that the optimizing noise density can be chosen in the class of coin-flipping distributions:

$$f_W(w) = \tfrac{1}{2}\,\delta(w - w_0) + \tfrac{1}{2}\,\delta(w + w_0).$$

We now report the evidence from some numerical simulations aimed at checking the accuracy of the derived formulas, as well as the potential benefits of adding noise to the observations for detection purposes. We choose a mixture of Gaussians as the observation density. Moreover, we assume that the detection threshold is set so as to yield, in the absence of noise, an error probability $P_{e0} \approx 10^{-2}$. In Fig. 3, top plot, the ASN is displayed as a function of the injected noise depth $w_0$. The theoretical formulas reasonably match the simulation points, and an optimal value of $w_0$ minimizing the sample number is clearly present. To get the complete perspective, the actual error probability $P_e$ is also displayed in the bottom plot of the same figure. It can be seen that, for any $w_0$, while the ASN decreases, the error probability is kept below the value $P_{e0}$. Remarkably, at the optimal $w_0$, the actual $P_e$ is in effect orders of magnitude smaller than the design value $10^{-2}$.
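For the sign detector the whole analysis reduces to a $\pm 1$ random walk, so the effect of coin-flipping noise can be computed in closed form. The Python sketch below uses a hypothetical bimodal mixture, $V \sim \frac{1}{2}\mathcal{N}(-3, 1) + \frac{1}{2}\mathcal{N}(3, 1)$ with $X = \theta + V$ under $\mathcal{H}_1$ and $\theta = 1$ (by symmetry, $X = -\theta + V$ under $\mathcal{H}_0$), and noise $W = \pm w_0$. It evaluates $p(w_0) = \Pr\{Y > 0 \mid \mathcal{H}_1\}$ on a grid; then, for a small integer threshold $\gamma$, it compares the exact gambler's-ruin error probability and ASN with the approximation $\gamma/(2p-1)$, the exponential bound $e^{-r^*\gamma}$ with $r^* = \log[p/(1-p)]$, and a Monte Carlo run. These numbers are illustrative only and are not the results reported in [24].

```python
import math
import random

def q_func(x):
    """Gaussian tail probability Q(x) = Pr{Z > x}."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def surv_v(v, m=3.0, s=1.0):
    """Pr{V > v} for the hypothetical mixture V ~ 0.5 N(-m, s^2) + 0.5 N(m, s^2)."""
    return 0.5 * q_func((v + m) / s) + 0.5 * q_func((v - m) / s)

def p_correct(w0, theta=1.0):
    """p = Pr{sgn(Y) = +1 | H1} with Y = theta + V + W, W = +w0 or -w0 (coin flip)."""
    return 0.5 * (surv_v(-theta - w0) + surv_v(-theta + w0))

def sign_test_exact(p, gamma):
    """+/-1 random walk with integer thresholds +/-gamma (no overshoot):
    gambler's-ruin error probability and ASN from Wald's equality."""
    rho = (1.0 - p) / p
    p_e = rho ** gamma / (1.0 + rho ** gamma)
    asn = gamma * (1.0 - 2.0 * p_e) / (2.0 * p - 1.0)
    return p_e, asn

def sign_test_sim(p, gamma, runs, seed=7):
    """Monte Carlo error rate and ASN of the sign test under H1."""
    rng = random.Random(seed)
    errors, steps = 0, 0
    for _ in range(runs):
        t = 0
        while abs(t) < gamma:
            t += 1 if rng.random() < p else -1
            steps += 1
        errors += t <= -gamma
    return errors / runs, steps / runs

grid = [0.5 * k for k in range(11)]          # noise depths w0 = 0, 0.5, ..., 5
ps = [p_correct(w0) for w0 in grid]
k_best = max(range(len(grid)), key=lambda k: ps[k])
w0_best, p_best = grid[k_best], ps[k_best]
print(f"p(w0=0)={ps[0]:.4f}  best w0 on grid={w0_best}  p={p_best:.4f}")

gamma = 5                                     # small integer threshold, for a quick check
p_e, asn = sign_test_exact(p_best, gamma)
r_star = math.log(p_best / (1.0 - p_best))
sim = sign_test_sim(p_best, gamma, runs=20000)
print(f"exact P_e={p_e:.4f} ASN={asn:.2f}  bound={math.exp(-r_star * gamma):.4f}  "
      f"approx ASN={gamma / (2 * p_best - 1):.2f}  simulated={sim}")
```

In this hypothetical setting, noise with $w_0 = 3$ moves the mixture modes onto the decision boundary's favorable side often enough to raise $p$ from about 0.51 to about 0.67, which both shortens the test and sharply lowers the error probability for a given threshold.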

4. Conclusions
In many instances of Wireless Sensor Networks (WSNs) designed for detection purposes, the fusion center is a mobile device that sequentially queries the nodes of the network. Such a sequential architecture, known as the SENMA structure, fits the sequential detection paradigm, i.e., the SPRT and its variants, very well. Aside from SENMA scenarios, the typical tools of sequential detection are exploited much more generally, in various guises, in a variety of WSN applications. This paper provides a succinct introduction to sequential analysis and presents several examples of applications to detection problems. Its main aim is to introduce the reader to the very powerful tool of sequential analysis, providing the basic insights and useful entry points to some topical literature. The specific issues presented in the first part of this work have been selected to provide the necessary theoretical background for the application examples discussed in the second part. These examples, on the other hand, reflect the authors' recent research on the subject. As a consequence, neither the theory nor the applications are exhaustive or complete in any sense. However, the paper is rather self-contained and can help the reader gain a first understanding of the topic.