The Limits of Econometrics: Nonparametric Estimation in Hilbert Spaces

We extend Bergstrom's 1985 results on nonparametric (NP) estimation in Hilbert spaces to unbounded sample sets. The motivation is to seek the most general possible framework for econometrics: NP estimation with no a priori assumptions on the functional relations or on the observed data. In seeking the boundaries of the possible, however, we run against a sharp dividing line, which defines a necessary and sufficient condition for NP estimation. Somewhat surprisingly, we identify this condition with a classic statistical assumption on the relative likelihood of bounded and unbounded events (DeGroot, 2004). Equivalent conditions are found in other fields: in decision theory and choice under uncertainty, the monotone continuity axiom (Arrow, 1970) and insensitivity to rare events (Chichilnisky, 2000); and in dynamic growth models, dictatorship of the present (Chichilnisky, 1996). When the crucial condition holds, NP estimation can be extended to the sample space R + . Otherwise the estimators, which are based on Fourier coefficients, do not converge: the underlying distributions are shown to have "heavy tails" and to contain purely finitely additive measures. Purely finitely additive measures are not constructible, and their existence has been shown to be equivalent to the axiom of choice in mathematics. Statistics and econometrics involving purely finitely additive measures are still open issues, which suggests the current limits of econometrics.


INTRODUCTION
Rex Bergstrom (1985) constructed a nonparametric (NP) estimator for nonlinear models with bounded sample spaces, using techniques of Hilbert spaces, a type of space I introduced in economics (Chichilnisky, 1976, 1977). His article is simple, elegant, and general, but requires an a priori bound on observed data that conflicts with the spirit of NP estimation. This article extends his original results to unbounded sample spaces such as the positive real line R + , using earlier work (Chichilnisky, 1976, 1977, 1996, 1997, 1999, 2000, 2006a, 2006b). We focus on the statistical assumptions needed for the extension of NP estimation from bounded intervals to the positive real line R + , and find a sharp dividing line, a condition that is both necessary and sufficient for this extension. The clue to this condition appeared in the literature as a classic statistical assumption that restricts the asymptotic behavior of the unknown function, and derives from a classical assumption on relative likelihoods. In DeGroot (2004) the condition is denoted SP 4 , and it compares the likelihood of bounded and unbounded sets. A simple interpretation for this condition is that, no matter how small a set B ≻ ∅ is (that is, a set B more likely than the empty set), it is impossible for every infinite interval (n, ∞) to be at least as likely as B. In practical terms, the condition requires that unbounded sets are eventually less likely than any bounded set.
To check the condition in practice, one examines the relative likelihood of any bounded set B and compares it with infinite sets of the form (n, ∞). Eventually, for large enough n, the set B must be more likely than (n, ∞). As a practical example, any continuous integrable density function on the line f : R + → R defines a relative likelihood that satisfies this condition. And as shown below, this condition eliminates "heavy tails." In order to generalize the NP estimation problem as much as possible, we extend Assumption SP 4 , which was originally defined only for density functions, to any continuous unknown function f : R + → R. We show that when SP 4 is satisfied, the unknown function can be represented by a function in a Hilbert space and the NP estimator can be extended appropriately to R + . But when assumption SP 4 fails, the situation is quite different. The estimator does not have appropriate asymptotic behavior at infinity. It appears that a classic statistical assumption holds the cards for extending NP estimation to unbounded sample spaces.
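The practical check described above can be sketched numerically. The following is a minimal illustration, assuming an exponential density f(t) = rate · exp(−rate · t) on R + (a choice made here only for the example); the function names are hypothetical. It finds the first integer n at which the tail (n, ∞) becomes less likely than a given bounded set B = [a, b].

```python
import math

def tail_mass(n: float, rate: float = 1.0) -> float:
    """Mass of the tail (n, inf) under the density rate * exp(-rate * t)."""
    return math.exp(-rate * n)

def interval_mass(a: float, b: float, rate: float = 1.0) -> float:
    """Mass of the bounded set B = [a, b] under the same density."""
    return math.exp(-rate * a) - math.exp(-rate * b)

def first_n_below(a: float, b: float, rate: float = 1.0, n_max: int = 10_000):
    """Smallest integer n >= 1 with mass((n, inf)) < mass([a, b]).

    Returns None if no such n is found up to n_max.
    """
    target = interval_mass(a, b, rate)
    for n in range(1, n_max + 1):
        if tail_mass(n, rate) < target:
            return n
    return None
```

For any bounded set with positive mass the search terminates, because the tail mass of an integrable density decreases to zero; this is the behavior that the condition requires.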
Exploring other areas of the literature, we find other assumptions that we prove to be equivalent to SP 4 and in that sense determinant for extending NP estimation to R + . In decision theory one such assumption is the monotone continuity axiom of Arrow (1970); another is the insensitivity to rare events of Chichilnisky (2000, 2006a, 2006b). In optimal growth models it is dictatorship of the present as defined in Chichilnisky (1996). When the key assumptions fail, the estimator does not converge. In all cases, the failure leads to purely finitely additive measures on R + , and to distributions with heavy tails. Econometric results involving purely finitely additive measures are still an open issue, which suggests the current limits of econometrics.
Results extending semi-NP estimation and NP estimation to infinite cases, dealing with separate but related issues, can be found, e.g., in Blundell, Chen, and Kristensen (2006), Stinchcombe (2002), and Chen, Hansen, and Scheinkman (2005), among others. Other articles in the NP literature include Andrews (1991) and Newey (1997), both of which again assume a compact support for the regressor.

HILBERT SPACES
The methodology we use here is weighted Hilbert spaces as defined in Chichilnisky (1976, 1977), two publications that introduced Hilbert spaces in economics. These earlier results suggested the advantages of using Hilbert spaces in econometrics, in particular for NP estimation. The rationale is simple: NP estimation is by nature infinite-dimensional, because when the forms of the functions in the true model are unknown, the most efficient use of the data is to allow the estimated functions (or the number of estimated parameters) to depend on the size of the sample, tending to infinity with the sample size. This provides a natural infinite-dimensional context for NP estimation. In this context, Hilbert spaces are a natural choice, because they are the closest analog to Euclidean space in infinite dimensions.
Bergstrom pointed out in personal communication (see Note 1) that there is a natural limitation for the use of Hilbert spaces on the real line R. Standard Hilbert spaces such as L 2 (R) require that the unknown function approaches zero at infinity, a somewhat unreasonable limitation to impose on the economic model, as it excludes widely used functions such as constant, increasing, and cyclical functions on the line. To overcome his objection, I suggested using weighted Hilbert spaces, since these impose weaker limiting requirements at infinity, as shown below. Bergstrom's article (1985) acknowledged my contribution to NP estimation in Hilbert spaces, but it is restricted to bounded sample spaces: his results apply to L 2 spaces of functions defined on a bounded segment of the line, [a, b] ⊂ R.
Below, we extend the original methodology in Bergstrom (1985) to unbounded sample spaces by using weighted Hilbert spaces as originally proposed. In exploring the viability of the proofs, we run into an interesting dilemma. When the sample space is the entire positive real line, Hilbert space techniques still require additional conditions on the asymptotic behavior of the unknown function at infinity. In bounded sample spaces such as [a, b], this problem did not arise, because the unknown functions are continuous, and therefore bounded, and belong to the Hilbert space L 2 [a, b]. But this is not the case when the sample space is the positive line R + . A continuous real-valued function on R + may not be bounded and may not be in the space L 2 (R + ). Therefore, the Fourier series expansions that are used for defining the estimator may not converge. With unbounded sample spaces, additional statistical assumptions are needed for NP estimation.
Consider the problem of estimating an unknown function f on R + , for example, a capital accumulation path through time or a density function, which are standard nonlinear NP estimation problems. The unknown density function may be continuous, but not a square integrable function on R + , namely, an element of L 2 (R + ). Since the NP estimator is defined by approximating values of the Fourier coefficients of the unknown function (Bergstrom, 1985), when the Fourier coefficients of the estimator do not converge, the estimator itself fails to converge. A similar situation arises in general NP estimation problems where the unknown function may not have the asymptotic behavior needed to ensure the appropriate convergence. This illustrates the difficulties involved in extending NP estimation in Hilbert spaces from bounded to unbounded sample spaces.
The rest of this article focuses on the necessary and sufficient statistical conditions needed for extending the results from bounded intervals to the positive line R + .

STATISTICAL ASSUMPTIONS AND NP ESTIMATION
A brief summary of earlier work follows. Bergstrom's statistical assumptions require that the unknown function f be continuous and bounded a.e. on the sample space [a, b] ⊂ R. His sample design assumes separate observations at equidistant points. The number of parameters increases with the size of the sample, and disturbances are not necessarily normal.
Bergstrom uses an orthogonal series in Hilbert space to derive NP properties and prove convergence theorems. The series is orthonormal in the Hilbert space rather than in the sample space, so the elements remain unchanged as the sample size increases. This series includes polynomials or any dense family of orthonormal functions in the Hilbert space.
An estimator f̂ is defined in a simple and natural manner (Bergstrom, 1985): the first M Fourier coefficients of the unknown function f are estimated relative to the orthonormal set. Estimates of the coefficients are obtained from the sample by ordinary least squares regression, setting the rest of the Fourier coefficients to zero. Bergstrom (1985) shows that the expected integrated squared error $E \int_a^b \{\hat f_{MN}(x) - f(x)\}^2\,dx$ can be made arbitrarily small by a suitable choice of M and N. He also defines an estimator $f^*_N(x)$ that is optimal for the sample size N, and shows that this converges in a given metric to f(x) as N → ∞. A third theorem shows how an optimal value M * is related to the Fourier coefficients and the mean square errors of their estimates obtained from regressions with various values of M, and provides the basis for an estimation procedure. A "stopping rule" is also provided for estimating the optimum value of the parameter that, for a given sample, provides the exact number of Fourier coefficients to be estimated. The definition of the estimator and the proofs of these results require that f be an element of a Hilbert space.
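The construction just described, estimating the first M Fourier coefficients by ordinary least squares and setting the rest to zero, can be sketched as follows. This is an illustrative sketch and not Bergstrom's own procedure: the cosine basis on [0, 1] is an assumption made here (Bergstrom works with polynomials), and the function names are hypothetical. The stopping rule for choosing M is not implemented.

```python
import math

def cosine_basis(x, M):
    """Values of the first M basis functions at x in [0, 1].

    phi_1 = 1 and phi_j = sqrt(2) * cos((j - 1) * pi * x) are orthonormal
    in L2[0, 1]; this basis is an assumption of the sketch.
    """
    return [1.0] + [math.sqrt(2.0) * math.cos(j * math.pi * x) for j in range(1, M)]

def solve(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            m = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= m * aug[k][c]
    coef = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(aug[k][j] * coef[j] for j in range(k + 1, n))
        coef[k] = (aug[k][n] - s) / aug[k][k]
    return coef

def fit_series(xs, ys, M):
    """OLS estimates of the first M Fourier coefficients (normal equations)."""
    rows = [cosine_basis(x, M) for x in xs]
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(M)] for i in range(M)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(M)]
    return solve(gram, rhs)

def predict(coef, x):
    """Evaluate the truncated series; coefficients beyond len(coef) are zero."""
    return sum(c * p for c, p in zip(coef, cosine_basis(x, len(coef))))
```

With noise-free observations of a function that lies in the span of the first M basis functions, `fit_series` recovers its coefficients up to rounding error; with noisy data, M trades approximation error against estimation variance, which is the trade-off the optimal M * addresses.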
Unbounded sample spaces give rise to a different type of problem. To explain the problem and motivate the results, we first explain why earlier work was restricted to bounded sample spaces.

WHY BOUNDED SAMPLE SPACES?
When working in Hilbert spaces, there are good technical reasons for requiring that the sample space be bounded. For example, consider the typical Hilbert space of functions L 2 , the space of square integrable measurable real-valued functions. Bergstrom (1985) considered the space L 2 [a, b] of functions defined on the bounded segment [a, b] ⊂ R, where [a, b] represents the sample space. As he points out, there is no need to assume anything further than continuity for the unknown function. Every continuous unknown function on the segment [a, b] is bounded and belongs to the Hilbert space L 2 [a, b].
However, when the sample space is unbounded, such as R + , the condition of being a square integrable function in L 2 (R + ) imposes significantly more restrictions. For example, for continuous functions of bounded variation, it requires that the functions to be estimated have a well-defined limit at infinity, namely lim t→∞ f (t) = 0. This is not a reasonable restriction to impose on the unknown function if, for example, the function represents capital accumulation, which typically increases over time. The restriction on lim t→∞ f (t) also eliminates other standard cases, such as constant, increasing, or cyclical functions.
In mathematical terms, an appropriate transformation of the line can alleviate the problem. This was the methodology introduced in Chichilnisky (1976, 1977), the first publications to use Hilbert spaces in economics. I defined then a Hilbert space L 2 (R + ) with a "weight function" γ (t) that defines a finite density measure for R + . In this case, square integrability requires far less: only that the product of the function and the weight function, f (t)γ (t), converge to zero at infinity, rather than the function f (t) itself. This is a more reasonable assumption, which is asymptotically satisfied by the solutions in most optimal growth models, where there is a well-defined "discount" factor γ > 1. The solution I considered was the (weighted) Hilbert space L 2 (R + ,γ ) of all measurable functions f for which the absolute value of the discounted product f (t)e −γt is square integrable (Chichilnisky, 1976, 1977). As already stated, this does not require lim t→∞ f (t) = 0, and it includes bounded, increasing, and cyclical real-valued functions on R + . It is of course possible to include other weight functions as part of the methodology introduced in Chichilnisky (1976, 1977), provided the weight functions are monotonically decreasing and therefore invertible, but the ones specified in Chichilnisky (1976, 1977) are naturally associated with the models at hand. This solution is an improvement, but the condition that the unknown function belongs to a Hilbert space still poses asymptotic restrictions at infinity, which are considered below.
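A few worked cases may help to see what membership in the weighted space does and does not require. The computation below is a sketch under two assumptions made here for illustration: γ is any positive constant, and the squared norm is read as the integral of the squared discounted product f (t)e −γt over R + .

```latex
% Assumptions of this sketch: \gamma > 0 and
% \|f\|_\gamma^2 = \int_0^\infty \bigl(f(t)\,e^{-\gamma t}\bigr)^2\,dt .
\begin{align*}
f(t) = c \ (c \neq 0): \quad
  & \|f\|_\gamma^2 = c^2 \int_0^\infty e^{-2\gamma t}\,dt = \frac{c^2}{2\gamma} < \infty,
  \qquad \text{whereas} \quad \int_0^\infty c^2\,dt = \infty, \\
f(t) = t: \quad
  & \|f\|_\gamma^2 = \int_0^\infty t^2 e^{-2\gamma t}\,dt = \frac{2}{(2\gamma)^3}
    = \frac{1}{4\gamma^3} < \infty, \\
f(t) = \sin t: \quad
  & \|f\|_\gamma^2 \le \int_0^\infty e^{-2\gamma t}\,dt = \frac{1}{2\gamma} < \infty, \\
f(t) = e^{\gamma t}: \quad
  & \|f\|_\gamma^2 = \int_0^\infty 1\,dt = \infty .
\end{align*}
```

Constant, linearly increasing, and cyclical functions all have finite weighted norm although none of them is in the unweighted space, while a function growing as fast as the inverse of the weight is still excluded. The last case is an instance of the asymptotic restriction that remains.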
In the case of optimal growth models in Chichilnisky (1977), the methodology of weighted Hilbert spaces is based on a transformation map induced by the model itself through its own discount factor γ. Under this transformation, the unbounded sample space R + is mapped into the bounded sample space [0, 1), where the original assumptions and results for bounded sample spaces can be reinterpreted appropriately. This is the route followed in this paper.
Before doing so, however, it seems worth discussing briefly a different methodology that has been suggested for NP estimation with unbounded sample spaces, explaining why it may be less suitable.

COMPACTIFYING THE SAMPLE SPACE
A natural approach to extending NP estimation to unbounded sample spaces would be to compactify the sample space and apply the existing results to the compactified space. For example, the compactification of the positive real line R + yields a space that is equivalent to a bounded interval [a, b]. To proceed with NP estimation, one needs to reinterpret every function f : R + → R as a function defined on the compactified space. As we see below, this requires from the outset that the function f on R + have a well-defined limiting behavior at infinity, namely, lim t→∞ f (t) < ∞. Otherwise, f cannot be extended to a function on the compactified space. To lift this constraint, Peter Phillips suggested that one could estimate (rather than assume) the behavior of the unknown function at infinity. But in all cases, some limit must be assumed for the unknown function, which can be considered an unrealistic requirement. The following example shows why.
Consider the Alexandroff one-point compactification of the real line R + , which consists of "adding" to the real numbers a point of infinity {∞} and defining the corresponding neighborhoods of infinity. This is a frequently-used technique of compactification. A function f on the line R can be extended to a function on the compactified line, but only if f has a well-defined limiting behavior at infinity, namely, if there exists a well-defined lim t→∞ f (t). This is not always possible nor a reasonable restriction to impose; for example, this requirement excludes all cyclical functions, for which lim t→∞ f (t) does not exist.
One can explore more general forms of compactification, such as the Stone-Čech compactification of the line R, denoted here βR, the most general possible compactification of the real line. βR is a well-behaved Hausdorff space and is a universal compactifier of R, which means that every other compactification of R is a continuous image of it. Any bounded continuous function f : R → R can be extended to a function on the compactified space, f : βR → R. However, it is difficult to interpret Hilbert spaces of functions defined on βR, since these would be square integrable functions defined on ultrafilters rather than on real numbers. Such spaces do not have a natural interpretation.
To overcome these difficulties, in the following we use weighted Hilbert spaces for NP estimation on unbounded sample spaces.

NP ESTIMATION ON WEIGHTED HILBERT SPACES
Following Chichilnisky (1976, 1977), consider the sample space R + = [0, ∞) with a standard σ field and a finite density or "weight function" γ on R + . Define the weighted Hilbert space H γ , also denoted H, consisting of all measurable functions g(.) : R + → R that are square integrable with respect to γ, with the weighted L 2 norm ‖·‖. Observe that the space H contains the space of bounded measurable functions L ∞ (R + ) and includes all periodic and constant functions, as well as many increasing functions.
The weight function γ induces a homeomorphism, namely, a bicontinuous one-to-one and onto transformation, between the positive real line and the interval (0, 1], which reverses the order of the line. In the following, we use a modified homeomorphism δ : R + → [0, 1) to maintain the standard order of the line. The transformation δ allows us to translate Bergstrom's 1985 methodology, assumptions, and notation, which are valid for [0, 1), to the positive real line R + . The following section interprets the statistical assumptions in Bergstrom (1985) for NP estimation in this new context and introduces new statistical assumptions.
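To make the change of variable concrete, the sketch below takes one particular choice, assumed here only for illustration and not necessarily the one intended above: the weight γ(t) = exp(−t) and the order-preserving map δ(t) = 1 − exp(−t). The names `delta`, `delta_inv`, and `transform` are hypothetical.

```python
import math

def gamma_weight(t: float) -> float:
    """Assumed weight function gamma(t) = exp(-t); it maps [0, inf) onto (0, 1]."""
    return math.exp(-t)

def delta(t: float) -> float:
    """Assumed order-preserving map delta(t) = 1 - exp(-t) from [0, inf) to [0, 1)."""
    return -math.expm1(-t)

def delta_inv(x: float) -> float:
    """Inverse map from [0, 1) back to [0, inf): t = -log(1 - x)."""
    return -math.log1p(-x)

def transform(g):
    """Turn g defined on [0, inf) into f = g o delta^{-1} defined on [0, 1)."""
    return lambda x: g(delta_inv(x))
```

A function g on R + and its transform f carry the same information: f(δ(t)) = g(t) for every t. In floating point, δ(t) rounds to 1.0 for large t, so a numerical implementation would need to keep design points away from the right end of the interval.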

STATISTICAL ASSUMPTIONS AND RESULTS ON R +
We have a sample of N paired observations (t 1 , y 1 ),...,(t N , y N ) in which t 1 ,..., t N are nonrandom positive real numbers whose values are fixed by the statistician and y 1 ,..., y N are random variables whose joint distribution depends on t 1 ,..., t N . In particular, it is assumed that E(y i ) = g(t i ) = f (x i ), where δ : R + → [0, 1) is the one-to-one transformation defined above, and x i = δ(t i ).
We are concerned with estimating an unknown function g : R + → R over the sample space R + or, equivalently, estimating the function f over the bounded interval [0, 1). The model is precisely described by Assumptions 1 to 4 below, which are a transformed version of the Assumptions in Bergstrom (1985, Sect. 2, p. 11). We also require a new statistical assumption, Assumption 3 below, which is needed due to the unbounded nature of our sample space. Observe that, given the properties of the transformation map δ, it is statistically equivalent to work with the nonrandom variables t 1 ,..., t N or, instead, with the transformed nonrandom variables x 1 = x 1 (t 1 ),..., x N = x N (t N ). To simplify the comparison with Bergstrom's (1985) results, it seems best to use the latter variables when describing the statistical model.
Assumption 1 (Sampling assumption). The observable random variables y 1 ,..., y N are assumed to be generated by the equations y i = f (x i ) + u i (i = 1,..., N ), with the x i located in [a, b]. Here, a, b are constants (1 ≥ b > a ≥ 0), and u 1 ,..., u N are unobservable random variables satisfying moment conditions as in Bergstrom (1985, Sect. 2).
Assumption 2. The unknown function g : R + → R is continuous or, equivalently, the "transformed" function f : [0, 1) → R is continuous.
When the domain of a function f, namely the sample space, is the closed bounded interval [0, 1] then, being continuous, f is bounded and f ∈ L 2 [0, 1], as pointed out in Bergstrom (1985, p. 11). One may therefore apply Hilbert space techniques for NP estimation.
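For concreteness, the sampling model can be written out as below. This is a sketch: the first and last lines restate relations given in the text, while the moment conditions on the disturbances in the second line are a standard choice assumed here for illustration and are not a quotation of Bergstrom (1985).

```latex
% Sketch only. The moment conditions on u_i are an assumed, standard choice
% (mean zero, constant variance, uncorrelated); normality is not assumed.
\begin{align*}
y_i &= f(x_i) + u_i, \qquad x_i = \delta(t_i) \in [a, b], \qquad i = 1, \dots, N, \\
E(u_i) &= 0, \qquad E(u_i^2) = \sigma^2, \qquad E(u_i u_j) = 0 \quad (i \neq j), \\
E(y_i) &= f(x_i) = g(t_i), \qquad f = g \circ \delta^{-1}.
\end{align*}
```

Under these conditions the observations are unbiased for the unknown function at each design point, which is what makes least squares regression on a truncated orthonormal series a natural estimator.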
In our case, the (transformed) function f is defined over the (half open) interval [0, 1). Under appropriate boundary conditions, f can be extended to the closed interval [0, 1]. Continuity over the closed bounded interval implies boundedness, and furthermore, it ensures that f ∈ L 2 [0, 1]. But this is no longer true when the sample space is the positive real line R + , or, equivalently, the transformed sample space is δ(R + ) = [0, 1). A continuous function defined on R + may not be bounded and may not belong to L 2 (R + ). For the unbounded sample space R + , we require the following additional statistical assumption on the unknown function:
Assumption 3. The unknown function g : R + → R is an element of the weighted Hilbert space H or, equivalently, the transformed function f : [0, 1) → R can be extended to a continuous function in the Hilbert space L 2 [0, 1].
Assumption 4. The countable set of continuous functions φ 1 (x(t)),..., φ N (x(t)) is a complete orthonormal set in the space L 2 (R + ) of square integrable functions on R + with ordinary Lebesgue measure μ.
This requires that the functions φ j be continuous, linearly independent, dense in L 2 (R + ), and satisfy the orthonormality conditions $\int \phi_j^2 \, d\mu = 1$ and $\int \phi_j \phi_k \, d\mu = 0$ for $j \neq k$. Observe that one can consider different orthonormal sets; for example, Bergstrom (1985) considers an orthonormal set consisting of polynomials of increasing order.
On the basis of Assumptions 1, 2, 3, and 4, the following results follow directly from those of Bergstrom (1985), from which they are reproduced. These results are expressed in terms of the transformed unknown function f : [0, 1] → R to facilitate comparison with Bergstrom (1985), but can be equivalently expressed in terms of the unknown function g : R + → R.
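THEOREM 1 (Bergstrom, 1985). Let $\hat f_{MN}(x) = \sum_{j=1}^{M} \hat c_j(M,N)\,\phi_j(x)$ (1), where $\hat c_1(M,N), \ldots, \hat c_M(M,N)$ are the values of $c_1, \ldots, c_M$ that minimize $\sum_{i=1}^{N} \{y_i - \sum_{j=1}^{M} c_j \phi_j(x_i)\}^2$, i.e., they are the sample regression coefficients. Then for an arbitrarily small real number ε > 0, there is an integer M ε and a function N ε (M) such that $E \int_a^b \{\hat f_{MN}(x) - f(x)\}^2\,dx < \varepsilon$ if $M > M_\varepsilon$ and $N > N_\varepsilon(M)$.
THEOREM 2 (Bergstrom, 1985). Let M * be the smallest integer such that $E \int_a^b \{\hat f_{M^*N}(x) - f(x)\}^2\,dx \le E \int_a^b \{\hat f_{MN}(x) - f(x)\}^2\,dx$ for M = 1,..., N, where $\hat f_{MN}(x)$ is defined for M = 1,..., N by (1), and let $f^*_N(x)$ be defined by $f^*_N(x) = \hat f_{M^*N}(x)$. Then $\lim_{N \to \infty} E \int_a^b \{f^*_N(x) - f(x)\}^2\,dx = 0$.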
Theorems 1, 2, and 3 are quite general, but the underlying assumptions (1 to 4) still require interpretation for the case of unbounded sample spaces. The following section tackles this issue.

STATISTICAL ASSUMPTIONS ON R +
Assumptions 1, 2, and 4 have a ready interpretation in the transformed sample space. Assumption 3 is, however, of a different nature. It requires that the unknown function g : R + → R be an element of a (weighted) Hilbert space or, equivalently, that the transformed unknown function f : [0, 1) → R can be extended to a continuous function in the Hilbert space L 2 [0, 1]. This condition is critical: when Assumption 3 is satisfied, Theorems 1, 2, and 3 extend NP estimation to R + , but otherwise these theorems, which depend on the properties of L 2 functions and the convergence of their Fourier coefficients, no longer work. What conditions are needed to ensure that Assumption 3 holds? The following provides classical statistical conditions involving relative likelihoods, cf. DeGroot (2004, Ch. 6).
THEOREM 4. If a relative likelihood ≽ satisfies assumptions SP 1 to SP 5 of DeGroot (2004, Ch. 6), then there exists a probability function f : R + → R representing the relative likelihood ≽, where f is an element of the Hilbert space L 2 (R + ) and Assumption 3 above is satisfied.
Proof. Consider the five assumptions SP 1 ,..., SP 5 provided in DeGroot (2004, Ch. 6). Together they imply the existence of a countably additive probability measure on R + that agrees with the relative likelihood order (cf. DeGroot, 2004, Sect. 6.4, pp. 76-77). Given any countably additive measure μ on R, one can always find a functional representation as a measurable function, f : R + → R, that is integrable, f ∈ L 1 (R + ), and satisfies $\mu(A) = \int_A f(x)\,dx$ (Yosida and Hewitt, 1952). In other words, the five assumptions SP 1 ,..., SP 5 guarantee the existence of an absolutely continuous distribution representing the "relative likelihood of events" (DeGroot, 2004).
Since the space of integrable functions on the (positive) real line is contained in the space of square integrable functions on the (positive) real line, L 1 (R + ) ⊂ L 2 (R + ) (Yosida and Hewitt, 1952), it follows, under the assumptions, that f ∈ L 2 (R + ), as we wished to prove. Thus the five statistical assumptions of DeGroot (2004) suffice to guarantee Assumption 3, and hence the results of Theorems 1, 2, and 3.
Among the five fundamental statistical assumptions of DeGroot (2004), there is one, SP 4 , that plays a key role: it is necessary and sufficient to extend the NP estimation results to unknown density functions on R + . The next step is to define assumption SP 4 and explain its role. The notation A ≻ B indicates that the likelihood of the set or event A is higher than the likelihood of B, and A ≽ B that A is at least as likely as B (see DeGroot, 2004).

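Assumption SP 4 (DeGroot, 2004). Let A 1 ⊃ A 2 ⊃ ... be a decreasing sequence of events, and B some fixed event such that A i ≽ B for i = 1, 2,..., where ≽ reads "is at least as likely as." Then the intersection of the A i is also at least as likely as B: ∩ i A i ≽ B.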
To clarify the role of SP 4 , suppose that each infinite interval of the form (n, ∞) ⊂ R, n = 1, 2,... is regarded as at least as likely (by the relative likelihood ≽) as some fixed small subset B of R. Since the intersection of all these intervals is empty, B must be equivalent to the empty set ∅. In other words, if B is more likely than the empty set, B ≻ ∅, then, regardless of how small B is, it is impossible for every infinite interval (n, ∞) to be as likely as B. One way to interpret the role of Assumption SP 4 is in averting heavy tails.
DEFINITION 3. We say that a relative likelihood ≽ has heavy tails when, for any given set B, there exist an n > 0 and a set C ⊃ (n, ∞), such that C ≽ B; namely, C is as likely as B ⊂ R + .
Intuitively, this definition states that there exist infinite intervals or "tail sets" of the form (n, ∞) with arbitrarily large measure, which may be interpreted as heavy tails.
THEOREM 5. When assumption SP 4 fails, relative likelihoods have heavy tails.
Proof. The logical negation of SP 4 implies that there exists a large enough n such that (n, ∞) is as likely as B, for any bounded B. This implies that the probability measure of the event (n, ∞) does not go to zero when n goes to infinity. Therefore, one obtains heavy tails as defined above.
It is possible to interpret SP 4 to apply to any unknown function f : R + → R within the statistical model defined above. For this, one must reinterpret the relation ≻ that appears in the definition of SP 4 as follows.
DEFINITION 4. Let f : R + → R be a continuous positive valued function. Then the expression A ≻ B means $\int_A f\,dx > \int_B f\,dx$, where integration is with respect to the standard measure on R + .
When working in Hilbert spaces, we use a similar definition of the expression A ≻ B to obtain necessary and sufficient conditions below.
DEFINITION 5. Let f : R + → R be a continuous function. Then the expression A ≻ B means $\int_A f^2\,dx > \int_B f^2\,dx$, where integration is with respect to the standard measure on R + .
The following extends SP 4 to any continuous function f : R + → R.
DEFINITION 6 (Assumption SP 4 in Hilbert spaces). Let A 1 ⊃ A 2 ⊃ ... be a decreasing sequence of sets in R + , and B some fixed set such that A i ≻ B, namely, $\int_{A_i} f^2(x)\,dx > \int_B f^2(x)\,dx$ for i = 1, 2,.... Then $\int_{\cap_i A_i} f^2(x)\,dx \ge \int_B f^2(x)\,dx$.
In other words, if B is any set such that $\int_B f^2(x)\,dx > 0$, then, regardless of how small B is, it is impossible for every infinite interval (n, ∞) to satisfy $\int_{(n,\infty)} f^2(x)\,dx > \int_B f^2(x)\,dx$. This is a reasonable extension of Assumption SP 4 as stated above.
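Two simple cases illustrate how the condition on tail integrals separates functions. The choices f (t) = e −t , f (t) = 1, and B = [0, 1] are made here purely for illustration.

```latex
% Illustrative choices (not from the text): f(t) = e^{-t}, f(t) = 1, B = [0, 1].
\begin{align*}
f(t) = e^{-t}: \quad
  & \int_{(n,\infty)} f^2\,dx = \int_n^\infty e^{-2t}\,dt = \tfrac{1}{2} e^{-2n}
    \longrightarrow 0 \quad (n \to \infty), \\
  & \int_B f^2\,dx = \int_0^1 e^{-2t}\,dt = \tfrac{1}{2}\bigl(1 - e^{-2}\bigr) > \tfrac{1}{2} e^{-2n}
    \quad \text{for all } n \ge 1, \\
f(t) = 1: \quad
  & \int_{(n,\infty)} f^2\,dx = \infty > \int_B f^2\,dx = 1 \quad \text{for every } n .
\end{align*}
```

In the first case every tail is eventually dominated by the bounded set, so the condition holds. In the second case every tail dominates every bounded set, so the condition fails in the unweighted space; applying the same computation to the discounted product f (t)e −γt restores it.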

SP 4 IS NECESSARY AND SUFFICIENT FOR EXTENDING NP ESTIMATION TO R +
To obtain specific necessary and sufficient conditions for NP estimation, consider now the statistical model defined above, and assume that all the statistical assumptions of Bergstrom (1985) are satisfied, namely, Assumptions 1, 2, and 4. We study the estimation of an unknown function g : R + → R. When the model is restricted to the bounded sample space [0, 1], namely, g : [0, 1] → R, Theorems 1, 2, and 3 of Bergstrom (1985) ensure the existence of an NP estimator in Hilbert spaces with the appropriate asymptotic behavior. The following provides a necessary and sufficient condition for extending the NP estimation results from the sample space [0, 1] to the unbounded sample space R + .
THEOREM 6. Assumption SP 4 of DeGroot (2004), as extended above, is necessary and sufficient for extending NP estimation in Hilbert spaces from the sample space [0, 1] to the unbounded sample space R + .
Observe that when SP 4 fails, the distribution induced by the density f is not countably additive and cannot be represented by a function in H, and the estimator, which is constructed from Fourier coefficients, fails to converge.

CONNECTION WITH DECISION THEORY
The general applicability of NP estimation, and the central role played by assumption SP 4 , make it desirable to situate the results in the context of the larger literature. A natural connection that comes to mind is decision theory. There is a logical parallel between the classic assumptions on relative likelihood (DeGroot, 2004) and the classic axioms of decision making under uncertainty (Arrow, 1970). From assumptions on relative likelihood one obtains probability measures that represent the likelihood of events. From the axioms of decision making under uncertainty, one derives subjective probability measures that define expected utility. One would expect to find an axiom in the foundations of choice under uncertainty that corresponds to assumption SP 4 on relative likelihood. Such an axiom exists: it is called Monotone Continuity in Arrow (1970) and, as shown below, it is equivalent to SP 4 . We use the standard definitions of actions and lotteries from the theory of choice under uncertainty; see, e.g., Arrow (1970) and Chichilnisky (2000).

DEFINITION 8. The expression A ≻ B is now used to indicate that action A ⊂ R is preferred to action B ⊂ R.
DEFINITION 9 (Monotone Continuity Axiom; Arrow, 1970).
The following results use the identification (see Yosida and Hewitt, 1952; Yosida, 1974; Chichilnisky, 2000) of a distribution on the line R with a continuous linear real-valued function defined on the space of bounded functions on the line, L ∞ (R).
THEOREM 7. Assumption SP 4 of DeGroot (2004) is equivalent to the Monotone Continuity Axiom (Arrow, 1970).
Proof. The strategy is to show that SP 4 and the Monotone Continuity Axiom (MCA) are each necessary and sufficient for the existence of a ranking of events in R + (by relative likelihood or by choice, respectively) that is representable by an integrable function on R + . Consider first the Monotone Continuity Axiom. Chichilnisky (2006a, 2006b) showed it is necessary and sufficient for the existence of a choice function that is a continuous linear function on R, an element of the dual space L * ∞ (R), represented by a countably additive measure on R and thus admitting a representation by an integrable function in L 1 (R). The argument is as follows: the dual space L * ∞ (R) is (by definition) the space of all continuous linear real-valued functions on L ∞ (R + ). It has been shown (Yosida and Hewitt, 1952; Yosida, 1974) that this space consists of both countably additive and purely finitely additive measures on R. Chichilnisky (2006a, 2006b) showed that the monotone continuity axiom rules out purely finitely additive measures and ensures that the choice criterion is represented by a countably additive measure on R (Thm. 2, Chichilnisky, 2006a, 2006b). Since a countably additive measure on R can always be represented by an integrable function in L 1 (R + ) (Yosida and Hewitt, 1952; Yosida, 1974), this completes the first part of the proof. Consider now SP 4 . DeGroot (2004) showed that Assumption SP 4 eliminates distributions that are purely finitely additive, as shown in DeGroot (2004, Sect. 6.2, p. 73, par. 3), ensuring that the distribution is represented by a countably additive measure, which completes the proof.

RARE EVENTS AND SUSTAINABILITY
When estimating an unknown path f over time, SP 4 can be interpreted as a condition on the behavior of the unknown function on finite and infinite time intervals. A related necessary and sufficient condition has been used in the literature on sustainable development: it is called dictatorship of the present (Chichilnisky, 1996). Consider any order ≻ of continuous bounded paths f : R + → R.
DEFINITION 10. We say that ≻ is a dictatorship of the present when for any two f and g there exists an N = N ( f, g) such that f ≻ g ⇔ f ′ ≻ g′, for any f ′ and g′ that are identical to f and g on the interval [0, N ).
The condition of dictatorship of the present (Chichilnisky, 1996) is equivalent to the representation of a welfare criterion by countably additive measures, and by an attendant integrable function on the line. The condition is also logically identical to insensitivity to rare events (Chichilnisky, 2000, 2006a, 2006b) when the numbers in $R_+$ represent events rather than time periods.
DEFINITION 11 (Chichilnisky, 2000, 2006a, 2006b). A ranking of lotteries $W : L \to R$ is called insensitive to rare events when for any two lotteries f and g there is an $\varepsilon = \varepsilon(f, g) > 0$ such that $W(f) > W(g) \iff W(f') > W(g')$ for every $f'$ and $g'$ that differ from f and g solely on sets of measure smaller than $\varepsilon$.
DEFINITION 12 (Chichilnisky, 2000, 2006a, 2006b). A ranking of lotteries $W : L \to R$ is sensitive to rare events when it is not insensitive to rare events.
THEOREM 8. Assumption SP 4 is equivalent to Monotone Continuity (Definition 9) and to insensitivity to rare events (Chichilnisky, 2000), and the latter is logically identical to dictatorship of the present. In their appropriate contexts, each of the four conditions (SP 4, monotone continuity, insensitivity to rare events, and dictatorship of the present) is necessary and sufficient for extending NP estimation results to $R_+$.
Proof. Chichilnisky (2006a, 2006b) established that insensitivity to rare events is equivalent to the Monotone Continuity Axiom in Arrow (1970; cf. Thm. 2 in Chichilnisky, 2006). Chichilnisky (1996, 2000) showed that insensitivity to rare events is logically identical to dictatorship of the present. Theorems 6 and 7 complete the proof of Theorem 8. ∎
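The contrast between a ranking that is a dictatorship of the present and one that is sensitive to the distant future can be illustrated numerically. The sketch below is not the construction used in the cited papers. It assumes discrete time, paths with values in [0, 1] that are eventually constant, and a discount factor of 0.9; the names `Path`, `discounted`, `long_run`, `splice`, and `present_horizon` are hypothetical. The long-run criterion is used only as an informal stand-in for a tail-sensitive ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    """Path with values in [0, 1]: `head` on dates 0..len(head)-1, then constant `tail`."""
    head: List[float]
    tail: float


def discounted(p: Path, delta: float = 0.9) -> float:
    """Discounted sum of the path, using the closed form for the constant tail."""
    n = len(p.head)
    head_part = sum(delta**t * x for t, x in enumerate(p.head))
    return head_part + p.tail * delta**n / (1.0 - delta)


def long_run(p: Path) -> float:
    """Criterion that looks only at the eventual value of the path."""
    return p.tail


def splice(p: Path, n: int, new_tail: float) -> Path:
    """Path equal to p on dates [0, n) and equal to new_tail from date n on."""
    head = [p.head[t] if t < len(p.head) else p.tail for t in range(n)]
    return Path(head, new_tail)


def present_horizon(gap: float, delta: float = 0.9) -> int:
    """Smallest N for which twice the total discount weight after N is below gap."""
    if gap <= 0:
        raise ValueError("gap must be positive (a strict ranking is required)")
    n = 0
    while 2.0 * delta**n / (1.0 - delta) >= gap:
        n += 1
    return n


f = Path([], 1.0)  # constant path at 1
g = Path([], 0.5)  # constant path at 0.5
N = present_horizon(discounted(f) - discounted(g))

# Least favourable changes after date N: f drops to 0, g jumps to 1.
f_alt, g_alt = splice(f, N, 0.0), splice(g, N, 1.0)

print(discounted(f) > discounted(g), discounted(f_alt) > discounted(g_alt))
print(long_run(f) > long_run(g), long_run(f_alt) > long_run(g_alt))
```

Under these assumptions the discounted ranking of f over g survives any change made after date N, because the total weight placed on dates after N is too small to close the gap. The long-run ranking, by contrast, is reversed by the same change however large N is chosen.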

CONCLUSIONS
We extended Bergstrom's 1985 results on NP estimation in Hilbert spaces to unbounded sample sets, using previous results in Chichilnisky (1976, 1977, 1996, 2000, 2006a, 2006b). The focus was on the statistical assumptions needed for the extension. When estimating an unknown function on the positive line $R_+$, we obtained a necessary and sufficient condition that derives from a classic assumption on relative likelihoods, SP 4 in DeGroot (2004). We extended assumption SP 4, and therefore the results, to any unknown continuous function $f : R_+ \to R$. We also showed that the SP 4 assumption is equivalent to well-known axioms for choice under uncertainty, such as the Monotone Continuity Axiom in Arrow (1970) and insensitivity to rare events in Chichilnisky (2000, 2006a, 2006b), and to criteria used for sustainable choice over time, such as dictatorship of the present (Chichilnisky, 1996).
When the key assumptions fail, the estimators on bounded sample spaces that are based on Fourier coefficients do not converge. We showed that this involves heavy tails and purely finitely additive measures, thus suggesting a limit to NP econometrics.

NOTES
7. Bergstrom (1985) required the function to be continuous a.e. on [a, b], which is essentially the same in our case.
8. The weighted Hilbert space H is defined as the space of all square integrable functions on the positive line using the (finite) density function $\delta^{-x}$: a function f ∈ H when the weighted integral $\int_{R_+} f^2(x)\,\delta^{-x}\,dx$ is finite.
9. This weaker assumption always works, but has no natural interpretation when the model lacks a discount factor.
12. This can be described as the space of all ultrafilters of the real line R.
13. It is also needed because [0, 1) is not compact.
14. To simplify notation, we may assume in the following, without loss of generality, that a = 0 and b = 1.
15. Appropriate boundary conditions are needed for this to hold; for example, in the case of continuous functions of bounded variation, $\lim_{x \to \infty} f(x) = 0$.
16. Since we consider weight functions $\gamma$, one could interpret the requirement simply as the fact that $g^2$ does not go to infinity too fast. But what does "too fast" mean in a nonparametric context? Compared to what? Imposing a limiting condition at infinity or at a boundary ($x \to 1$) becomes a parametric requirement that conflicts with the intention of nonparametric estimation. Alternatively, one could choose another "weighting" function $\gamma$ in the definition of $H_\gamma$ for which $g \in H_\gamma$, but this becomes an arbitrary parameter and defeats the nonparametric nature of the problem. When estimating a density function f over $R_+$, one may answer the question by reference to the properties of the associated relative likelihood function (DeGroot, 2004, ch. 6), or, when estimating an investment path over time, one may consider the behavior of the associated capital accumulation path. A referee pointed out that an alternative choice of weighting function is the density of the regressor (assuming it has one). In practice, this is not known but could be estimated. If one uses the empirical c.d.f. $\Pi(t) = \sum_i I\{t_i \le t\}/N$, then $x_i = \Pi^{-1}(t_i) = (i - 1)/N$. In a strict sense $\Pi$ is not invertible, but this can be handled using a kernel-smoothed version of it. Using the density $\pi(t) = \Pi'(t)$, the central condition becomes $\int_{R_+} g^2(x)\,\pi(t)\,dt = E\{g^2(t)\} < \infty$ and highlights that the condition concerns both the regression function and the distribution/transformation of the random variable t.
17. Other definitions of the phenomenon known as heavy tails exist, and are not discussed here. Our interpretation is presented as one possible definition of heavy tails that is justified by the fact that it is based only on the "primitives" of the statistical theory, namely, on "relative likelihoods."
18. Observe that this interpretation of the relationship is identical to the definition of relative likelihood when f is a density function. Therefore it agrees with the definition provided in the previous section.
19. Observe that the interpretation of the relationship is identical to the definition of relative likelihood when f is a density function. Therefore it agrees with the definition provided in the previous section.