For Jensen's inequality for analytic functions, see Jensen's formula.

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906.[1] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form, the inequality states that a convex transformation of a mean is less than or equal to the mean of the convex transformation. As a simple corollary, the inequality is reversed for concave transformations.
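The simplest form described above can be checked numerically. The following is a minimal sketch rather than part of the original article: the helper name `jensen_gap`, the sample range and the choice of test functions are all illustrative assumptions.

```python
import math
import random


def jensen_gap(phi, samples):
    """Return mean(phi(x)) - phi(mean(x)) for equally weighted samples.

    Jensen's inequality says this gap is >= 0 for convex phi
    and <= 0 for concave phi.
    """
    mean_x = sum(samples) / len(samples)
    mean_phi = sum(phi(x) for x in samples) / len(samples)
    return mean_phi - phi(mean_x)


random.seed(0)  # fixed seed so the run is reproducible
xs = [random.uniform(0.5, 4.0) for _ in range(1000)]  # hypothetical sample

convex_gap = jensen_gap(lambda x: x * x, xs)  # x^2 is convex
concave_gap = jensen_gap(math.log, xs)        # log is concave on (0, inf)

print("convex gap is non-negative:", convex_gap >= 0)
print("concave gap is non-positive:", concave_gap <= 0)
```

For the convex function x², the gap is exactly the (population) variance of the sample, which is one way to see why it cannot be negative.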
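Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using measure theory, or the equivalent probabilistic notation. In the probabilistic setting, the inequality can be further generalized to its full strength.

Finite form

For a real convex function $\varphi$, numbers $x_1, x_2, \ldots, x_n$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

$$\varphi\left(\frac{\sum_{i=1}^{n} a_i x_i}{\sum_{i=1}^{n} a_i}\right) \le \frac{\sum_{i=1}^{n} a_i \varphi(x_i)}{\sum_{i=1}^{n} a_i},$$

and the inequality is reversed if $\varphi$ is concave. As a particular case, if the weights $a_i$ are all equal, then

$$\varphi\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right) \le \frac{1}{n}\sum_{i=1}^{n} \varphi(x_i),$$

again with the inequality reversed for concave $\varphi$.

For instance, the function $\log(x)$ is concave (conversely, a function that satisfies the two-point form of the inequality for every pair of points in its domain and every pair of weights is convex, or concave if the reversed inequality holds), so substituting $\varphi(x) = \log(x)$ in the reversed equal-weights formula establishes the logarithm of the familiar arithmetic mean-geometric mean inequality for positive $x_i$:

$$\frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.$$

The variable $x$ may, if required, be a function of another variable (or set of variables) $t$, so that $x_i = g(t_i)$. All of this carries directly over to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function $f(x)$, such as a probability distribution, and the summations are replaced by integrals.

Measure-theoretic and probabilistic form

Let $(\Omega, A, \mu)$ be a measure space such that $\mu(\Omega) = 1$. If $g$ is a real-valued function that is $\mu$-integrable, and if $\varphi$ is a convex function on the real axis, then:

$$\varphi\left(\int_\Omega g \, d\mu\right) \le \int_\Omega \varphi \circ g \, d\mu.$$

In real analysis, we may require an estimate on

$$\varphi\left(\int_a^b f(x)\, dx\right),$$

where $a, b$ are real numbers with $a < b$, and $f \colon [a, b] \to \mathbb{R}$ is a non-negative Lebesgue-integrable function. Here the Lebesgue measure of $[a, b]$ need not be unity, but the interval can be rescaled so that it has measure one. Applying Jensen's inequality with the probability measure $dx/(b-a)$ then gives

$$\varphi\left(\int_a^b f(x)\, dx\right) \le \frac{1}{b-a}\int_a^b \varphi\big((b-a) f(x)\big)\, dx.$$

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega, \mathfrak{F}, P)$ be a probability space, $X$ an integrable real-valued random variable and $\varphi$ a convex function. Then:

$$\varphi\left(\operatorname{E}[X]\right) \le \operatorname{E}[\varphi(X)].$$

In this probability setting, the measure $\mu$ is intended as a probability $P$, the integral with respect to $\mu$ as an expected value $\operatorname{E}$, and the function $g$ as a random variable $X$.

General inequality in a probabilistic setting

More generally, let $T$ be a real topological vector space, and $X$ a $T$-valued integrable random variable. In this general setting, integrable means that there exists an element $\operatorname{E}[X]$ in $T$ such that, for any element $z$ in the dual space of $T$, $\operatorname{E}|\langle z, X \rangle| < \infty$ and $\langle z, \operatorname{E}[X] \rangle = \operatorname{E}[\langle z, X \rangle]$. Then, for any measurable convex function $\varphi$ and any sub-σ-algebra $\mathfrak{G}$ of $\mathfrak{F}$:

$$\varphi\left(\operatorname{E}[X \mid \mathfrak{G}]\right) \le \operatorname{E}[\varphi(X) \mid \mathfrak{G}].$$

Here $\operatorname{E}[\,\cdot \mid \mathfrak{G}]$ stands for the expectation conditioned on the σ-algebra $\mathfrak{G}$. This general statement reduces to the previous ones when the topological vector space $T$ is the real axis and $\mathfrak{G}$ is the trivial σ-algebra $\{\varnothing, \Omega\}$.

Proofs

Figure: A graphical "proof" of Jensen's inequality for the probabilistic case.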
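The dashed curve along the X axis is the hypothetical distribution of X, while the dashed curve along the Y axis is the corresponding distribution of Y values. Note that the convex mapping Y(X) increasingly "stretches" the distribution for increasing values of X.

A proof of Jensen's inequality can be given in several ways; three proofs, corresponding to the different statements above, are offered here. Before embarking on these derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where X is a real number (see figure). Assuming a hypothetical distribution of X values, one can immediately identify the position of $\operatorname{E}[X]$ and its image $\varphi(\operatorname{E}[X])$ in the graph. Because a convex mapping $Y = \varphi(X)$ stretches the distribution of Y values more and more as X increases, the distribution of Y is broader over the region $X > X_0$ and narrower over $X < X_0$ for any $X_0$, in particular for $X_0 = \operatorname{E}[X]$. The expectation of Y therefore lies at or above $\varphi(\operatorname{E}[X])$, which gives

$$\operatorname{E}[Y] = \operatorname{E}[\varphi(X)] \ge \varphi(\operatorname{E}[X]),$$

with equality taking place when $\varphi$ is not strictly convex on the range of X, e.g. when it is a straight line, or when X follows a degenerate distribution (i.e. is a constant). The proofs below formalize this intuitive notion.

Proof 1 (finite form)

If $\lambda_1$ and $\lambda_2$ are two arbitrary positive real numbers such that $\lambda_1 + \lambda_2 = 1$, then convexity of $\varphi$ implies

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) \quad \text{for any } x_1, x_2.$$

This can be easily generalized: if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are positive real numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) + \cdots + \lambda_n \varphi(x_n)$$

for any $x_1, \ldots, x_n$. This finite form of Jensen's inequality can be proved by induction. By the convexity hypothesis, the statement is true for $n = 2$. Suppose it is true for some $n$; one needs to prove it for $n + 1$. Since the $\lambda_i$ are positive and sum to 1, we have $0 < \lambda_1 < 1$; therefore, by the convexity inequality:

$$\varphi\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\left(\lambda_1 x_1 + (1 - \lambda_1) \sum_{i=2}^{n+1} \frac{\lambda_i}{1 - \lambda_1} x_i\right) \le \lambda_1 \varphi(x_1) + (1 - \lambda_1)\, \varphi\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1 - \lambda_1} x_i\right).$$

Since

$$\sum_{i=2}^{n+1} \frac{\lambda_i}{1 - \lambda_1} = 1,$$

one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$$\varphi\left(\int x \, d\mu_n(x)\right) \le \int \varphi(x) \, d\mu_n(x),$$

where $\mu_n$ is a measure given by an arbitrary convex combination of Dirac deltas:

$$\mu_n = \sum_{i=1}^{n} \lambda_i \delta_{x_i}.$$

Since convex functions are continuous on the interior of their domain, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures, the general statement is obtained by a limiting procedure.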
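The induction step of Proof 1 can be traced numerically: peeling off the first weight gives an intermediate bound that sits between the two sides of the finite form. The sketch below is illustrative only; the function names, weights and points are assumptions chosen for the example, not values from the article.

```python
import math


def convex_combination(weights, points):
    """Return sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, points))


def induction_step_bounds(phi, weights, points):
    """Return (lhs, middle, rhs) for one induction step of the finite form.

    lhs    = phi(sum_i w_i x_i)
    middle = w_1 phi(x_1) + (1 - w_1) phi(sum_{i>=2} w_i / (1 - w_1) x_i)
    rhs    = sum_i w_i phi(x_i)

    Assumes positive weights that sum to 1 (so 0 < w_1 < 1) and convex phi,
    in which case lhs <= middle <= rhs.
    """
    w1, x1 = weights[0], points[0]
    # Renormalise the remaining weights so that they sum to 1 again.
    rest = [w / (1.0 - w1) for w in weights[1:]]
    lhs = phi(convex_combination(weights, points))
    middle = w1 * phi(x1) + (1.0 - w1) * phi(convex_combination(rest, points[1:]))
    rhs = convex_combination(weights, [phi(x) for x in points])
    return lhs, middle, rhs


weights = [0.2, 0.3, 0.5]   # hypothetical positive weights summing to 1
points = [-1.0, 0.5, 2.0]   # hypothetical points in the domain of exp
lhs, middle, rhs = induction_step_bounds(math.exp, weights, points)
print("lhs <= middle <= rhs:", lhs <= middle <= rhs)
```

The first inequality is the two-point convexity step; the second is the induction hypothesis applied to the renormalised weights.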
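Proof 2 (measure-theoretic form)

Let $g$ be a real-valued $\mu$-integrable function on a measure space $\Omega$ with $\mu(\Omega) = 1$, and let $\varphi$ be a convex function on the real numbers. Define the right-handed derivative of $\varphi$ at $x$ as

$$\varphi'(x) := \lim_{t \to 0^+} \frac{\varphi(x + t) - \varphi(x)}{t}.$$

Since $\varphi$ is convex, the quotient on the right-hand side is decreasing as $t$ approaches 0 from the right, and is bounded below by any term of the form

$$\frac{\varphi(x + t) - \varphi(x)}{t}$$

where $t < 0$. Therefore the limit always exists, and is equal to the infimum of these quotients over $t > 0$.

Now, let us define the following:

$$x_0 := \int_\Omega g \, d\mu, \qquad a := \varphi'(x_0), \qquad b := \varphi(x_0) - x_0 \varphi'(x_0).$$

Then for all $x$, $ax + b \le \varphi(x)$. To see that, take $x > x_0$, and define $t = x - x_0 > 0$. Then,

$$\varphi'(x_0) \le \frac{\varphi(x_0 + t) - \varphi(x_0)}{t}.$$

Therefore,

$$\varphi'(x_0)(x - x_0) + \varphi(x_0) \le \varphi(x),$$

as desired. The case $x < x_0$ is proven similarly, and clearly $a x_0 + b = \varphi(x_0)$. $\varphi(x_0)$ can then be rewritten as

$$a x_0 + b = a \left(\int_\Omega g \, d\mu\right) + b.$$

But since $\mu(\Omega) = 1$, for every real number $k$ we have

$$\int_\Omega k \, d\mu = k.$$

In particular,

$$\varphi\left(\int_\Omega g \, d\mu\right) = a \left(\int_\Omega g \, d\mu\right) + b = \int_\Omega (a g + b) \, d\mu \le \int_\Omega \varphi \circ g \, d\mu.$$

Proof 3 (general inequality in a probabilistic setting)

Let $X$ be an integrable random variable that takes values in a real topological vector space $T$. Since $\varphi \colon T \to \mathbb{R}$ is convex, for any $x, y \in T$, the quantity

$$\frac{\varphi(x + \theta y) - \varphi(x)}{\theta}$$

is decreasing as $\theta$ approaches $0^+$. In particular, the subdifferential of $\varphi$ evaluated at $x$ in the direction $y$ is well-defined by

$$(D\varphi)(x) \cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}.$$

It is easily seen that the subdifferential is linear in $y$ and, since the infimum on the right-hand side of the previous formula is no larger than the value of the same term for $\theta = 1$, one gets

$$\varphi(x) \le \varphi(x + y) - (D\varphi)(x) \cdot y.$$

In particular, for an arbitrary sub-σ-algebra $\mathfrak{G}$ we can evaluate the last inequality at $x = \operatorname{E}[X \mid \mathfrak{G}]$ and $y = X - \operatorname{E}[X \mid \mathfrak{G}]$ to obtain

$$\varphi(\operatorname{E}[X \mid \mathfrak{G}]) \le \varphi(X) - (D\varphi)(\operatorname{E}[X \mid \mathfrak{G}]) \cdot (X - \operatorname{E}[X \mid \mathfrak{G}]).$$

Now, if we take the expectation conditioned on $\mathfrak{G}$ of both sides of the previous expression, we get the result, since

$$\operatorname{E}\big[(D\varphi)(\operatorname{E}[X \mid \mathfrak{G}]) \cdot (X - \operatorname{E}[X \mid \mathfrak{G}]) \mid \mathfrak{G}\big] = (D\varphi)(\operatorname{E}[X \mid \mathfrak{G}]) \cdot \operatorname{E}\big[X - \operatorname{E}[X \mid \mathfrak{G}] \mid \mathfrak{G}\big] = 0,$$

by the linearity of the subdifferential in the $y$ variable, and the following well-known property of the conditional expectation:

$$\operatorname{E}\big[\operatorname{E}[X \mid \mathfrak{G}] \mid \mathfrak{G}\big] = \operatorname{E}[X \mid \mathfrak{G}].$$

Applications and special cases

Form involving a probability density function

Suppose $\Omega$ is a measurable subset of the real line and $f(x)$ is a non-negative function such that

$$\int_{-\infty}^{\infty} f(x)\, dx = 1.$$

In probabilistic language, $f$ is a probability density function.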
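Then Jensen's inequality becomes the following statement about integrals of convex functions: if $g$ is any real-valued measurable function and $\varphi$ is convex over the range of $g$, then

$$\varphi\left(\int_{-\infty}^{\infty} g(x) f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(g(x)) f(x)\, dx.$$

If $g(x) = x$, then this form of the inequality reduces to a commonly used special case:

$$\varphi\left(\int_{-\infty}^{\infty} x f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(x) f(x)\, dx.$$

Alternative finite form

If $\Omega$ is some finite set $\{x_1, x_2, \ldots, x_n\}$, and if $\mu$ is a counting measure on $\Omega$, then the general form reduces to a statement about sums:

$$\varphi\left(\sum_{i=1}^{n} g(x_i) \lambda_i\right) \le \sum_{i=1}^{n} \varphi(g(x_i)) \lambda_i,$$

provided that $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$ and $\lambda_i \ge 0$. There is also an infinite discrete form.

Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

$$e^{\langle X \rangle} \le \left\langle e^{X} \right\rangle,$$

where angle brackets denote expected values with respect to some probability distribution in the random variable $X$. The proof in this case is very simple (cf. Chandler, Sec. 5.5). The desired inequality follows directly, by writing

$$\left\langle e^{X} \right\rangle = e^{\langle X \rangle} \left\langle e^{X - \langle X \rangle} \right\rangle$$

and then applying the inequality $e^{x} \ge 1 + x$ to the final exponential, whose expected value is therefore at least $1 + \langle X - \langle X \rangle \rangle = 1$.

Information theory

If $p(x)$ is the true probability distribution for $x$, and $q(x)$ is another distribution, then applying Jensen's inequality for the random variable $Y(x) = q(x)/p(x)$ and the function $\varphi(y) = -\log(y)$ gives

$$\operatorname{E}[\varphi(Y)] \ge \varphi(\operatorname{E}[Y])$$

$$\Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge -\log \int p(x) \frac{q(x)}{p(x)}\, dx$$

$$\Rightarrow \int p(x) \log \frac{p(x)}{q(x)}\, dx \ge 0$$

$$\Rightarrow -\int p(x) \log q(x)\, dx \ge -\int p(x) \log p(x)\, dx,$$

a result called Gibbs' inequality. It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities $p$ rather than any other distribution $q$. The quantity that is non-negative, $\int p(x) \log \frac{p(x)}{q(x)}\, dx$, is called the Kullback–Leibler divergence of $q$ from $p$.

Rao–Blackwell theorem

Main article: Rao–Blackwell theorem

If $L$ is a convex function, then from Jensen's inequality we get

$$L(\operatorname{E}[\delta(X)]) \le \operatorname{E}[L(\delta(X))] \quad \Rightarrow \quad \operatorname{E}[L(\operatorname{E}[\delta(X)])] \le \operatorname{E}[L(\delta(X))].$$

So if $\delta(X)$ is some estimator of an unobserved parameter $\theta$ given a vector of observables $X$, and if $T(X)$ is a sufficient statistic for $\theta$, then an improved estimator, in the sense of having a smaller expected loss $L$, can be obtained by calculating

$$\delta_1(X) = \operatorname{E}_\theta[\delta(X') \mid T(X') = T(X)],$$

the expected value of $\delta$ with respect to $\theta$, taken over all possible vectors of observations $X$ compatible with the same value of $T(X)$ as that observed. This result is known as the Rao–Blackwell theorem.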
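The information-theory application above (Gibbs' inequality and the non-negativity of the Kullback–Leibler divergence) can be checked on a small example. This sketch makes several assumptions that are not in the article: it uses discrete distributions with sums in place of integrals, natural logarithms, hypothetical values for `p` and `q`, and it requires `q` to be positive wherever `p` is.

```python
import math


def kl_divergence(p, q):
    """Discrete analogue of the integral of p log(p/q): sum_i p_i log(p_i / q_i).

    Terms with p_i == 0 contribute nothing and are skipped.
    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """Discrete analogue of -integral of p log q: -sum_i p_i log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.25]  # hypothetical "true" distribution
q = [0.4, 0.4, 0.2]    # hypothetical alternative distribution

print("KL(p, q) >= 0:", kl_divergence(p, q) >= 0)
print("Gibbs' inequality holds:", cross_entropy(p, q) >= cross_entropy(p, p))
print("KL(p, p) == 0:", kl_divergence(p, p) == 0.0)
```

The second check is the last line of the derivation: coding with `q` costs at least as much on average as coding with the true distribution `p`.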