Talk:Jensen's inequality

Statistics Low‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Low-importance on the importance scale.

Restrictions on Phi(x)

What restrictions must be placed on φ(x)?

I attempted to derive the triangle inequality for infinite series from Jensen's inequality by setting all a_i = 1 and taking φ(x) = |x|, the absolute value function.

The math works for the most part, but I'm not sure whether to treat |x| as a linear function or not. Under the conditions as they are listed, and assuming |x| is linear, I arrive at the equality | ∑x | = ∑|x|, whereas the triangle inequality demands a strict inequality.

Am I doing something mathematically unsound? --67.168.137.181 (talk) 01:53, 1 December 2014 (UTC)[reply]
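
A worked check of the question above, offered as a sketch rather than a definitive answer. It assumes the finite form with positive weights $a_i$, takes $a_i=1$ for $n$ terms, and uses $\varphi (x)=|x|$, which is convex but not linear. The result is the usual non-strict triangle inequality for finite sums; the case of an infinite series would still need a limiting argument, which is not shown here.

```latex
\varphi\!\left(\frac{\sum_{i=1}^{n} a_i x_i}{\sum_{i=1}^{n} a_i}\right)
  \leq \frac{\sum_{i=1}^{n} a_i\,\varphi(x_i)}{\sum_{i=1}^{n} a_i}
\quad\Longrightarrow\quad
\left|\frac{1}{n}\sum_{i=1}^{n} x_i\right| \leq \frac{1}{n}\sum_{i=1}^{n} |x_i|
\quad\Longrightarrow\quad
\left|\sum_{i=1}^{n} x_i\right| \leq \sum_{i=1}^{n} |x_i| .
```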

University logo

Jensen's inequality serves as the logo for the mathematics department of Copenhagen University.

In the language of measure theory

The statement in the language of measure theory is true iff $\mu$ is a positive measure. That is, if it is in fact a probability measure. So I do not see the point of keeping two different statements (one in the language of measure theory, one in that of probability theory). They are exactly the same, in two different notations! There are several other consistent notations used in measure/probability theory that we could use here as well (but of course, the Jensen article is not the right place to discuss notation in general). Therefore, I propose to delete the language of measure theory section and leave only the theorem stated with the $\mathbb {E}$ notation. I will delete it in a few days if I do not receive comments. gala.martin (what?) 18:39, 30 April 2006 (UTC)[reply]

Use of g

In the measure-theoretic notation, the use of g is IMHO misleading. We should simply replace it by x. Indeed, the inequality written with x is no less general than the one with g(x), since the generality of the measure mu allows one to recover any function with no effort. Putting a function in place of the identity here does not generalize anything; it only confuses. I'll try to be clearer. If you want to write a theorem about a random variable, you say: let X be a random variable with properties A and B; then X has property C. This is no less general than: let X be a random variable such that g(X) has properties A and B; then g(X) has property C. I think that's exactly what we are writing. Am I right? --gala.martin (what?) 09:36, 29 August 2006 (UTC)[reply]

Different proofs

The graphical proof can be made clearer with a concrete example, say $\phi (x)=e^{x}$ , and a particular distribution, say a discrete uniform random variable. The abstract proof number 2, which uses measure-theoretic notation, can also be illustrated graphically so that it ties in with the intuitive graphical argument.

The first proof, by induction, does not appear simple in the generalization step, which uses the delta function and other notions. The third proof appears to have overly complicated notation. Its proof idea is unclear at the end; a summary or conclusion would help clarify it. It would be good to point out how it differs from the second proof, if at all, beyond the notation.

The second proof is concise yet general. A translation to probability notation should simply involve rewriting the integral as the expectation and translating the linearity of integration to linearity of expectation. It would be better if it were put first, with the following changes:

use $X$ instead of $g$ for the random variable
point out at the end that any subderivative could have been used in place of the right-handed derivative
tie it in with the graphical proof and a concrete example.

--Chungc 05:55, 4 December 2006 (UTC)[reply]
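
A minimal numerical sketch of the concrete example suggested above, assuming $\phi (x)=e^{x}$ and X uniform on the four points 0, 1, 2, 3. The support and the helper name `jensen_sides` are illustrative choices, not taken from the article.

```python
import math

def jensen_sides(values, phi):
    """Return (phi(E[X]), E[phi(X)]) for X uniform on `values`."""
    n = len(values)
    mean_x = sum(values) / n                      # E[X]
    mean_phi = sum(phi(v) for v in values) / n    # E[phi(X)]
    return phi(mean_x), mean_phi

# Hypothetical support for the discrete uniform variable.
lhs, rhs = jensen_sides([0.0, 1.0, 2.0, 3.0], math.exp)
print(f"phi(E[X]) = {lhs:.4f}, E[phi(X)] = {rhs:.4f}")
```

Here the left side is $e^{1.5}$ and the right side is the average of $1, e, e^{2}, e^{3}$, so the gap between them is the vertical distance that the graphical proof depicts.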

The following equality in the third proof fails for $\phi (x)=e^{x}$ : for, say, x=0 and y=1 the limit is 1, while the infimum is 0, as one can see by letting $\theta \to -\infty$ :

(D\varphi )(x)\cdot y:=\lim _{\theta \downarrow 0}{\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }}=\inf _{\theta \neq 0}{\frac {\varphi (x+\theta \,y)-\varphi (x)}{\theta }}.

One should really use a subderivative here. On a related note, does anyone know a way to prove the existence of a subderivative on an arbitrary vector space without using Zorn's lemma? --Pavel.zorin (talk) 10:14, 10 November 2009 (UTC)[reply]
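
A quick numerical sketch of the counterexample above, assuming $\varphi (x)=e^{x}$, $x=0$ and $y=1$. The function name `quotient` and the sample values of $\theta$ are illustrative only.

```python
import math

def quotient(theta, x=0.0, y=1.0, phi=math.exp):
    """Difference quotient (phi(x + theta*y) - phi(x)) / theta for theta != 0."""
    return (phi(x + theta * y) - phi(x)) / theta

# Small positive theta approximates the one-sided limit (close to 1).
right_limit = quotient(1e-6)

# Large negative theta drives the quotient towards 0, so the infimum
# over all theta != 0 is 0, not 1.
far_left = [quotient(-t) for t in (1.0, 10.0, 100.0, 1000.0)]

print(right_limit, far_left)
```

Restricting the infimum to $\theta >0$ removes the discrepancy for this example, since the quotient of a convex function is non-decreasing in $\theta$.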

Image:Jensen_graph.png

This bot has detected that this page contains an image, […] vector graphic format. If this bot is in error, you may leave a bug report at its talk page. Thanks, SVnaGBot1 (talk) 15:09, 3 July 2009 (UTC)[reply]

Also, I noticed that the bottom graph says Y(E(X)) when it should actually say $\varphi (E(X))$ . It would also be useful to add the X=Y line to the image, to make it easier to see that the Y values are larger than their corresponding X values. Any idea how to correct the image? Toby Dylan Hocking, 4 Feb 2010.

I agree with your suggestions. The file is an SVG so you can just open and change it in an ordinary text editor. For a GUI, see Inkscape, which seems to be what the SVG was made in. --C. lorenz (talk) 11:42, 7 February 2010 (UTC)[reply]

I believe that the y-axis label $Y(E(X))$ should instead be $\varphi (E(X))$ .

Conditional expectation in Proof 3

Isn't it important to notice that the conditional expectation preserves order? I mean:

X\geq Y\Rightarrow \mathbb {E} \{X|{\mathfrak {G}}\}\geq \mathbb {E} \{Y|{\mathfrak {G}}\}.

The fact is not that obvious in my opinion. André Caldas (talk) 01:09, 4 August 2010 (UTC)[reply]
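
For what it is worth, a short sketch of why order is preserved, assuming X and Y are integrable and writing $\mathbb {P}$ for the underlying probability measure (a symbol not used in the comment above). Set $Z=X-Y\geq 0$ and test the defining property of conditional expectation on the ${\mathfrak {G}}$-measurable set where the conditional expectation is negative:

```latex
A := \{\mathbb{E}\{Z \mid \mathfrak{G}\} < 0\} \in \mathfrak{G}
\quad\Longrightarrow\quad
0 \leq \int_A Z \, d\mathbb{P} = \int_A \mathbb{E}\{Z \mid \mathfrak{G}\}\, d\mathbb{P} \leq 0
\quad\Longrightarrow\quad
\mathbb{P}(A) = 0 .
```

The last step uses that the integrand is strictly negative on A, so a zero integral forces A to be a null set. Linearity then gives $\mathbb {E} \{X|{\mathfrak {G}}\}-\mathbb {E} \{Y|{\mathfrak {G}}\}=\mathbb {E} \{Z|{\mathfrak {G}}\}\geq 0$ almost surely.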

Reference Missing for Special Result

There is a special form of Jensen's inequality given for probability density functions f ('Form involving a probability density function'):

\varphi \left(\int _{-\infty }^{\infty }g(x)f(x)\,dx\right)\leq \int _{-\infty }^{\infty }\varphi (g(x))f(x)\,dx.

However, there is no proof or reference for this formula and it does not seem to be so easy to derive it from the standard form. Can someone please add a reference (or a short proof if possible). Thank you, --134.60.10.241 (talk) 10:50, 15 August 2011 (UTC)[reply]

Is it sufficient to replace the random variable X by g(X) in the standard probabilistic form? Hupili (talk) 14:03, 12 March 2012 (UTC)[reply]
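
A sketch of that substitution, assuming X is a real random variable with density f, g is measurable, and both expectations exist. The symbol Y is introduced here only for illustration. Apply the probabilistic form to $Y=g(X)$ and rewrite both expectations as integrals against f (the law of the unconscious statistician):

```latex
\varphi\big(\mathbb{E}[Y]\big) \leq \mathbb{E}\big[\varphi(Y)\big],
\qquad
\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx,
\qquad
\mathbb{E}\big[\varphi(g(X))\big] = \int_{-\infty}^{\infty} \varphi(g(x)) f(x)\,dx .
```

Substituting the two integrals into the first inequality gives the density form quoted above.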

"Subdifferential" in proof 3

The use of the "subdifferential" in Proof 3 is problematic. First of all, to make the two definitions (limit vs infimum) agree, the infimum must be restricted to $\theta >0$ . Second, $(D\varphi )(x)$ is not linear in $y$ : consider for example the function $\varphi =f$ defined by $f(x)=|x|$ for $x\geq 0$ and $f(x)=|x|/2$ for $x\leq 0$ . Then $(D\varphi )(0)(y)=f(y)$ , which is not linear in $y$ . Why not simply take any subderivative and link to the corresponding article for existence? Xvlcw (talk) 09:38, 17 January 2013 (UTC)[reply]
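
A small numerical sketch of the second point, assuming the example function above and a fixed small step `theta` in place of the limit. The helper name `directional_derivative` is illustrative.

```python
def f(x):
    """Convex example from the comment: |x| for x >= 0, |x|/2 for x <= 0."""
    return abs(x) if x >= 0 else abs(x) / 2

def directional_derivative(phi, x, y, theta=1e-8):
    """One-sided quotient (phi(x + theta*y) - phi(x)) / theta with small theta > 0."""
    return (phi(x + theta * y) - phi(x)) / theta

d_plus = directional_derivative(f, 0.0, 1.0)    # direction y = +1
d_minus = directional_derivative(f, 0.0, -1.0)  # direction y = -1

# A linear map L would satisfy L(-1) == -L(1); here it does not.
print(d_plus, d_minus, d_minus == -d_plus)
```

Because f is positively homogeneous, the quotient at 0 equals f(y) for every step size, so the two values are f(1) and f(-1), and they do not add up to zero.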

removed comment in Finite form section

A previous version included this statement in parentheses in the Finite Form section: "the function log(x) is concave (note that we can use Jensen's to prove convexity or concavity, if it holds for two real numbers whose functions are taken)". This statement does not make sense. Jensen's inequality doesn't say the function is concave if and only if the inequality holds. The easiest way to prove that log(x) is concave is to observe that the second derivative is negative, as described on the concave function Wikipedia page. Once you know it is concave, you can then apply Jensen's inequality. John Lawrence (talk) 16:27, 18 April 2013 (UTC)[reply]

Conditions for equality to hold

The statement "the equality holds if and only if X is constant (degenerate random variable) or $φ$ is linear" is not correct (at least the "only if" part). For instance, if X has an exponential distribution and $φ(x)=|x|$ , then equality holds.

I have to think about it a little, but I am almost sure that the correct statement is "... if and only if $φ$ is linear over a set A such that Pr_X(A)=1" (which is trivially true if X is constant).

Anyway, I feel that the particular cases "X constant" and "linear function" are worth a reference, so I would like to hear your opinions before making any changes. AleNS (talk) 02:51, 15 February 2017 (UTC)[reply]
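
A small simulation sketch of the counterexample above, assuming a rate-1 exponential distribution and a fixed seed (both arbitrary choices). It checks the empirical analogue of the claim: since every sample is non-negative, $|X|=X$ on the sample, so both sides of Jensen's inequality coincide even though X is not constant and $|x|$ is not linear on the whole real line.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

# Samples from an exponential distribution with rate 1 (all non-negative).
samples = [random.expovariate(1.0) for _ in range(10_000)]

mean_x = sum(samples) / len(samples)
lhs = abs(mean_x)                                   # phi(E[X]) with phi = |.|
rhs = sum(abs(s) for s in samples) / len(samples)   # E[phi(X)]

print(lhs, rhs, lhs == rhs)
```

This is consistent with the proposed correction: $|x|$ is linear on $[0,\infty )$ , a set that carries all of the probability mass.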

Converse (partial) to Jensen Inequality

We need a new section on (partial) converses to Jensen inequality Kjetil B Halvorsen 13:25, 10 October 2017 (UTC) — Preceding unsigned comment added by Kjetil1001 (talk • contribs)

Proof for the finite case is unnecessarily complicated

The result about the finite case does not specify that the weights sum up to 1 (and indeed, this wouldn't make the result any weaker). The proof does – if this requirement is taken out, it becomes even easier. One doesn't need the normalizing term $\frac{\lambda_i}{1 - \lambda_1}$ . --109.192.165.115 (talk) 14:55, 1 January 2020 (UTC)[reply]

Edit: Nvm, I'm completely wrong, excuse me; the result does normalize them to add to 1 and the proof is fine. --109.192.165.115 (talk) 20:37, 4 January 2020 (UTC)[reply]

proof 3 problem

Factoring out $(D\varphi )(\operatorname {E} [X\mid {\mathfrak {G}}])$ from the conditional expectation in the second to last line doesn't seem justified. Though it is ${\mathfrak {G}}$ -measurable, it isn't integrable and neither its positive nor its negative parts seem integrable in general. 2600:8803:8711:F900:2CE5:8D62:8F6B:81E9 (talk) 19:30, 20 March 2023 (UTC)[reply]