This post is in celebration of Michel Talagrand winning Abel prize. Not to overly romanticize this but this is pretty much a come back story because back in the days of last century, “The type of mathematics I do was not fashionable at all when I started. It was considered inferior mathematics.” –Michel Talagrand.
Now we’ve seen significance of his contribution to the concentration of measure, suprema of stochastic processes and spin glass, all partially owing to his celebrated isoperimetry inequality in product probability space. What is an isoperimetry inequality? It is a concept in mathematics, particularly in the field of geometry and geometric analysis, that compares the length (or perimeter) of a closed curve to the area of the region it encloses, establishing that among all shapes with the same perimeter, the circle has the maximum area. For example: $$ 4 \pi A \leq L^2 $$ where the equality holds if.f. the curve is a circle. In a more general sense, the isoperimetric inequality relates the volume of an $n$-dimensional domain to the surface area of its boundary, with the sphere in $n$-dimensional space providing the optimal (maximum volume for a given surface area) ratio. Applying this concept to probability space, Talagrand was able to prove the following:
Theorem
For a product probability space $\Omega = \prod_{i=1}^n \Omega_i$ endowed with a produdct measure and $A \subseteq \Omega$ $$ \mathbb{P}[A] \cdot \mathbb{P}\left[A_{t}^{c}\right] \leq e^{-t^{2} / 4} $$ for any $t > 0$, where $A^c_t = { w \in \Omega, d(A, \omega) \leq t }$ is an event defined by convex distance: $$ d(A, \omega) = \max_{\alpha, |\alpha|_2 \leq 1} \min_{y \in A} \sum_{ i: \omega_i \neq y_i} \alpha_i. $$
To prove the inequality we need a little bit of preparation. Given set $A$, $x \in \Omega$: $\mathcal{D}_{A}^{c}(x)=\sup_{a \in \mathcal{R}_{+}^{n}} \left( d_{a}(x, A)=\inf_{y \in A} d_{a}(x, y) \right)$. Let $$ V_{A}(x) = \text{ Convex-hull }\left( U_{A}(x) \right) = \left\{\sum_{s \in U_{A} (x) } \alpha_{s} S: \sum \alpha_{s}=1, \alpha_{s} \geq 0 \text{ for all } s \in U_{A}(x) \right\}. $$ Thus, $$ x \in A \Leftrightarrow \mathbf{1}(x \neq x)=0 \in U_{A}(x) \Leftrightarrow 0 \in V_{A}(x). $$
Lemma
We have the following $$ \mathcal{D}_{A}^{c}(x)=d \left(0, V_{A}(x) \right) \equiv \inf_{y \in V_{A}(x)}|y| $$
Proof
We proceed by proving (i) $\mathcal{D}_{A}^{c}(x) \leq \inf_{y \in V_{A}(x)}|y|$ and (ii) $\mathcal{D}_{A}^{c}(x) \geq \inf_{y \in V_{A}(x)}|y|$.
(i): since $\inf_{y \in V_{A}(x)}|y|$ is achieved, let $Z$ be such that $|Z| = \inf_{y \in V_{A}(x)}|y| $. For any $a \in \mathbb{R}^n_+$ $|a| = 1$: $$ \inf_{y \in V_{A}(x)} a \cdot y \leq a \cdot z \leq|a||z|=|z|. $$ Since $ \inf_{y \in V_{A}(x)} a \cdot y$ is linear programming, the minimum is achieved at an extreme point. That is, there exists $s \in U_A(x)$ such that $$ \inf_{y \in V_{A}(x)} a \cdot y=\inf_{s \in U_{A}(x)} a \cdot s=\inf_{y \in A} d_{a}(x, y) \text { for some } y \in A . $$which is true for all $$ \sup_{|a|=1, a \in \mathbb{R}_{+}^{n}} \inf_{y \in A} d_{a}(x, y) \leq|z| \equiv \inf_{y \in V_{A}(x)}|y|. $$
(ii): Let $z$ be the one achieving minimum in $V_A(x)$. Then due to convexity of the objective (equivalently $|y|^2 = \sum y^2_i = f(y)$) and of the domain, we have for any $ y \in V_A (x), \langle \nabla f(z), y -z \rangle \geq 0$ for any $y \in V_A(x)$. $\nabla f(z) = \nabla (z \dot z) = 2 z$. Therefore the condition implies: $$ (y-z) z \geq 0 \Leftrightarrow y \cdot z \geq z \cdot z=|z|^{2} \Rightarrow y \cdot \frac{z}{|z|} \geq|z|. $$
Thus, for $a = \frac{z}{|z|} \in \mathbb{R}^n_+ $, $|a| = 1$, we have that $$ \inf_{y \in V_A(x)} a \dot y \geq |z| . $$ But for any given $a$, $\inf_{y \in V_A(x)} a \cdot y = \inf_{s \in U_A (x)} a \cdot s = d_a (x,A)$ as explained before. That is, $\sup_{a: |a| = 1} d_a (x, A) \geq |z| = \inf_{ y \in V_A(x)} |y|$.
$\quad\quad\quad \quad\quad\quad \quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad \quad\quad\quad \quad\quad\quad \quad\quad\quad \quad\quad\quad \quad\quad\quad \square$
Proof of Isoperimetry inequality
We prove by induction. Consider $n=1$, given $A$, $$ \mathcal{D}_{A}^{c}(x)=\sup_{a \in \mathbb{R}_{+}^{n},|a|=1} \inf_{y \in A} d_{a}(x, y)=\inf_{y \in A} \mathbf{1}(x \neq y)=\left\{ \begin{array}{ll}0, & \text{ for } x \in A \\ 1, & \text { for } x \notin A \end{array} \right. $$
Then $$ \begin{aligned} \int \exp \left (D^{2} / 4\right ) d P & =\int_{A} \exp (0) d P+ \int_{A^{c}} \exp (1 / 4) d P \\ & =P(A)+e^{1 / 4}(1-P(A)) \\ & =e^{1 / 4}-\left(e^{1 / 4}-1 \right ) P(A) \\ & \leq \frac{1}{P(A)}. \end{aligned} $$
Let $f(x) = e^{1/4} - (e^{1/4} - 1) x$ and $g(x)= \frac{1}{x}$. Because $f(x)$ is a decreasing function of $x$, $g(x)$ is a decreasing convex function, hence the result holds for $n=1$.
Now let’s assume it holds for $n$, let $A \subset \Omega^{n+1}$, and $B$ be its projections on $\Omega^n$. Let $A(\omega)$ be section of $A$ along $\omega$: if $x \in \Omega^n$, $\omega \in \Omega$, then $z = (x, \omega) \in \Omega^{n+1}$. The key observation is that:
- if $s \in U_{A(\omega)} (x)$, then $(s, 0) \in U_{A}(z)$
- if $t \in U_{B}(x)$, then $(t, 1) \in U_{A}(z)$.
- if $\xi \in V_{A(\omega)}(x), \zeta \in V_{B}(x)$, and $\theta \in [0,1]$, then $((\theta \xi+(1-\theta) \zeta), 1-\theta) \in V_{A}(z)$. Recall $$ \begin{aligned} \mathcal{D}_{A}^{c}(z)^{2} & =\inf_{y \in V_{A}(z)}|y|^{2} \leq(1-\theta)^{2}+|\theta \xi+(1-\theta) \zeta|^{2} \\ & \leq(1-\theta)^{2}+\theta|\xi|^{2}+(1-\theta)|\zeta|^{2} \\ & \leq(1-\theta)^{2}+\theta \inf_{\xi \in V_{A(\omega)}(x)}|\xi|^{2}+(1-\theta) \inf_{\zeta \in V_{B}(x)}|\zeta|^{2} \\ & =(1-\theta)^{2}+\theta \mathcal{D}_{A(\omega)}^{c}(x)^{2}+(1-\theta) \mathcal{D}_{B}^{c}(x)^{2} \end{aligned} $$
By Holder’s inequality, and the induction hypothesis, $\forall \omega \in \Omega$, $$ \begin{aligned} & \int_{\Omega^{n}} e^{\mathcal{D}_{A}^{c}(x, \omega)^{2} / 4} d P(x) \\ & \leq \int_{\Omega^{n}} \exp \left(\frac{(1-\theta)^{2}+\theta \mathcal{D}_{A(\omega)}^{c}(x)^{2}+(1-\theta) \mathcal{D}_{B}^{c}(x)}{4} \right) d P(x) \\ & \leq \exp \left ( \frac{(1-\theta)^{2}}{4} \right ) \int_{\Omega^{n}} \underbrace{\exp \left(\frac{\theta \mathcal{D}_{A(\omega)}^{c}(x)^{2}}{4}\right)}_{X} \underbrace{\exp \left(\frac{(1-\theta) \mathcal{D}_{B}^{c}(x)^{2}}{4}\right)}_{Y} d P(x) \\ & =\exp \left(\frac{(1-\theta)^{2}}{4}\right) \mathbb{E}[X \cdot Y] \\ & \leq \exp \left(\frac{(1-\theta)^{2}}{4}\right) \mathbb{E}\left[X^{p}\right]^{1 / p} \mathbb{E}\left[Y^{q}\right]^{1 / q},\left(\text{ for } p=\frac{1}{\theta}, q=\frac{1}{1-\theta}: \theta \in[0,1]\right) \\ & =\exp \left ( \frac{(1-\theta)^{2}}{4} \right )\left(\int_{\Omega^{n}} \exp \left(\mathcal{D}_{A(\omega)}^{c}(x)^{2} / 4\right) d P(x)\right)^{\theta}\left(\int_{\Omega^{n}} \exp \left(\mathcal{D}_{B}^{c}(x)^{2} / 4\right) d P(x)\right)^{1-\theta} \\ & \leq \exp \left(\frac{(1-\theta)^{2}}{4}\right)\left(\frac{1}{P(A(\omega))}\right)^{\theta}\left(\frac{1}{P(B)}\right)^{1-\theta} \text{ by induction hypothesis. } \\ & =\exp \left ( \frac{(1-\theta)^{2}}{4} \right ) \frac{1}{P(B)}\left(\frac{P(A(\omega))}{P(B)}\right)^{-\theta} . \end{aligned} $$
Now we optimize $\theta$, simply by construction: for any $ u \in [0,1]$ $\inf_{\theta \in[0,1]} \exp \left(\frac{(1-\theta)^{2}}{4}\right) u^{-\theta} \leq 2-u$. Therefore, the R.H.S. $$ \leq \frac{1}{P(B)}\left(2-\frac{P(A(\omega))}{P(B)}\right) , $$ and thus $$ \begin{aligned} & \int_{\Omega^{n+1}} \exp \left(\frac{\mathcal{D}_{A}^{c}(x, \omega)^{2}}{4}\right) d P(x) d \mu(\omega) \\ & \leq \frac{1}{\mathbb{P}(B)} \int_{\Omega}\left(2-\frac{\mathbb{P}(A(\omega))}{\mathbb{P}(B)}\right) d \mu(\omega) \\ & \leq \frac{1}{\mathbb{P}(B)}\left(2-\frac{(P \bigotimes \mu)(A)}{\mathbb{P}(B)}\right) \\ & \leq \frac{1}{(\mathbb{P} \bigotimes \mu)(A)},(\text{ since } u(2-u) \leq 1 \text{ for all } u \in \mathbb{R}) . \end{aligned} $$
$ \quad \quad \quad\quad \quad \quad\quad \quad \quad\quad \quad \quad\quad \quad \quad\quad \quad \quad\quad \quad \quad\quad \quad \quad \quad \quad \quad\quad \quad \quad\quad \quad \quad\quad \quad \quad\quad \square$