A Variational Perspective On Gradient Descent


In this post I just want to share a simple and elegant idea from an IEEE Control Systems Letters paper coauthored by Maxim Raginsky1. This idea largely inspired my recent work on multi-agent learning in (monotone) games2, which I might devote another post to if I happen to get some interesting results out of it.

The idea starts from here: we all know that the archetypal method for solving a convex optimization problem is gradient descent: $$ x_{k+1} = x_k - \eta \nabla f(x_k), $$ which, as the step size $\eta \to 0$, corresponds in continuous time to an autonomous dynamical system: $$ \dot{x} = - \nabla f (x) , \quad x(0) = x_0. $$ A natural question to ask is what hidden objectives are being achieved along the gradient flow. This question was approached in an “inverse optimal control” fashion, i.e., given an autonomous dynamical system, identify the closed-loop control and its corresponding optimal control problem. In this approach, the Fenchel-Young inequality plays a crucial role in formulating the optimal control problem, as it did in the variational principles introduced by Brézis and Ekeland3.
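To make the correspondence concrete, here is a minimal sketch of gradient descent as the forward-Euler discretization of the gradient flow. The quadratic $f$, the step size, and the iteration count are my illustrative choices, not anything from the paper.

```python
import numpy as np

# Gradient descent on f(x) = 1/2 ||x||^2, read as the forward-Euler
# discretization (step size eta) of the gradient flow x' = -grad f(x).

def grad_f(x):
    return x  # gradient of f(x) = 1/2 ||x||^2

eta = 0.1                      # discretization step size
x = np.array([2.0, -1.0])      # initial condition x(0) = x_0
for k in range(200):
    x = x - eta * grad_f(x)    # x_{k+1} = x_k - eta * grad f(x_k)

print(x)  # approaches the minimizer x_bar = 0
```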

For any function \(f\) defined over a primal space \( \mathcal{X}\), denote its Fenchel conjugate by \(f^* (y) = \sup_{x \in \mathcal{X}} \{ \langle x,y \rangle - f(x)\} \). We then have the Fenchel coupling: $$ \mathcal{FC}_{f} (x, y) = f(x ) + f^*(y) - \langle x, y \rangle \geq 0 $$

with equality if and only if $ y \in \partial f(x)$, or equivalently $x \in \partial f^* (y)$ when $f$ is convex.
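As a quick sanity check, we can approximate the conjugate by a grid search and watch the coupling stay nonnegative, vanishing exactly when $y \in \partial f(x)$. The one-dimensional choice $f(x) = x^4$ and the grid below are illustrative assumptions of mine:

```python
import numpy as np

# Numerical sketch of the Fenchel coupling for a 1-D convex f.
# f*(y) = sup_x { x*y - f(x) } is approximated by a max over a grid.

f = lambda x: x**4
xs = np.linspace(-3.0, 3.0, 20001)   # grid standing in for the primal space

def f_star(y):
    return np.max(xs * y - f(xs))    # grid approximation of the sup

def fenchel_coupling(x, y):
    return f(x) + f_star(y) - x * y  # >= 0 by Fenchel-Young

for x, y in [(1.0, 2.0), (0.5, -1.0), (1.0, 4.0)]:
    print(x, y, fenchel_coupling(x, y))
# The last pair gives (approximately) zero: y = 4 = f'(1) lies in the
# subdifferential of f at x = 1, so equality holds.
```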

Now we can construct the optimal control problem: $$ \begin{align*} \inf_{ u \in \mathcal{U} } J(x_0) & \triangleq \int_{t=0}^{\infty} \big[ f(x(t)) + f^* (-u(t)) + \langle u(t), \bar{x} \rangle \big] \mathrm{d}t \\ \text{s.t. } \quad \quad \quad & \dot{x} = u , \quad x(0) = x_0 \end{align*} $$ where $\bar{x} \in \arg\min_{x\in\mathcal{X} }f(x) $ is an optimal solution. Intuitively, solving this optimal control problem should also yield an optimal solution to the original problem $\min_{x \in \mathcal{X}} f$. This can be verified by taking $u^* (t)$ to be the closed-loop control $-\nabla f(x(t))$: we know the trajectory eventually reaches $\bar{x}$, and since $-u^*(t) = \nabla f(x(t)) \in \partial f(x(t))$, the Fenchel-Young inequality holds with equality along the trajectory, so the stage cost $f(x(t)) + f^* (-u^* (t)) + \langle u^* (t), \bar{x} \rangle = \langle \nabla f(x(t)), x(t) - \bar{x} \rangle$ goes to $0$. This gives us enough reason to believe that $u^*(t)$ is actually the optimal control.
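A small simulation makes this tangible. The sketch below integrates the gradient flow for the quadratic $f(x) = \frac{1}{2}\|x\|^2$, chosen by me because its conjugate is available in closed form ($f^*(y) = \frac{1}{2}\|y\|^2$) and $\bar{x} = 0$; it prints the stage cost, which decays to zero along the trajectory:

```python
import numpy as np

# Integrate x' = u with the closed-loop control u = -grad f(x) for
# f(x) = 1/2 ||x||^2, whose conjugate is f*(y) = 1/2 ||y||^2.

f      = lambda x: 0.5 * np.dot(x, x)
f_star = lambda y: 0.5 * np.dot(y, y)   # closed-form conjugate
grad_f = lambda x: x
x_bar  = np.zeros(2)                    # minimizer of f

dt = 0.01
x = np.array([2.0, -1.0])
for step in range(1001):
    u = -grad_f(x)                      # closed-loop control u*(t)
    stage_cost = f(x) + f_star(-u) + np.dot(u, x_bar)
    if step % 200 == 0:
        print(step, stage_cost)         # decays toward 0
    x = x + dt * u                      # Euler step of x' = u
```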

This result is nearly trivial, as we just need to construct the Lyapunov function $V(x) = \frac{1}{2} \|x - \bar{x}\|^2$. Since $\dot{V}(x) = \langle x - \bar{x}, u \rangle$, the Fenchel-Young inequality gives $$ \dot{V} (x) + f(x) + f^* ( - u) + \langle \bar{x}, u \rangle = f(x) + f^*(-u) + \langle x, u \rangle \geq 0 , $$ which simply implies that $V(x)$ is actually a value function, and the optimal control should be whatever makes the inequality hold with equality, i.e., the closed-loop control $ - \nabla f(x(t)) $. The benefit we gain from this is the availability of an entire analytical toolbox for differential equations, and even for stochastic differential equations if we consider the stochastic case.
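The inequality can also be checked numerically for arbitrary controls $u$, not just the closed-loop one. With my illustrative quadratic $f$ from above, the left-hand side reduces to $\frac{1}{2}\|x + u\|^2$, which is visibly nonnegative:

```python
import numpy as np

# Check V_dot + f(x) + f*(-u) + <x_bar, u> >= 0 for random x and u,
# where V(x) = 1/2 ||x - x_bar||^2 so that V_dot = <x - x_bar, u>.

f      = lambda x: 0.5 * np.dot(x, x)
f_star = lambda y: 0.5 * np.dot(y, y)
x_bar  = np.zeros(3)

rng = np.random.default_rng(0)
worst = np.inf
for _ in range(10_000):
    x, u = rng.normal(size=3), rng.normal(size=3)
    v_dot = np.dot(x - x_bar, u)
    lhs = v_dot + f(x) + f_star(-u) + np.dot(x_bar, u)
    worst = min(worst, lhs)

print(worst)  # stays >= 0; equality is approached when u = -grad f(x) = -x
```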


  1. Tzen, B., Raj, A., Raginsky, M. and Bach, F., 2023. Variational principles for mirror descent and mirror Langevin dynamics. IEEE Control Systems Letters. ↩︎

  2. Pan, Y., Li, T. and Zhu, Q., 2024. On the Variational Interpretation of Mirror Play in Monotone Games. arXiv preprint arXiv:2403.15636. ↩︎

  3. Brézis, H. and Ekeland, I., 1976. Un principe variationnel associé à certaines équations paraboliques. Le cas indépendant du temps. C. R. Acad. Sci. Paris Sér. A-B, 282(17), pp.971-974. ↩︎