Variational Calculus
Introduction
Variational calculus is a field of mathematics which is based on the idea of using variation to optimize functionals.
Functionals Theory
A functional is defined by a rule, which associates a number (real or complex) with a function of one or several variables,
\[f(r1,...) \xrightarrow{rule} F[ f ],\]or, more generally, which associates a number with a set of functions,
\[f1, f2,... \xrightarrow{rule} F[ f1, f2,...]\]In such a case, functional can be defined as “function of a function”. An example for a functional is a definite integral over a continuous function.
Calculus of Variations
This field is concerned with the maxima or minima of functionals. The extrema of functionals may be obtained by finding functions for which the functional derivative is equal to zero.
A very common problem is the problem of finding the curve of shortest length between two given points. With no constraints, the answer is simple: a straight line. But in case of constraints on curve, the solution is not that straightforward. Geodesics represent these solutions.
Consider a functional \(J[y(x)]\) of \(y(x)\),
\[J[y] = \int_{x_1}^{x_2} L(x, y(x), y'(x), ..., y^{(n)}(x)) dx\]where \(L\) is the integrand of the functional. Now, variation of the functional is the amount the functional changes when the input function is changed by a little bit. Suppose we let \(y(x) \xrightarrow{} y(x) + \delta y(x)\), then
\[y^{(n)}(x) \xrightarrow{} y^{(n)}(x) + \frac{\mathrm{d^n}}{\mathrm{d}x^n}\delta y(x) = y^{(n)}(x) + \delta y^{(n)}(x)\](assuming that the integrand \(L\) is continuously differentiable) Now,
\[J[y + \delta y] = \int_{x_1}^{x_2} L(x, y + \delta y, y' + \delta y', ..., y^{(n)} + \delta y^{(n)}) dx\]Using first-order Taylor expansion for a multivariable function, we get:
\[J[y + \delta y] = \int_{x_1}^{x_2} \Biggl( L + \frac{\partial L}{\partial x}dx + \frac{\partial L}{\partial y}\delta y + \frac{\partial L}{\partial y'}\delta y' + ... + \frac{\partial^{(n)} L}{\partial y^{(n)}}\delta y^{(n)} \Biggr) dx\]To find the variation for this functional, we calculate:
\[\delta J = J[y + \delta y] - J[y]\] \[\qquad \qquad = \int_{x_1}^{x_2} \Biggl( \frac{\partial L}{\partial x}dx + \frac{\partial L}{\partial y}\delta y + \frac{\partial L}{\partial y'}\delta y' + ... + \frac{\partial^{(n)} L}{\partial y^{(n)}}\delta y^{(n)} \Biggr) dx\]Applying integration by parts repeatedly we get,
\[\begin{align*} \delta J = \frac{d}{dx}(\frac{\partial L}{\partial y'})\delta y(x)|_{x_1}^{x_2} + \frac{d}{dx}(\frac{\partial L}{\partial y''})\delta y'(x)|_{x_1}^{x_2} - \frac{d^2}{dx^2}(\frac{\partial L}{\partial y''})\delta y(x)|_{x_1}^{x_2} \\ + ... + (-1)^{n-1}\frac{d^n}{dx^n}(\frac{\partial L}{\partial y^{(n)}})\delta y^{(n)}(x)|_{x_1}^{x_2} + \int_{x_1}^{x_2} \frac{\partial L}{\partial x} dx + \\ \int_{x_1}^{x_2} \Biggl( \frac{\partial L}{\partial y} - \frac{d}{dx}\frac{\partial L}{\partial y'} + \frac{d^2}{dx^2}\frac{\partial L}{\partial y''} + ... + (-1)^{n-1}\frac{d^n}{dx^n}\frac{\partial L}{\partial y^{(n)}} \Biggr)\delta y(x)dx \end{align*}\]The above is the variation of the functional \(J\). One of the practical examples is the problem of lower bounding the marginal likelihood using variation.