(Fast) Proximal Gradient Methods

Michael Winkler

June 28, 2022, 4:15 p.m., S5 101

We consider an optimization problem in additively composed form, i.e., ${\underset{x \in E}{\min}\, f(x) + g(x)}$, where $E$ is a Euclidean space, $f$ is smooth and $g$ is convex but possibly nonsmooth. A common example of this type of problem is the so-called LASSO problem stemming from linear regression analysis in statistics and machine learning. Since $g$ is not smooth, classical first-order methods are not directly applicable to these problems. A different approach is the Proximal Gradient Method, which instead exploits the convexity of $g$: each iteration consists of a gradient step w.r.t. $f$ followed by a so-called proximal mapping w.r.t. $g$ applied to the result of that step. For $g \equiv \lambda \|\cdot\|_1$, this method yields the well-known ISTA (\textbf{I}terative \textbf{S}oft \textbf{T}hresholding \textbf{A}lgorithm). We present the theoretical foundation of the Proximal Gradient Method involving subdifferentiability and the proximal mapping. The method converges in terms of function values at a so-called sublinear rate, i.e., $\mathcal{O}(\frac{1}{k})$, which can be improved to $\mathcal{O}(\frac{1}{k^2})$ by the Fast Proximal Gradient Method, also known as FISTA. Several variants of the method with different underlying assumptions are presented and implemented for solving a specific LASSO problem originating from data classification.
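To illustrate the iteration described above, the following is a minimal NumPy sketch of ISTA and FISTA for a LASSO instance, assuming a least-squares data term $f(x) = \frac{1}{2}\|Ax - b\|^2$ and $g = \lambda \|\cdot\|_1$; the function names, step-size choice $1/L$ with $L = \|A\|_2^2$, and iteration count are illustrative placeholders, not the specific implementation used in the talk.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal mapping of tau*||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, num_iters=500):
    """Proximal Gradient Method (ISTA): gradient step on f, then prox of g."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)           # gradient of f(x) = 0.5*||Ax - b||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

def fista(A, b, lam, num_iters=500):
    """Fast Proximal Gradient Method (FISTA): adds an extrapolation step."""
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(num_iters):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum/extrapolation
        x, t = x_new, t_new
    return x
```

The only difference between the two routines is the extrapolation point $y$ in FISTA, which is what lifts the sublinear $\mathcal{O}(\frac{1}{k})$ rate in function values to $\mathcal{O}(\frac{1}{k^2})$.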