(Fast) Proximal Gradient Methods
Michael Winkler
June 28, 2022, 2:15 p.m., S5 101

We consider an optimization problem in additively composed form, i.e., ${\underset{x \in E}{\min}\, f(x) + g(x)}$, where $E$ is a Euclidean space, $f$ is smooth, and $g$ is convex. A common example of this type of problem is the so-called LASSO problem stemming from linear regression analysis in statistics and machine learning. Since $g$ is not smooth, classical first-order methods are not directly applicable for solving these problems. A different approach is the Proximal Gradient Method, which exploits the convexity of $g$ instead: each iteration consists of a gradient step w.r.t. $f$ followed by the so-called proximal mapping w.r.t. $g$ applied to its result. For $g \equiv \lambda \|\cdot\|_1$, this method yields the well-known ISTA (Iterative Soft-Thresholding Algorithm). We present the theoretical foundation of the Proximal Gradient Method, involving subdifferentiability and the proximal mapping. The convergence of the method in terms of function values further admits a so-called sublinear rate, i.e., $\mathcal{O}(\frac{1}{k})$, which the Fast Proximal Gradient Method, also known as FISTA, improves to $\mathcal{O}(\frac{1}{k^2})$. Several variants of the method with different underlying assumptions are presented and implemented for solving a specific LASSO problem originating from data classification.
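As a rough illustration of the iterations described above, the following Python sketch implements ISTA and FISTA for the LASSO objective $\frac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_1$ with a constant step size $1/L$, where $L = \|A\|_2^2$ is a Lipschitz constant of $\nabla f$. The function names, the fixed step size, and the iteration counts are illustrative assumptions; the sketch does not reproduce the specific classification data set or the exact variants presented in the talk.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal mapping of tau * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, b, lam, num_iters=500):
    """Proximal Gradient Method (ISTA) for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of grad f (assumed constant step size 1/L)
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                   # gradient step w.r.t. the smooth part f
        x = soft_threshold(x - grad / L, lam / L)  # proximal step w.r.t. g = lam*||.||_1
    return x

def fista(A, b, lam, num_iters=500):
    """Fast Proximal Gradient Method (FISTA) with O(1/k^2) rate in function values."""
    L = np.linalg.norm(A, 2) ** 2
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(num_iters):
        grad = A.T @ (A @ y - b)
        x_next = soft_threshold(y - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # extrapolation (momentum) step
        x, t = x_next, t_next
    return x
```

For example, `x_hat = fista(A, b, lam=0.1)` returns an approximate minimizer; the only difference from ISTA is the extrapolation step, which yields the improved $\mathcal{O}(\frac{1}{k^2})$ rate.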