Lin, Mairal and Harchaoui
Boyd, 2014). The proximal point algorithm consists of solving (1) by minimizing a sequence
of auxiliary problems involving a quadratic regularization term. In general, these auxiliary
problems cannot be solved with perfect accuracy, and several notions of inexactness were
proposed by Güler (1992); He and Yuan (2012) and Salzo and Villa (2012). The Catalyst approach hinges upon (i) an acceleration technique for the proximal point algorithm originally introduced in the pioneering work of Güler (1992); (ii) a more practical inexactness criterion than those proposed in the past.
As a result, we are able to control the
rate of convergence for approximately solving the auxiliary problems with an optimization
method M. In turn, we are also able to obtain the computational complexity of the global
procedure, which was not possible with previous analyses (Güler, 1992; He and Yuan, 2012;
Salzo and Villa, 2012). When instantiated in different first-order optimization settings, our
analysis yields systematic acceleration.
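To make the scheme above concrete, here is a minimal toy sketch of an inexact proximal point method, without the Güler-style extrapolation step that Catalyst adds. The inner method M is plain gradient descent on the regularized subproblem; the function names, step sizes, and iteration counts are illustrative choices of ours, not the paper's.

```python
import numpy as np

def inexact_proximal_point(grad_f, x0, kappa, lr, outer_iters=50, inner_iters=25):
    """Toy inexact proximal point method: each outer step approximately
    minimizes h(z) = f(z) + (kappa/2)||z - x_k||^2, with a few steps of
    gradient descent playing the role of the inner method M."""
    x = x0.copy()
    for _ in range(outer_iters):
        y = x.copy()   # proximal center for this subproblem
        z = x.copy()   # warm start the inner solver at the current iterate
        for _ in range(inner_iters):
            z -= lr * (grad_f(z) + kappa * (z - y))  # gradient of h
        x = z          # accept the approximate proximal point
    return x

# Least-squares example: f(x) = 0.5 * ||A x - b||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad_f = lambda x: A.T @ (A @ x - b)
kappa = 1.0
lr = 1.0 / (np.linalg.norm(A, 2) ** 2 + kappa)  # 1/L for the subproblem
x_hat = inexact_proximal_point(grad_f, np.zeros(5), kappa, lr)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
```

A larger kappa makes each subproblem better conditioned (and hence cheaper for M) but slows the outer iterations, which is roughly the trade-off that the Catalyst analysis balances when choosing the regularization parameter.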
Beyond Güler (1992), several other works have inspired our approach. In particular, accelerated
SDCA (Shalev-Shwartz and Zhang, 2016) is an instance of an inexact accelerated proximal
point algorithm, even though this was not explicitly stated in the original paper. Catalyst
can be seen as a generalization of their algorithm, originally designed for stochastic dual
coordinate ascent approaches. Yet their proof of convergence relies on different tools than
ours. Specifically, we introduce an approximate sufficient descent condition, which, when
satisfied, grants acceleration to any optimization method, whereas the direct proof of Shalev-
Shwartz and Zhang (2016), in the context of SDCA, does not extend to non-strongly convex
objectives. Another useful methodological contribution was the convergence analysis of
inexact proximal gradient methods of Schmidt et al. (2011) and Devolder et al. (2014).
Finally, similar ideas appeared in the independent work of Frostig et al. (2015). Their results
partially overlap with ours, but the two papers adopt rather different directions. Our
analysis is more general, covering both strongly convex and non-strongly convex objectives,
and comprises several variants, including an almost parameter-free one.
Then, beyond accelerated SDCA (Shalev-Shwartz and Zhang, 2016), other accelerated
incremental methods have been proposed, such as APCG (Lin et al., 2015b), SPDC (Zhang
and Xiao, 2015), RPDG (Lan and Zhou, 2017), Point-SAGA (Defazio, 2016) and Katyusha
(Allen-Zhu, 2017). Their techniques are algorithm-specific and cannot be directly general-
ized into a unified scheme. However, we should mention that the complexity obtained by
applying Catalyst acceleration to incremental methods matches the optimal bound up to a
logarithmic factor, which may be the price to pay for a generic acceleration scheme.
A related recent line of work has also combined smoothing techniques with outer-loop
algorithms such as quasi-Newton methods (Themelis et al., 2016; Giselsson and Fält, 2016).
Their purpose was not to accelerate existing techniques, but rather to derive new algorithms
for nonsmooth optimization.
To conclude this survey, we mention the broad family of extrapolation methods (Sidi,
2017), which allow one to extrapolate to the limit the sequences generated by iterative
algorithms for various numerical analysis problems. Scieur et al. (2016) proposed such an
approach for convex optimization problems with smooth and strongly convex objectives.
The approach we present here allows us to obtain global complexity bounds for strongly
1. Note that our inexactness criterion was also studied by Salzo and Villa (2012), among others, but their
analysis led to the conjecture that this criterion was too weak to warrant acceleration. Our analysis
refutes this conjecture.