structure of (2). Unlike full gradient approaches, which require computing and averaging n gradients ∇f(x) = (1/n) Σᵢ₌₁ⁿ ∇fᵢ(x) at every iteration, incremental techniques have a per-iteration cost that is independent of n. The price to pay is the need to store a moderate amount of information regarding past iterates, but the benefit is significant in terms of computational complexity.
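To make the cost contrast concrete, here is a minimal SAG-style sketch (a hypothetical helper, not the paper's implementation) for least-squares components fᵢ(x) = ½(aᵢᵀx − bᵢ)²: each iteration computes a single component gradient and maintains a running average of stored past gradients, so the per-iteration cost is O(d) regardless of n.

```python
import numpy as np

def sag(A, b, step, iters, seed=0):
    """SAG-style sketch for f(x) = (1/n) sum_i 0.5*(a_i.x - b_i)^2.

    Each iteration touches a single component gradient, so the cost per
    iteration is O(d), independent of n; the memory price is one stored
    gradient per component i.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    table = np.zeros((n, d))          # stored past gradients
    avg = np.zeros(d)                 # running average of the table
    for _ in range(iters):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th term only
        avg += (g - table[i]) / n     # update the average in O(d)
        table[i] = g
        x -= step * avg
    return x
```

The step size 1/(16L), with L the largest component smoothness constant, is the classical choice from the SAG analysis [24].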
Main contributions. Our main achievement is a generic acceleration scheme that applies to a
large class of optimization methods. By analogy with substances that increase chemical reaction
rates, we call our approach a “catalyst”. A method may be accelerated if it has a linear convergence rate for strongly convex problems. This is the case for full gradient [4, 19] and block coordinate descent methods [18, 21], which already have well-known accelerated variants. More importantly, it also applies to incremental algorithms such as SAG [24], SAGA [6], Finito/MISO [7, 14],
SDCA [25], and SVRG [27]. Whether or not these methods could be accelerated was an important
open question. It was only known to be the case for dual coordinate ascent approaches such as
SDCA [26] or SPDC [28] for strongly convex objectives. Our work provides a universal positive answer regardless of the strong convexity of the objective, which brings us to our second achievement.
Some approaches such as Finito/MISO, SDCA, or SVRG are only defined for strongly convex objectives. A classical trick to apply them to general convex functions is to add a small regularization term ε‖x‖² [25]. The drawback of this strategy is that it requires choosing in advance the parameter ε,
which is related to the target accuracy. A consequence of our work is to automatically provide direct support for non-strongly convex objectives, thus removing the need to select ε beforehand.
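The regularization trick can be sketched as follows (illustrative code; the helpers and the quadratic example are our own, not from [25]): adding ε·x to the gradient of a merely convex f yields an ε-strongly convex surrogate, whose minimizer is biased away from that of f by an amount controlled by ε.

```python
import numpy as np

def regularize(grad_f, eps):
    # gradient of the surrogate f(x) + (eps/2)*||x||^2
    return lambda x: grad_f(x) + eps * x

def gradient_descent(grad, x0, step, iters):
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# f(x) = 0.5*(a.x - b)^2 is convex but NOT strongly convex when d > 1
# (its Hessian a a^T is rank one); the surrogate is eps-strongly convex.
a = np.array([1.0, 2.0])
grad_f = lambda x: (a @ x - 3.0) * a
eps = 1e-3  # must be picked in advance from the target accuracy
x = gradient_descent(regularize(grad_f, eps), np.zeros(2), 0.1, 5000)
```

The surrogate has a unique minimizer, but it solves the regularized problem, not the original one, which is precisely why ε must be tied to the target accuracy.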
Other contribution: Proximal MISO. The approach Finito/MISO, which was proposed in [7]
and [14], is an incremental technique for solving smooth unconstrained µ-strongly convex problems
when n is larger than a constant βL/µ (with β = 2 in [14]). In addition to providing acceleration
and support for non-strongly convex objectives, we also make the following specific contributions:
• we extend the method and its convergence proof to deal with the composite problem (1);
• we fix the method to remove the “big data condition” n ≥ βL/µ.
The resulting algorithm can be interpreted as a variant of proximal SDCA [25] with a different step
size and a more practical optimality certificate—that is, checking the optimality condition does not
require evaluating a dual objective. Our construction is indeed purely primal: neither our proof of convergence nor the algorithm uses duality, while SDCA is originally a dual ascent technique.
Related work. The catalyst acceleration can be interpreted as a variant of the proximal point algo-
rithm [3, 9], which is a central concept in convex optimization, underlying augmented Lagrangian
approaches, and composite minimization schemes [5, 20]. The proximal point algorithm consists
of solving (1) by minimizing a sequence of auxiliary problems involving a quadratic regulariza-
tion term. In general, these auxiliary problems cannot be solved with perfect accuracy, and several
notions of inexactness were proposed, including [9, 10, 22]. The catalyst approach hinges upon (i) an acceleration technique for the proximal point algorithm originally introduced in the pioneering work [9]; (ii) a more practical inexactness criterion than those proposed in the past.¹ As a result, we
are able to control the rate of convergence for approximately solving the auxiliary problems with
an optimization method M. In turn, we are also able to obtain the computational complexity of the
global procedure for solving (1), which was not possible with previous analyses [9, 10, 22]. When
instantiated in different first-order optimization settings, our analysis yields systematic acceleration.
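A simplified (and deliberately naive) sketch of the inexact accelerated proximal point idea: each outer iteration approximately minimizes the auxiliary problem G_k(x) = f(x) + (κ/2)‖x − y_k‖² with a few steps of an inner method M, then extrapolates. The fixed momentum weight and inner iteration count below are illustrative choices, not the schedules analyzed in the paper.

```python
import numpy as np

def catalyst_sketch(grad_f, x0, kappa, step, outer, inner):
    """Illustrative inexact accelerated proximal point loop.

    Each outer step approximately minimizes the auxiliary problem
        G_k(x) = f(x) + (kappa/2) * ||x - y_k||^2
    with `inner` gradient steps (playing the role of the method M),
    then extrapolates. beta is a fixed illustrative momentum weight,
    not the paper's extrapolation schedule.
    """
    x_prev = x0.copy()
    y = x0.copy()
    beta = 0.9
    for _ in range(outer):
        x = y.copy()
        for _ in range(inner):             # inexact solve of G_k
            g = grad_f(x) + kappa * (x - y)
            x = x - step * g
        y = x + beta * (x - x_prev)        # extrapolation step
        x_prev = x
    return x_prev
```

The quadratic term makes each auxiliary problem κ-strongly convex even when f is not, which is what lets a linearly convergent method M solve it efficiently.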
Beyond [9], several works have inspired this paper. In particular, accelerated SDCA [26] is an
instance of an inexact accelerated proximal point algorithm, even though this was not explicitly
stated in [26]. Their proof of convergence relies on different tools than ours. Specifically, we use the
concept of estimate sequence from Nesterov [17], whereas the direct proof of [26], in the context
of SDCA, does not extend to non-strongly convex objectives. Nevertheless, part of their analysis
proves to be helpful to obtain our main results. Another useful methodological contribution was the
convergence analysis of inexact proximal gradient methods of [23]. Finally, similar ideas appear in
the independent work [8]. Their results partially overlap with ours, but the two papers pursue different directions; our analysis is, for instance, more general and provides support for non-strongly convex objectives. Another independent work with related results is [13], which introduces an accelerated method for minimizing finite sums that is not based on the proximal point algorithm.
¹ Note that our inexact criterion was also studied, among others, in [22], but the analysis of [22] led to the conjecture that this criterion was too weak to warrant acceleration. Our analysis refutes this conjecture.