References
[1] Z. Allen-Zhu. Natasha: Faster non-convex stochastic optimization via strongly non-convex parameter. In International Conference on Machine Learning (ICML), 2017.
[2] Z. Allen-Zhu and E. Hazan. Variance reduction for faster non-convex optimization. In International Conference on Machine Learning (ICML), 2016.
[3] J. M. Borwein and A. S. Lewis. Convex analysis and nonlinear optimization: theory and examples. Springer Verlag, 2006.
[4] Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford. Accelerated methods for non-convex optimization. preprint arXiv:1611.00756, 2016.
[5] Y. Carmon, J. C. Duchi, O. Hinder, and A. Sidford. Lower bounds for finding stationary points I. preprint arXiv:1710.11606, 2017.
[6] Y. Carmon, O. Hinder, J. C. Duchi, and A. Sidford. “Convex until proven guilty”: Dimension-free acceleration of gradient descent on non-convex functions. In International Conference on Machine Learning (ICML), 2017.
[7] C. Cartis, N. I. M. Gould, and P. L. Toint. On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM Journal on Optimization, 20(6):2833–2852, 2010.
[8] C. Cartis, N. I. M. Gould, and P. L. Toint. On the complexity of finding first-order critical points in constrained nonlinear optimization. Mathematical Programming, Series A, 144:93–106, 2014.
[9] A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (NIPS), 2014.
[10] D. Drusvyatskiy and C. Paquette. Efficiency of minimizing compositions of convex functions and smooth maps. preprint arXiv:1605.00125, 2016.
[11] J. C. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research (JMLR), 12:2121–2159, 2011.
[12] S. Ghadimi and G. Lan. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156(1–2, Ser. A):59–99, 2016.
[13] S. Ghadimi, G. Lan, and H. Zhang. Generalized uniformly optimal methods for nonlinear programming. preprint arXiv:1508.07384, 2015.
[14] T. Hastie, R. Tibshirani, and M. Wainwright. Statistical learning with sparsity: the Lasso and generalizations. CRC Press, 2015.
[15] C. Jin, P. Netrapalli, and M. I. Jordan. Accelerated gradient descent escapes saddle points faster than gradient descent. preprint arXiv:1711.10456, 2017.
[16] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems (NIPS), 2013.
[17] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
[18] G. Lan and Y. Zhou. An optimal randomized incremental gradient method. Mathematical Programming, Series A, pages 1–38, 2017.
[19] L. Lei and M. I. Jordan. Less than a single pass: stochastically controlled stochastic gradient method. In Conference on Artificial Intelligence and Statistics (AISTATS), 2017.
[20] L. Lei, C. Ju, J. Chen, and M. I. Jordan. Non-convex finite-sum optimization via SCSG methods. In Advances in Neural Information Processing Systems (NIPS), 2017.
[21] H. Li and Z. Lin. Accelerated proximal gradient methods for nonconvex programming. In Advances in Neural Information Processing Systems (NIPS), 2015.
[22] H. Lin, J. Mairal, and Z. Harchaoui. A universal catalyst for first-order optimization. In Advances in Neural Information Processing Systems (NIPS), 2015.
[23] J. Mairal. Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM Journal on Optimization, 25(2):829–855, 2015.
[24] J. Mairal, F. Bach, and J. Ponce. Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision, 8(2–3):85–283, 2014.
[25] Y. Nesterov. A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Soviet Mathematics Doklady, 27(2):372–376, 1983.
[26] Y. Nesterov. Introductory lectures on convex optimization: a basic course. Springer, 2004.
[27] M. O’Neill and S. J. Wright. Behavior of accelerated gradient methods near critical points of nonconvex problems. preprint arXiv:1706.07993, 2017.