[93] S. S. Ram, A. Nedić, and V. V. Veeravalli, “Distributed stochastic subgradient projection algorithms for convex optimization,” Journal of Optimization Theory and Applications, vol. 147, pp. 516–545, 2010.
[94] A. Mokhtari and A. Ribeiro, “DSA: Decentralized double stochastic averaging gradient algorithm,” Journal of Machine Learning Research, vol. 17, no. 61, pp. 1–35, 2016.
[95] R. Xin, U. A. Khan, and S. Kar, “Variance-reduced decentralized stochastic optimization with accelerated convergence,” preprint arXiv:1912.04230, 2019.
[96] B. Li, S. Cen, Y. Chen, and Y. Chi, “Communication-efficient distributed optimization in networks with gradient tracking,” preprint arXiv:1909.05844 (to appear in Journal of Machine Learning Research), 2020.
[97] R. Xin, S. Kar, and U. A. Khan, “Decentralized stochastic optimization and machine learning: A unified variance-reduction framework for robust performance and fast convergence,” IEEE Signal Processing Magazine, vol. 37, no. 3, pp. 102–113, 2020.
[98] Z. Lu and L. Xiao, “On the complexity analysis of randomized block-coordinate descent methods,” Mathematical Programming, vol. 152, pp. 615–642, 2015.
[99] Y. T. Lee and A. Sidford, “Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems,” in IEEE Symposium on Foundations of Computer Science (FOCS), pp. 147–156, 2013.
[100] S. Shalev-Shwartz and T. Zhang, “Stochastic dual coordinate ascent methods for regularized loss minimization,” Journal of Machine Learning Research, vol. 14, pp. 567–599, 2013.
[101] I. Necoara, Y. Nesterov, and F. Glineur, “Linear convergence of first order methods for non-strongly convex optimization,” Mathematical Programming, vol. 175, pp. 69–107, 2019.
[102] P. W. Wang and C. J. Lin, “Iteration complexity of feasible descent methods for convex optimization,” Journal of Machine Learning Research, vol. 15, pp. 1523–1548, 2014.
[103] B. O’Donoghue and E. Candès, “Adaptive restart for accelerated gradient schemes,” Foundations of Computational Mathematics, vol. 15, pp. 715–732, 2015.
[104] O. Fercoq and Z. Qu, “Adaptive restart of accelerated gradient methods under local quadratic growth condition,” IMA Journal of Numerical Analysis, vol. 39, no. 4, pp. 2069–2095, 2019.
[105] O. Fercoq and Z. Qu, “Restarting the accelerated coordinate descent method with a rough strong convexity estimate,” Computational Optimization and Applications, vol. 75, pp. 63–91, 2020.
[106] Z. Qu, P. Richtárik, and T. Zhang, “Quartz: Randomized dual coordinate ascent with arbitrary sampling,” in Advances in Neural Information Processing Systems (NIPS), pp. 865–873, 2015.
[107] Z. Allen-Zhu, Z. Qu, P. Richtárik, and Y. Yuan, “Even faster accelerated coordinate descent using non-uniform sampling,” in International Conference on Machine Learning (ICML), pp. 1110–1119, 2016.
[108] Y. Nesterov and S. U. Stich, “Efficiency of the accelerated coordinate descent method on structured optimization problems,” SIAM Journal on Optimization, vol. 27, no. 1, pp. 110–123, 2017.
[109] P. Ochs, Y. Chen, T. Brox, and T. Pock, “iPiano: Inertial proximal algorithm for nonconvex optimization,” SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 1388–1419, 2014.
[110] S. Ghadimi and G. Lan, “Accelerated gradient methods for nonconvex nonlinear and stochastic programming,” Mathematical Programming, vol. 156, pp. 59–99, 2016.
[111] H. Li and Z. Lin, “Accelerated proximal gradient methods for nonconvex programming,” in Advances in Neural Information Processing Systems (NIPS), pp. 379–387, 2015.
[112] A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2419–2434, 2009.
[113] C. Jin, R. Ge, P. Netrapalli, S. M. Kakade, and M. I. Jordan, “How to escape saddle points efficiently,” in International Conference on Machine Learning (ICML), pp. 1724–1732, 2017.
[114] J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, “Gradient descent only converges to minimizers,” in Conference On Learning Theory (COLT), pp. 1246–1257, 2016.
[115] S. S. Du, C. Jin, J. D. Lee, M. I. Jordan, B. Poczos, and A. Singh, “Gradient descent can take exponential time to escape saddle points,” in Advances in Neural Information Processing Systems (NIPS), pp. 1067–1077, 2017.
[116] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in International Conference on Machine Learning (ICML), pp. 1139–1147, 2013.
[117] Z. Allen-Zhu and E. Hazan, “Variance reduction for faster non-convex optimization,” in International Conference on Machine Learning (ICML), pp. 699–707, 2016.
[118] S. J. Reddi, A. Hefny, S. Sra, B. Póczós, and A. Smola, “Stochastic variance reduction for nonconvex optimization,” in International Conference on Machine Learning (ICML), pp. 314–323, 2016.
[119] L. Lei, C. Ju, J. Chen, and M. I. Jordan, “Non-convex finite-sum optimization via SCSG methods,” in Advances in Neural Information Processing Systems (NIPS), pp. 2348–2358, 2017.
[120] S. J. Reddi, S. Sra, B. Póczós, and A. Smola, “Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization,” in Advances in Neural Information Processing Systems (NIPS), pp. 1145–1153, 2016.
[121] D. Zhou, P. Xu, and Q. Gu, “Stochastic nested variance reduction for nonconvex optimization,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 3925–3936, 2018.
[122] Z. Wang, K. Ji, Y. Zhou, Y. Liang, and V. Tarokh, “SpiderBoost and momentum: Faster stochastic variance reduction algorithms,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 2403–2413, 2019.
[123] L. M. Nguyen, M. van Dijk, D. T. Phan, P. H. Nguyen, T.-W. Weng, and J. R. Kalagnanam, “Finite-sum smooth optimization with SARAH,” preprint arXiv:1901.07648, 2019.
[124] Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth, “Lower bounds for non-convex stochastic optimization,” preprint arXiv:1912.02365, 2019.
[125] R. Ge, F. Huang, C. Jin, and Y. Yuan, “Escaping from saddle points – online stochastic gradient for tensor decomposition,” in Conference On Learning Theory (COLT), pp. 797–842, 2015.
[126] H. Daneshmand, J. Kohler, A. Lucchi, and T. Hofmann, “Escaping saddles with stochastic gradients,” in International Conference on Machine Learning (ICML), pp. 1155–1164, 2018.
[127] C. Jin, P. Netrapalli, R. Ge, S. M. Kakade, and M. I. Jordan, “Stochastic gradient descent escapes saddle points efficiently,” preprint arXiv:1902.04811, 2019.
[128] C. Fang, Z. Lin, and T. Zhang, “Sharp analysis for nonconvex SGD escaping from saddle points,” in Conference On Learning Theory (COLT), pp. 1192–1234, 2019.
[129] Y. Xu, R. Jin, and T. Yang, “First-order stochastic algorithms for escaping from saddle points in almost linear time,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 5530–5540, 2018.
[130] Z. Allen-Zhu and Y. Li, “Neon2: Finding local minima via first-order oracles,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 3716–3726, 2018.
[131] Y. Nesterov and B. T. Polyak, “Cubic regularization of Newton method and its global performance,” Mathematical Programming, vol. 108, pp. 177–205, 2006.
[132] K. Scaman, F. Bach, S. Bubeck, Y. T. Lee, and L. Massoulié, “Optimal algorithms for smooth and strongly convex distributed optimization in networks,” in International Conference on Machine Learning (ICML), pp. 3027–3036, 2017.
[133] C. A. Uribe, S. Lee, A. Gasnikov, and A. Nedić, “A dual approach for optimal algorithms in distributed optimization over networks,” preprint arXiv:1809.00710, 2018.
[134] H. Hendrikx, F. Bach, and L. Massoulié, “An accelerated decentralized stochastic proximal algorithm for finite sums,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 952–962, 2019.
[135] D. Jakovetić, J. Xavier, and J. M. F. Moura, “Fast distributed gradient methods,” IEEE Transactions on Automatic Control, vol. 59, no. 5, pp. 1131–1146, 2014.
[136] H. Li, C. Fang, W. Yin, and Z. Lin, “A sharp convergence rate analysis for distributed accelerated gradient methods,” preprint arXiv:1810.01053, 2018.
[137] H. Li and Z. Lin, “Revisiting EXTRA for smooth distributed optimization,” SIAM Journal on Optimization, vol. 30, no. 3, pp. 1795–1821, 2020.
[138] G. Qu and N. Li, “Accelerated distributed Nesterov gradient descent,” IEEE Transactions on Automatic Control, vol. 65, no. 6, pp. 2566–2581, 2020.
[139] R. Xin, D. Jakovetić, and U. A. Khan, “Distributed Nesterov gradient methods over arbitrary graphs,” IEEE Signal Processing Letters, vol. 26, no. 8, pp. 1247–1251, 2019.
[140] K. Scaman, F. Bach, S. Bubeck, Y. T. Lee, and L. Massoulié, “Optimal convergence rates for convex distributed optimization in networks,” Journal of Machine Learning Research, vol. 20, no. 159, pp. 1–31, 2019.