AIDE: Fast and Communication Efficient Distributed Optimization
Sashank J. Reddi*, Jakub Konečný^, Peter Richtárik^, Barnabás Póczos*, Alex Smola*
*Machine Learning Department, Carnegie Mellon University
^School of Mathematics, University of Edinburgh

Experiments
In the following, we compare three algorithms:
CoCoA [2] with SDCA as the local solver
InexactDANE (DANE) with SVRG as the local solver
AIDE with SVRG as the local solver
By default, each local solver performs a single pass through its local data.
Algorithm comparison
Rcv1 dataset, smoothed hinge loss (one common form is given below)
Regularization strength λ ∈ { }
Data randomly distributed across 8 nodes.
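For reference, one common parametrization of the smoothed hinge loss on the margin $z = y_j x_j^\top w$ (the smoothing parameter $\gamma$ and this particular form are an assumption; the poster does not spell out the definition):

$$\ell_\gamma(z) = \begin{cases} 0 & \text{if } z \ge 1, \\ 1 - z - \tfrac{\gamma}{2} & \text{if } z \le 1 - \gamma, \\ \tfrac{(1-z)^2}{2\gamma} & \text{otherwise.} \end{cases}$$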
Node scaling
Rcv1, covtype, realsim, url datasets, logistic loss
Randomly distributed across 4–64 computers.
Fixed number of local SVRG steps
Arbitrary data partitioning
Rcv1, covtype, realsim datasets, logistic loss
Partitioned across 2 computers:
– randomly (random)
– based on the output label (output)
Distributed Optimization Problem
Minimize an average of functions, each of which is stored on a different machine. Formally,
$$\min_{w \in \mathbb{R}^d} f(w) := \frac{1}{m} \sum_{i=1}^{m} f_i(w),$$
where $m$ is the number of machines, and
$$f_i(w) := \frac{1}{|S_i|} \sum_{j \in S_i} \phi_j(w).$$
Here, $\phi_j$ typically represents the loss function incurred on the $j$-th data point, and the set $S_i$ denotes the indices of the data points stored on computer $i$.
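To make the setup concrete, the following minimal NumPy sketch evaluates this objective; the choice of logistic loss for $\phi_j$ and all names are illustrative assumptions, not taken from the poster.

```python
import numpy as np

def phi(w, x_j, y_j):
    # Logistic loss on one example (x_j, y_j), with y_j in {-1, +1}.
    return np.log1p(np.exp(-y_j * x_j.dot(w)))

def f_i(w, X_i, y_i):
    # Local objective on machine i: average loss over indices in S_i.
    return np.mean([phi(w, x, y) for x, y in zip(X_i, y_i)])

def f(w, partitions):
    # Global objective: average of the m local objectives f_i.
    return np.mean([f_i(w, X_i, y_i) for (X_i, y_i) in partitions])
```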
Baseline algorithm: DANE [3]
At iteration $t$, with iterate $w^{t-1}$, each machine solves
$$w_i^{t} = \arg\min_{w} \; f_i(w) - \left(\nabla f_i(w^{t-1}) - \eta \nabla f(w^{t-1})\right)^\top w + \frac{\mu}{2}\,\|w - w^{t-1}\|^2,$$
where $\nabla f(w^{t-1}) = \frac{1}{m}\sum_{i=1}^{m} \nabla f_i(w^{t-1})$ is obtained in one round of communication.
This is followed by aggregation to form the new iterate
$$w^{t} = \frac{1}{m} \sum_{i=1}^{m} w_i^{t}.$$
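As a concrete illustration, here is a single-machine simulation of one exact DANE step following the update above; the subproblem solver (SciPy's L-BFGS) and all names are our assumptions, not the poster's.

```python
import numpy as np
from scipy.optimize import minimize

def dane_step(w_prev, local_objs, local_grads, eta=1.0, mu=0.0):
    # One communication round: average the local gradients at w^{t-1}.
    g = np.mean([grad(w_prev) for grad in local_grads], axis=0)

    def solve_subproblem(i):
        gi = local_grads[i](w_prev)
        def obj(w):
            # DANE subproblem on machine i.
            return (local_objs[i](w)
                    - (gi - eta * g).dot(w)
                    + 0.5 * mu * np.sum((w - w_prev) ** 2))
        return minimize(obj, w_prev, method="L-BFGS-B").x

    # Aggregate: average the m local solutions.
    return np.mean([solve_subproblem(i) for i in range(len(local_objs))],
                   axis=0)
```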
Properties
Fast convergence when the local functions $f_i$ are similar enough (e.g., under an i.i.d. data distribution)
Not robust to arbitrary data distributions
Exact minimization of the subproblem is computationally infeasible in many applications
Our contributions
Inexact version of DANE – more robust
Accelerated version, AIDE, that (nearly) matches communication complexity lower bounds
– First efficient method that can be implemented using only a first-order oracle
Two Distributed Optimization Algorithms
Inexact DANE
Core idea – solve the DANE subproblem approximately.
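A minimal sketch of this idea, under the same setup as the dane_step sketch above: the exact minimization is replaced by a fixed number of first-order steps on the subproblem. The poster runs SVRG locally; plain gradient descent is used here only as a simpler stand-in.

```python
import numpy as np

def inexact_dane_step(w_prev, local_grads, eta=1.0, mu=0.0,
                      n_steps=50, lr=0.1):
    # One communication round: average the local gradients at w^{t-1}.
    g = np.mean([grad(w_prev) for grad in local_grads], axis=0)

    def approx_solve(grad_i):
        gi = grad_i(w_prev)
        w = w_prev.copy()
        for _ in range(n_steps):
            # Gradient of the DANE subproblem at the current w.
            sub_grad = grad_i(w) - (gi - eta * g) + mu * (w - w_prev)
            w -= lr * sub_grad
        return w

    # Each machine solves its subproblem approximately; average the results.
    return np.mean([approx_solve(grad) for grad in local_grads], axis=0)
```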
References
[1] Hongzhou Lin, Julien Mairal, and Zaid Harchaoui. "A universal catalyst for first-order optimization." In Advances in Neural Information Processing Systems, 2015.
[2] Chenxin Ma, Jakub Konečný, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter Richtárik, and Martin Takáč. "Distributed optimization with arbitrary local solvers." arXiv preprint arXiv:1512.04039, 2015.
[3] Ohad Shamir, Nathan Srebro, and Tong Zhang. "Communication-efficient distributed optimization using an approximate Newton-type method." arXiv preprint arXiv:1312.7853, 2013.
Communication complexity guarantees
$\lambda$ – strong convexity parameter; $L$ – smoothness parameter; $\epsilon$ – target accuracy
Only modest subproblem accuracy is necessary for these guarantees
Inexactness changes only the constants and adds practical robustness
AIDE: Accelerated Inexact DANE
Core idea – apply Universal Catalyst [1] to InexactDANE
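A sketch of the resulting outer loop, reusing inexact_dane_step from the earlier sketch; the fixed Catalyst momentum below is the standard choice for the $\lambda$-strongly convex case, and all parameter names are our assumptions.

```python
import numpy as np

def aide(w0, local_grads, lam, tau, n_rounds=20, **dane_kwargs):
    # AIDE sketch: Universal Catalyst [1] wrapped around InexactDANE.
    # lam: strong convexity of f; tau: Catalyst regularization weight.
    q = lam / (lam + tau)
    beta = (1 - np.sqrt(q)) / (1 + np.sqrt(q))  # fixed momentum

    w, y = w0.copy(), w0.copy()
    for _ in range(n_rounds):
        # Local gradients of the augmented objective
        # f_i(v) + (tau/2) * ||v - y||^2.
        aug_grads = [lambda v, g=g, y=y: g(v) + tau * (v - y)
                     for g in local_grads]
        # One InexactDANE step on the augmented problem.
        w_next = inexact_dane_step(w, aug_grads, **dane_kwargs)
        # Nesterov-style extrapolation on the outer iterates.
        y = w_next + beta * (w_next - w)
        w = w_next
    return w
```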