paths. Accurate traffic demand and structural patterns from applications
can allow the SDN controller to split or re-route management
and data flows onto different routes (as discussed in Hedera [6] and
MicroTE [9]). Although it remains to be seen how effectively the
flow-level traffic engineering can optimize the performance of big
data applications, these schemes could be useful if optical switches
are not available in production data centers. Implementing
flow-level traffic engineering requires installing a rule for each selected
flow on ToR switches, which imposes additional overhead on
network configuration. In future work, we will explore flow-level
traffic engineering mechanisms for big data applications and
their efficient implementation using an SDN controller.
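
The per-flow rule overhead mentioned above can be illustrated with a rough
sketch. It assumes that only inter-rack flows above a demand threshold are
selected and that each selected flow costs one rule at its source ToR switch.
The names Flow, select_flows, rules_per_tor, and the threshold value are
hypothetical and are not part of the system described in this paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class Flow(NamedTuple):
    src_host: str
    dst_host: str
    demand_mbps: float  # estimated demand reported by the application


def select_flows(flows: Iterable[Flow], host_to_tor: Dict[str, str],
                 threshold_mbps: float) -> List[Flow]:
    """Keep inter-rack flows whose demand meets the (assumed) threshold."""
    return [
        f for f in flows
        if host_to_tor[f.src_host] != host_to_tor[f.dst_host]
        and f.demand_mbps >= threshold_mbps
    ]


def rules_per_tor(selected: Iterable[Flow],
                  host_to_tor: Dict[str, str]) -> Counter:
    """Count one forwarding rule at the source ToR for each selected flow."""
    counts: Counter = Counter()
    for f in selected:
        counts[host_to_tor[f.src_host]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical placement of four hosts under three ToR switches.
    host_to_tor = {"h1": "tor1", "h2": "tor1", "h3": "tor2", "h4": "tor3"}
    flows = [
        Flow("h1", "h3", 800.0),
        Flow("h2", "h4", 600.0),
        Flow("h1", "h2", 900.0),  # same rack, so it is never selected
        Flow("h3", "h4", 50.0),   # below the threshold
    ]
    selected = select_flows(flows, host_to_tor, threshold_mbps=100.0)
    print(rules_per_tor(selected, host_to_tor))
```

Under these assumptions the rule count on a ToR grows linearly with the
number of selected flows that originate in its rack, which is why the
selection threshold directly controls the configuration overhead.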
6. RELATED WORK
In addition to the aforementioned work that uses optical switches to
reconfigure the data center network topology, Schares et al. have
discussed the use of optical switches for stream processing
systems [16]. Several recent studies explore the use of OpenFlow to
adjust the routing for different applications. In [12], Das et al.
propose using OpenFlow to aggregate traffic for different services
over dynamic links on a converged packet-circuit network. This work
is focused on wide-area network services. Topology switching [27]
is a recent proposal to isolate applications and use different routing
mechanisms for them in fat-tree based data centers. Our work
explores tighter integration between applications and network
control. We focus on run-time network reconfiguration for big
data applications and the dynamic interaction between application
components and the SDN controller in data centers. We study
programmability at every layer of the network, from physical topology
to routing and flow-level traffic engineering.
7. CONCLUSION
In this paper, we explore an integrated network control architecture
to program the network at run-time for big data applications
using optical circuits with an SDN controller. Using Hadoop as an
example, we discuss the integrated network control architecture, job
scheduling, and topology and routing configuration for Hadoop jobs.
Our preliminary analysis suggests that integrated network control
holds great promise for Hadoop, with relatively small configuration
overhead. Although our discussion has focused on Hadoop, the
integrated control architecture can be applied to any big data
application with a centralized or logically centralized master. Since
data aggregation is common in big data applications, the network
configuration for aggregation patterns can be applied to
other applications as well. We believe our study serves as a step
towards tight and dynamic interaction between applications and the
network using SDN.
8. REFERENCES
[1] Apache Hadoop, http://hadoop.apache.org.
[2] Apache HBase, http://hbase.apache.org.
[3] Floodlight OpenFlow controller.
http://floodlight.openflowhub.org/.
[4] Open vSwitch. http://openvswitch.org/.
[5] H. Abu-Libdeh, P. Costa, A. Rowstron, G. O’Shea, and
A. Donnelly. Symbiotic routing in future data centers. In
ACM SIGCOMM’10, August 2010.
[6] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and
A. Vahdat. Hedera: Dynamic flow scheduling for data center
networks. In USENIX NSDI’10, April 2010.
[7] G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica,
Y. Lu, B. Saha, and E. Harris. Reining in the outliers in
map-reduce clusters using Mantri. In USENIX OSDI’10,
December 2010.
[8] T. Benson, A. Akella, A. Shaikh, and S. Sahu. CloudNaaS: A
cloud networking platform for enterprise applications. In
ACM SOCC’11, October 2011.
[9] T. Benson, A. Anand, A. Akella, and M. Zhang. MicroTE:
The case for fine-grained traffic engineering in data centers.
In ACM CoNEXT’11, December 2011.
[10] P. Chandra, A. Fisher, C. Kosak, T. S. E. Ng, P. Steenkiste,
E. Takahashi, and H. Zhang. Darwin: Resource management
for value-added customizable network service. In IEEE
ICNP’98, October 1998.
[11] P. Costa, A. Donnelly, A. Rowstron, and G. O’Shea.
Camdoop: Exploiting in-network aggregation for big data
applications. In USENIX NSDI’12, April 2012.
[12] S. Das, Y. Yiakoumis, G. Parulkar, P. Singh, D. Getachew,
P. D. Desai, and N. McKeown. Application-aware
aggregation and traffic engineering in a converged
packet-circuit network. In OFC’11, March 2011.
[13] B. Hindman et al. Mesos: A platform for fine-grained
resource sharing in the data center. In USENIX NSDI’11,
March 2011.
[14] H. Bazzaz et al. Switching the optical divide: Fundamental
challenges for hybrid electrical/optical data center networks.
In ACM SOCC’11, October 2011.
[15] K. Chen et al. OSA: An optical switching architecture for
data center networks with unprecedented flexibility. In
USENIX NSDI’12, April 2012.
[16] L. Schares et al. A reconfigurable interconnect fabric with
optical circuit switch and software optimizer for stream
computing systems. In OFC’09, March 2009.
[17] Y. Chen et al. Energy efficiency for large-scale MapReduce
workloads with significant interactive analysis. In ACM
EuroSys’12, April 2012.
[18] N. Farrington, G. Porter, S. Radhakrishnan, H. Bazzaz,
V. Subramanya, Y. Fainman, G. Papen, and A. Vahdat.
Helios: A hybrid electrical/optical switch architecture for
modular data centers. In ACM SIGCOMM’10, August 2010.
[19] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian,
Y. Zhang, and S. Lu. BCube: A high performance,
server-centric network architecture for modular data centers.
In ACM SIGCOMM’09, August 2009.
[20] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad:
Distributed data-parallel programs from sequential building
blocks. In ACM EuroSys’07, March 2007.
[21] S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan. An
analysis of traces from a production MapReduce cluster. In
CMU Technical Report, December 2009.
[22] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and
D. Walker. Abstractions for network update. In ACM
SIGCOMM’12, August 2012.
[23] J. Seedorf and E. Burger. Application-layer traffic
optimization (ALTO) problem statement. RFC 5693, 2009.
[24] D. L. Tennenhouse, J. M. Smith, W. Sincoskie, D. Wetherall,
and G. Minden. A survey of active network research. In
IEEE Communications Magazine, January 1997.
[25] A. Vahdat, H. Liu, X. Zhao, and C. Johnson. The emerging
optical data center. In OFC’11, March 2011.
[26] G. Wang, D. Andersen, M. Kaminsky, K. Papagiannaki,
T. S. E. Ng, M. Kozuch, and M. Ryan. c-Through: Part-time
optics in data centers. In ACM SIGCOMM’10, August 2010.
[27] K. Webb, A. Snoeren, and K. Yocum. Topology switching for
data center networks. In USENIX Hot-ICE’11, March 2011.
[28] E. Weigle and W. Feng. A comparison of TCP automatic
tuning techniques for distributed computing. In IEEE
HPCS’02, July 2002.
[29] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and
I. Stoica. Spark: Cluster computing with working sets. In
USENIX HotCloud’10, June 2010.
[30] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and
I. Stoica. Improving MapReduce performance in
heterogeneous environments. In USENIX OSDI’08,
December 2008.