Acknowledgements
We thank Chris Rossbach, Yuan Yu, and Jon Currey for
their help with the Dandelion compiler. We thank our
anonymous reviewers, Chuck Thacker, Doug Burger, Babak
Falsafi, and Onur Koçberber for their invaluable feedback.