suggest two prominent clusters of relative selectivity values. The
first one typically ranges from 0.001 and above, and the second
one contains values that are smaller by multiple orders of mag-
nitude. This suggests a heuristic that “PathLazy" strategy could
be employed for queries with relative selectivity below 0.001, and
“SingleLazy" be employed for queries above 0.001.
8. CONCLUSION AND FUTURE WORK
We present a new subgraph isomorphism algorithm for dynamic
graph search. We developed a new data structure named Subgraph
Join Tree (SJ-Tree) that represents the execution strategy for a query.
We also developed a set of algorithms for efficient collection of
graph stream statistics, and using the statistics for automatic gener-
ation of the SJ-Tree for any given query graph. We also introduced
two selectivity metrics to quantitatively measure the hardness of a
query by considering the temporal properties of the graph. Given
a stable distribution of edge types and the skew in the distribution
of 2-edge subgraphs, we demonstrated that a query decomposition
based on more selective edges will be consistently efficient. We
went further to introduce a “Lazy" variant of the dynamic graph
search algorithm that exploits the varying selectivity between dif-
ferent parts of a query graph. We also extensively compared dif-
ferent search strategies, performing experiments on three different
datasets and multiple query classes. Finally, we concluded with a
Relative Selectivity based rule for selecting a search strategy.
The paper investigates the important problem of estimating se-
lectivity of query graphs. While our 2-edge subgraph based ap-
proach provides an initial foundation, deeper investigations are war-
ranted for more accurate selectivity estimation. Subsequent re-
search can leverage on the significant body of work on counting
larger subgraphs such as triangles in streaming or semi-streaming
scenarios. Exploring query graphs with more diverse structures
and developing a predictive model for accurate estimation of per-
formance needs to be addressed. Adaptive query processing is an
important follow-up problem as well. A long standing database
query needs to be robust against shift in the data characteristics.
While we propose a fast algorithm for periodic recomputation of
the primitive distribution, we do not address the issues of modeling
the inefficiency from operating under a different selectivity order
and migrating existing partial matches from one SJ-Tree to another.
Acknowledgment
Presented research is based on work funded under the Center for
Adaptive Supercomputing Software - Multithreaded Architecture
at the US Department of Energy’s Pacific Northwest National Lab-
oratory, which is operated by Battelle Memorial Institute.
9. REFERENCES
[1] A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito,
I. Nishizawa, J. Rosenstein, and J. Widom. Stream: The
stanford stream data manager. In SIGMOD ’03.
[2] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom.
Models and issues in data stream systems. PODS ’02.
[3] D. F. Barbieri, D. Braga, S. Ceri, E. D. Valle, and
M. Grossniklaus. C-sparql: a continuous query language for
rdf data streams. Int. J. Semantic Computing, 4(1), 2010.
[4] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J.
Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy,
S. R. Madden, F. Reiss, and M. A. Shah. Telegraphcq:
continuous dataflow processing. SIGMOD ’03.
[5] L. Chen and C. Wang. Continuous subgraph pattern search
over certain and uncertain graph streams. IEEE Trans. on
Knowl. and Data Eng., 22(8):1093–1109, Aug. 2010.
[6] S. Choudhury, L. Holder, G. Chin, and J. Feo. Fast search for
multi-relational graphs. ACM SIGMOD Workshop on
Dynamic Network Management and Mining, 2013.
[7] S. Choudhury, L. Holder, G. Chin, A. Ray, S. Beus, and
J. Feo. Streamworks: A system for dynamic graph search.
SIGMOD ’13.
[8] D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years
of graph matching in pattern recognition. Intl. Journal of
Pattern Recognition and Artificial Intelligence, 2004.
[9] L. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)
graph isomorphism algorithm for matching large graphs.
IEEE Trans. on Pattern Analysis and Machine Intelli., 2004.
[10] W. Fan, J. Li, J. Luo, Z. Tan, X. Wang, and Y. Wu.
Incremental graph pattern matching. SIGMOD ’11, 2011.
[11] W.-S. Han, J. Lee, and J.-H. Lee. Turboiso: Towards ultrafast
and robust subgraph isomorphism search in large graph
databases. SIGMOD ’13.
[12] H. He and A. K. Singh. Closure-tree: An index structure for
graph queries. ICDE ’06.
[13] J. M. Hellerstein and M. Stonebraker. Predicate migration:
Optimizing queries with expensive predicates, volume 22.
ACM, 1993.
[14] M. Jha, C. Seshadhri, and A. Pinar. A space efficient
streaming algorithm for triangle counting using the birthday
paradox. In SIGKDD. ACM, 2013.
[15] C. Joslyn, S. Choudhury, D. Haglin, B. Howe, B. Nickless,
and B. Olsen. Massive scale cyber traffic analysis: a driver
for graph database research. In 1st ACM SIGMOD Workshop
on Graph Data Management Experiences and Systems, 2013.
[16] A. Khan, N. Li, X. Yan, Z. Guan, S. Chakraborty, and S. Tao.
Neighborhood based fast graph search in large networks.
SIGMOD ’11.
[17] R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of
nonrecursive queries. In VLDB, volume 86, pages 128–137.
Citeseer, 1986.
[18] Y.-N. Law, H. Wang, and C. Zaniolo. Relational languages
and data models for continuous queries on sequences and
data streams. ACM Trans. Database Syst., June 2011.
[19] J. Mondal and A. Deshpande. Eagr: Supporting continuous
ego-centric aggregate queries over large dynamic graphs. In
SIGMOD 2014.
[20] C. Seshadhri, A. Pinar, and T. G. Kolda. Fast triangle
counting through wedge sampling. In Proceedings of the
SIAM Conference on Data Mining, 2013.
[21] Z. Sun, H. Wang, H. Wang, B. Shao, and J. Li. Efficient
subgraph matching on billion node graphs. PVLDB, 5(9),
2012.
[22] Y. Tian and J. Patel. Tale: A tool for approximate large graph
matching. In ICDE ’08.
[23] H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad. Fast
best-effort pattern matching in large attributed graphs. KDD
’07.
[24] J. R. Ullmann. An algorithm for subgraph isomorphism. J.
ACM, 23:31–42, January 1976.
[25] Y. Wu, J. M. Patel, and H. Jagadish. Structural join order
selection for xml query optimization. In Data Engineering,
2003. Proceedings. 19th International Conference on, pages
443–454. IEEE, 2003.
[26] P. Zhao, C. C. Aggarwal, and M. Wang. gsketch: on query
estimation in graph streams. PVLDB, 5(3), 2011.