Pinnacle SMU LARC

Frequent Pattern Mining.

1. Mining Top-K Large Structural Patterns in a Massive Network.

Overview

With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose SpiderMine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1−ϵ. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call “spiders”. With the spider structure, our approach adopts a probabilistic mining framework to find the top-k largest patterns by (i) identifying an affordable set of promising growth paths toward large patterns, (ii) generating large patterns with much lower combinatorial complexity, and finally (iii) greatly reducing the cost of graph isomorphism tests with a new graph pattern representation by a multi-set of spiders. Extensive experimental studies on both synthetic and real data sets show that our algorithm outperforms existing methods.



Members

Qiang QU, Feida ZHU

Dataset

DBLP

Publication

"Mining Top-K Large Structural Patterns in a Massive Network"
Feida Zhu, Qiang Qu, David Lo, Xifeng Yan, Jiawei Han, Philip S. Yu Proc. of the VLDB Endowment PVLDB 4(11): 807-818, 2011


2. A Direct Mining Approach To Efficient Constrained Graph Pattern Discovery.

Overview

Despite the wealth of research on frequent graph pattern mining, how to efficiently mine the complete set of those with constraints still poses a huge challenge to the existing algorithms mainly due to the inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly-specified constraints cannot be handled in a way that is direct and precise. In this paper, we propose a direct mining framework to solve the problem and illustrate our ideas in the context of a particular type of constrained frequent patterns — the “skinny” patterns, which are graph patterns with a long backbone from which short twigs branch out. These patterns, which we formally define as l-long δ-skinny patterns, are able to reveal insightful spatial and temporal trajectory patterns in mobile data mining, information diffusion, adoption propagation, and many others. Based on the key concept of a canonical diameter, we develop SkinnyMine, an efficient algorithm to mine all the l-long δ-skinny patterns guaranteeing both the completeness of our mining result as well as the unique generation of each target pattern. We also present a general direct mining framework together with two properties of reducibility and continuity for qualified constraints. Our experiments on both synthetic and real data demonstrate the effectiveness and scalability of our approach.





Members

Zequn ZHANG, Feida ZHU

Dataset

DBLP

Publication

"A Direct Mining Approach To Efficient Constrained Graph Pattern Discovery"
Feida Zhu, Zequn Zhang, Qiang Qu, 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD'13), New York, USA, June, 2013.