In this paper, we define the problem of mining high utility sequential patterns (HUSPs) over high-velocity streaming data and propose an efficient algorithm for mining HUSPs over a data stream. The main challenges we tackle include how to maintain a compact summary of the data stream to reflect the evolution of sequence utilities over time and how to overcome the problem of combinatorial explosion of a search space. We propose a compact data structure named HUSP-Tree to maintain the essential information for mining HUSPs in an online fashion. An efficient and single-pass algorithm named HUSP-Stream is proposed to generate HUSPs from HUSP-Tree. HUSP-Stream uses a new utility estimation model to more effectively prune the search space. Experimental results on real and synthetic datasets show that our algorithm serves as an efficient solution to the new problem of mining high utility sequential patterns over data streams.
This work addresses the challenges of mining sequential patterns from evolving data streams.