Mining high utility sequential patterns from evolving data streams

Abstract

In this paper, we define the problem of mining high utility sequential patterns (HUSPs) over high-velocity streaming data and propose an efficient algorithm for solving it. The main challenges are how to maintain a compact summary of the data stream that reflects the evolution of sequence utilities over time, and how to overcome the combinatorial explosion of the search space. We propose a compact data structure named HUSP-Tree to maintain the essential information for mining HUSPs in an online fashion, and an efficient single-pass algorithm named HUSP-Stream to generate HUSPs from HUSP-Tree. HUSP-Stream uses a new utility estimation model to prune the search space more effectively. Experimental results on real and synthetic datasets show that our algorithm serves as an efficient solution to the new problem of mining high utility sequential patterns over data streams.

Publication
Proceedings of the ASE BigData & SocialInformatics 2015

This work addresses the challenges of mining high utility sequential patterns from evolving data streams.

Morteza Zihayat
Principal Investigator

Dr. Morteza Zihayat is a Canada Research Chair (CRC) in Human-Centered AI and Associate Professor in the Faculty of Engineering and Architectural Science at Toronto Metropolitan University. He also holds appointments as Adjunct Associate Professor at the University of Waterloo (Management Sciences) and IBM Faculty Fellow at the IBM Centre for Advanced Studies. He is the Director of the Human-Centered Machine Intelligence Lab.