### Transcription of Event Detection from Time Series Data - Data Analysis and ...

1 **Event** **Detection** from **time** **Series** **data** Valery Guralnik, Jaideep Srivastava Department of Computer Science University of Minnesota { guralnik , Abstract whether the sampling is fixed or adaptive. The In the past few years there has been increased interest in overall goal is to obtain an accurate picture of the using **data** -mining techniques to extract interesting patterns phenomenon with minimum sampling effort. Examples from **time** **Series** **data** generated by sensors monitoring of such observations include highway traffic monitoring, temporally varying phenomenon. Most work has assumed electro-cardiograms, and monitoring of oil refineries. that raw **data** is somehow processed to generate a sequence In the past few years there has been increased interest of **events** , which is then mined for interesting episodes. In in using **data** mining techniques to extract interesting some cases the rule for determining when a sensor reading patterns from temporal sequences [SA95, MTV97, should generate an **Event** is well known.]}

2 However, if the PT96]. A standard assumption has been that the raw phenomenon is ill-understood, stating such a rule is difficult. **data** collected from sensors is somehow processed to **Detection** of **events** in such an environment is the focus generate a sequence of **events** , which is then mined for of this paper. Consider a dynamic phenomenon whose behavior changes enough over **time** to be considered a interesting episodes [MTV95, HKM+96]. While there qualitatively significant change. The problem we investigate is no strict definition of an episode, intuitively it is a is of identifying the **time** points at which the behavior pattern of **events** occurring in some order, and close change occurs. In the statistics literature this has been enough to each other in **time** . Recent research has called the change-point **Detection** problem. The standard developed languages for specifying temporal patterns approach has been to (a) upriori determine the number [MT96, PT96, GWS98], and algorithms have been of change-points that are to be discovered, and (b) decide proposed that takes advantage of the specified pattern the function that will be used for curve fitting in the to increase the computational efficiency of the mining interval between successive change-points.

3 In this paper process. we generalize along both these dimensions. We propose an However, an issue that has received scant attention iterative algorithm that fits a model to a **time** segment, and uses a likelihood criterion to determine if the segment should is of deriving an **Event** sequence from raw sensor **data** . be partitioned further, if it contains a new change- In some cases the rule for determining when a sensor point. In this paper we present algorithms for both the batch reading should generate an **Event** is well known, and incremental versions of the problem, and evaluate their if the temperature of a boiler goes above a certain behavior with synthetic and real **data** . Finally, we present threshold, then sound an alarm. However, if the initial results comparing the change-points detected by the phenomenon is ill-understood or changes its behavior batch algorithm with those detected by people using visual unpredictably, adapting the threshold such that **Event** inspection. reporting is accurate becomes very difficult.

4 Thus, a more systematic approach is required for processing the 1 Introduction raw sensor **data** to generate an **Event** sequence. This is the focus of our paper. Sensor-based monitoring of any phenomenon creates **time** **Series** **data** . The spacing between successive Consider a dynamic phenomenon whose behavior readings may be constant or varying, depending on changes enough over **time** so as to be considered a qualitatively significant change. Each such change is Permission to make digital or hard topics of all or part of this work ~CII an **Event** . An example is the change of highway traffic personal or classroom use is granted without fee provided that copies from light to heavy to congested. Another example is are not made or distributed lb profit or commercial advantage and that the change of a boiler from normal to super-heated. copies bear this notice and the full citation on the tirst page. To copy The specific problem we address is of applying **data** otherwise, to republish, to post on scrvcrs or to redistribute to lists.

5 Requires prior specific permission and/or a fee. mining techniques to identify the **time** points at which KDD-99 San Diego CA USA the changes, **events** , occur. In the statistics Copyright ACM 1999 l-58113-143-7/99 $ literature this has been called the change point **Detection** 33. problem. The standard approach has been to (a) of parameters in the model, or perhaps even the change apriori determine the number of change points to of the model itself, at unknown **time** (s). be discovered, and (b) decide the model to be used This problem is widely known as the change-point for fitting the subsequence between successive change **Detection** problem in the field of statistics. A number points. Thus, the problem becomes one of finding the of approaches have been proposed to solve the change- best set of the predetermined number of points that point. **Detection** problem [SO94, Hus93, Haw76, HM73, minimizes the error in fitting the pre-decided function Gut74]. The standard assumption is that the phe- [SO94, Hus93, Haw76, HM73, Gut74].

6 [KS971 addresses nomenon can be approximated by a known, stationary the problem of approximating a sequence of sensor (usually linear) model. However, this assumption may readings by a set of Ic linear segments as a pre-processing not be true in some domains, creating the need for an step. This too can be considered a version of the change- approach that works without this assumption. In this point **Detection** problem. In the proposed approach, we paper we propose an approach that simultaneously ad- address both limitations of standard approaches. First, dresses the issue of model selection and change-point we place no constraint on the class of functions that **Detection** . will be fitted to the subsequences between successive change points. Second, the number of change points is Formal Statement of the Problem not, fixed apriori. Rather, the appropriate set is found Consider a **time** **Series** denoted by y(t), t = 1,2, ..n. by using maximum likelihood methods [Hud66]. where t is a **time** variable.]

7 In this paper we study two versions of the change We would like to find a piecewise segmented model point **Detection** problem, namely the batch and the M, given by incremental versions. In the batch version the entire Y = fl(t,wl) + cl(t), (1 < t I h), **data** set is available, as in the case of 24-hour **data** from = f2(t,w2) + e2(t), (6 < t 5 f32), traffic sensors, from which the best, set of change points (t,.Wlc) +ek(tj, .ie,l;.(t' ~j;~. can be determined. In the incremental version, the algorithm receives new **data** points one at a **time** , and determines if the new observation causes a new change- An fi(t, wi) is the function (with its vector of point to be discovered. Our contributions include parameters wi) that is fit in segment i. The Bi's are the change points between successive segments, and ei(t) s l developing a general approach to change-point, are error terms. At this point we put no constraints on **Event** , **Detection** that generalizes previous the nature of fi(t, ~0's. approaches, Maximum Likelihood Estimation l developing algorithms for both the batch and incre- If all change points are specified a priori, and mod- mental versions of the change point **Detection** prob- els with parameters wi's and estimated standard devi- lem, ations gi's found for each segment, then the statistical l evaluating their behavior with synthetic and real likelihood L, of the change points is proportional to **data** , fi (q-m; - heteroscedastic error and comparing the algorithms with visual change- i=l l L= --n/2.))))

8 Point **Detection** by humans. - homoscedastic error i$ -2. This paper is organized as follows: In section 2 i [ I. we formally describe the **Event** **Detection** problem. Here Ic is the number of change-points, mi is the number Section 3 presents the batch algorithm and Section 4 of **time** points in segment i, and n is the total number its performance. Section 5 describes the incremental of **time** pointsl. algorithm, which is evaluated in the Section 6. Section If the change points are not known, the maximum 7 concludes the paper. likelihood estimate (MLE) of the ei's can be found by maximizing the likelihood L over all possible sets of Bi's, 2 **Event** **Detection** as Change-Point or equivalently, by minimizing -2 log L. This function is equivalent to, **Detection** In this paper we are interested in real-valued **time** 5 rni log 0: - heteroscedastic error **Series** denoted by y(t), t = 1,2, .. n, where t is a -210gL = i=l **time** variable. It is assumed that the **time** **Series** 7l lOg( 5 ? ) - homoscedastic error can be modeled mathematically, where each model is 1 i=i characterized by a set of parameters.]

9 The problem of The homoscedastic error model specifies that 01 = (~2= .. =. **Event** **Detection** becomes one of recognizing the change ok. Heteroscedastic error model doesn't impose this constraint. 34. In this paper, the term likelihood criteria will refer to 3 Batch Algorithm function -2 logL, and will be denoted as C. Because In this section we assume that the entire **data** set log is a monotonically increasing function, for the ho- is collected before the **Analysis** begins. In section 5. moscedastic error case we use the equivalent likelihood we consider the incremental case where change-point criteria of minimizing the function C, =, rrzi~p. **Detection** proceeds concurrently with **data** collection. Model Selection Algorithm Description Change-point **Detection** algorithms have been studied in For each segment i, model estimation is the problem the statistics literature [Haw76, HM73, Gut74]. They of finding the function fi(t, wi) that best approximates have worked under the assumptions that the **data** .)

10 The quality of an approximation produced by the learning system is measured by the loss function (a) a stationary known model can be used to describe Loss(y(t),fi(t,wi)), where Bi-i < t 5 0i. The the phenomenon, and expected value of loss is called risk functional &(wi) =. E[Loss(y(t), ji(t, wi)]. Therefore, for each segment the (b) the number of change points is known apriori. learning system has to find a fi(t, wi) that minimizes Our approach was to start from the algorithm described R(wi). in [Haw76], and remove these assumptions. Let us now consider the nature of fi(t, wi) s. Most Assume that the best model that maintains **time** past work has assumed that the nature of these points ti, ti+l, .. tj as a single segment has been functions is known, or can be somehow determined selected. Let S be the residual sum of squares for from domain knowledge. However, in general this this model. The number of points in this segment cannot be done, and thus our approach allows the is m = j - i + 1.)