
Anomaly Detection: A Survey

A modified version of this technical report will appear in ACM Computing Surveys, September 2009.

VARUN CHANDOLA, University of Minnesota
ARINDAM BANERJEE, University of Minnesota
and
VIPIN KUMAR, University of Minnesota

Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique.




For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category.

We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

Categories and Subject Descriptors: [Database Management]: Database Applications - Data Mining
General Terms: Algorithms
Additional Key Words and Phrases: Anomaly Detection, Outlier Detection

1. INTRODUCTION

Anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior.

These non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities or contaminants in different application domains. Of these, anomalies and outliers are the two terms used most commonly in the context of anomaly detection, sometimes interchangeably. Anomaly detection finds extensive use in a wide variety of applications such as fraud detection for credit cards, insurance or health care, intrusion detection for cyber-security, fault detection in safety critical systems, and military surveillance for enemy activities.

The importance of anomaly detection is due to the fact that anomalies in data translate to significant (and often critical) actionable information in a wide variety of application domains.

For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination [Kumar 2005]. An anomalous MRI image may indicate presence of malignant tumors [Spence et al. 2001]. Anomalies in credit card transaction data could indicate credit card or identity theft [Aleskerov et al. 1997] or anomalous readings from a space craft sensor could signify a fault in some component of the space craft [Fujimaki et al. 2005].

To Appear in ACM Computing Surveys, 09 2009.

Detecting outliers or anomalies in data has been studied in the statistics community as early as the 19th century [Edgeworth 1887].

Over time, a variety of anomaly detection techniques have been developed in several research communities. Many of these techniques have been specifically developed for certain application domains, while others are more generic.

This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We hope that it facilitates a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

1.1 What are anomalies?

Anomalies are patterns in data that do not conform to a well defined notion of normal behavior.

Figure 1 illustrates anomalies in a simple 2-dimensional data set. The data has two normal regions, N1 and N2, since most observations lie in these two regions. Points that are sufficiently far away from the regions, e.g., points o1 and o2, and points in region O3, are anomalies.

Fig. 1. A simple example of anomalies in a 2-dimensional data set.

Anomalies might be induced in the data for a variety of reasons, such as malicious activity, e.g., credit card fraud, cyber-intrusion, terrorist activity or breakdown of a system, but all of the reasons have a common characteristic that they are interesting to the analyst. The interestingness or real life relevance of anomalies is a key feature of anomaly detection.

Anomaly detection is related to, but distinct from, noise removal [Teng et al.] and noise accommodation [Rousseeuw and Leroy 1987], both of which deal with unwanted noise in the data.

Noise can be defined as a phenomenon in data which is not of interest to the analyst, but acts as a hindrance to data analysis. Noise removal is driven by the need to remove the unwanted objects before any data analysis is performed on the data. Noise accommodation refers to immunizing a statistical model estimation against anomalous observations [Huber 1974].

Another topic related to anomaly detection is novelty detection [Markou and Singh 2003a; 2003b; Saunders and Gero 2000], which aims at detecting previously unobserved (emergent, novel) patterns in the data, e.g., a new topic of discussion in a news group. The distinction between novel patterns and anomalies is that the novel patterns are typically incorporated into the normal model after being detected.

It should be noted that solutions for the above mentioned related problems are often used for anomaly detection and vice-versa, and hence are discussed in this review as well.

1.2 Challenges

At an abstract level, an anomaly is defined as a pattern that does not conform to expected normal behavior.

A straightforward anomaly detection approach, therefore, is to define a region representing normal behavior and declare any observation in the data which does not belong to this normal region as an anomaly. But several factors make this apparently simple approach very challenging:

- Defining a normal region which encompasses every possible normal behavior is very difficult. In addition, the boundary between normal and anomalous behavior is often not precise. Thus an anomalous observation which lies close to the boundary can actually be normal, and vice-versa.
- When anomalies are the result of malicious actions, the malicious adversaries often adapt themselves to make the anomalous observations appear like normal, thereby making the task of defining normal behavior more difficult.
- In many domains normal behavior keeps evolving and a current notion of normal behavior might not be sufficiently representative in the future.
- The exact notion of an anomaly is different for different application domains. For example, in the medical domain a small deviation from normal (e.g., fluctuations in body temperature) might be an anomaly, while similar deviation in the stock market domain (e.g., fluctuations in the value of a stock) might be considered as normal. Thus applying a technique developed in one domain to another is not straightforward.
- Availability of labeled data for training/validation of models used by anomaly detection techniques is usually a major issue.
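The straightforward approach described above (define a normal region, flag everything outside it) can be illustrated with a minimal Python sketch. This is not a technique taken from the survey: the circular shape of the regions, their coordinates and radii, the sample points, and the `margin` parameter are all assumptions made for illustration, loosely modeled on the two normal regions N1 and N2 of Figure 1.

```python
import math

# Hypothetical normal regions, loosely modeled on N1 and N2 in Figure 1.
# Each region is a circle given as (center_x, center_y, radius).
# All numbers are illustrative assumptions, not values from the survey.
NORMAL_REGIONS = {
    "N1": (2.0, 8.0, 1.5),
    "N2": (7.0, 3.0, 2.0),
}


def distance_to_region(point, region):
    """Distance from a point to the edge of a circular region (0.0 if inside)."""
    x, y = point
    cx, cy, radius = region
    return max(0.0, math.hypot(x - cx, y - cy) - radius)


def is_anomaly(point, regions=NORMAL_REGIONS, margin=0.0):
    """Flag a point that lies more than `margin` outside every normal region."""
    return all(distance_to_region(point, r) > margin for r in regions.values())


# Illustrative points: one inside N1, one far from both regions,
# and one just beyond the edge of N2.
points = {
    "inside N1": (2.5, 8.2),
    "far from both": (8.0, 9.0),
    "just outside N2": (9.1, 3.0),
}
for name, p in points.items():
    print(name, is_anomaly(p), is_anomaly(p, margin=0.5))
```

The point just outside N2 is flagged when `margin` is 0.0 but treated as normal when `margin` is 0.5, which mirrors the first challenge in the list: the label of an observation near the boundary depends on where the boundary is drawn.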

