LOF: Identifying Density-Based Local Outliers

Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander

Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 Munich, Germany
Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
{ breunig | kriegel | sander [...]

For many KDD applications, such as detecting criminal activities in e-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property.

In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches.

Finally, a careful performance evaluation of our algorithm confirms that our approach of finding local outliers can be [...]

[...] Detection, Database [...]

1. INTRODUCTION

Larger and larger amounts of data are collected and stored in databases, increasing the need for efficient and effective analysis methods to make use of the information contained implicitly in the data. Knowledge discovery in databases (KDD) has been defined as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable knowledge from the data [9].

Most studies in KDD focus on finding patterns applicable to a considerable portion of objects in a dataset.

However, for applications such as detecting criminal activities of various kinds (e.g. in electronic commerce), rare events, deviations from the majority, or exceptional cases may be more interesting and useful than the common cases. Finding such exceptions and outliers, however, has not yet received as much attention in the KDD community as some other topics have, e.g. association rules.

Recently, a few studies have been conducted on outlier detection for large datasets (e.g. [18], [1], [13], [14]). While a more detailed discussion of these studies will be given in section 2, it suffices to point out here that most of these studies consider being an outlier as a binary property.

That is, either an object in the dataset is an outlier or not. For many applications, the situation is more complex, and it becomes more meaningful to assign to each object a degree of being an outlier.

Also related to outlier detection is an extensive body of work on clustering algorithms. From the viewpoint of a clustering algorithm, outliers are objects not located in clusters of a dataset, usually called noise. The set of noise produced by a clustering algorithm, however, is highly dependent on the particular algorithm and on its clustering parameters.

Only a few approaches are directly concerned with outlier detection. These algorithms, in general, consider outliers from a more global perspective, which also has some major drawbacks. These drawbacks are discussed in detail in section 2 and section 3. Furthermore, based on these clustering algorithms, the property of being an outlier is again binary.

In this paper, we introduce a new method for finding outliers in a multidimensional dataset. We introduce a local outlier factor (LOF) for each object in the dataset, indicating its degree of outlier-ness. This is, to the best of our knowledge, the first concept of an outlier which also quantifies how outlying an object is.

The outlier factor is local in the sense that only a restricted neighborhood of each object is taken into account. Our approach is loosely related to density-based clustering. However, we do not require any explicit or implicit notion of clusters for our method. Specifically, our technical contributions in this paper are as follows:

After introducing the concept of LOF, we analyze the formal properties of LOF. We show that for most objects in a cluster their LOF is approximately equal to 1. For any other object, we give a lower and upper bound on its LOF.

These bounds highlight the local nature of LOF. Furthermore, we analyze when these bounds are tight. We identify classes of objects for which the bounds are tight. Finally, for those objects for which the bounds are not tight, we provide sharper bounds.

Proc. ACM SIGMOD 2000 Int. Conf. on Management of Data, Dallas, TX, 2000

The LOF of an object is based on the single parameter MinPts, which is the number of nearest neighbors used in defining the local neighborhood of the object. We study how this parameter affects the LOF value, and we present practical guidelines for choosing the MinPts values for finding local outliers.
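To make the role of MinPts and the "LOF close to 1 inside a cluster" property concrete, the following is a minimal brute-force sketch. The formal definitions are only given in section 4, so the sketch assumes the usual formulation built from the k-distance, the reachability distance, and the local reachability density. The function names, the toy dataset, and the chosen MinPts values are illustrative assumptions, not part of the paper.

```python
import math

def k_distance_and_neighbors(points, i, min_pts):
    """Return (k-distance of point i, indices in its k-distance neighborhood)."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    k_dist = dists[min_pts - 1][0]
    # Ties at the k-distance are kept, so the neighborhood can exceed min_pts.
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors

def lof_scores(points, min_pts):
    """Brute-force LOF for every point (assumes fewer than min_pts duplicates)."""
    n = len(points)
    info = [k_distance_and_neighbors(points, i, min_pts) for i in range(n)]

    def reach_dist(p, o):
        # Reachability distance of p with respect to o.
        return max(info[o][0], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        neighbors = info[p][1]
        mean_reach = sum(reach_dist(p, o) for o in neighbors) / len(neighbors)
        lrd.append(1.0 / mean_reach)

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[o] for o in info[p][1]) / (len(info[p][1]) * lrd[p])
        for p in range(n)
    ]

# Toy data (an assumption for illustration): a 5x5 grid cluster plus one
# isolated point, which is stored last.
cluster = [(float(x), float(y)) for x in range(5) for y in range(5)]
points = cluster + [(10.0, 10.0)]

for min_pts in (3, 5):
    scores = lof_scores(points, min_pts)
    print(f"MinPts={min_pts}: "
          f"cluster LOF in [{min(scores[:-1]):.2f}, {max(scores[:-1]):.2f}], "
          f"isolated point LOF = {scores[-1]:.2f}")
```

On this toy data the grid points score close to 1, while the isolated point scores well above 1 for both MinPts values. The sketch is quadratic in the number of points and is meant only to illustrate the quantities involved, not to reflect the algorithm evaluated in the paper.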

Last but not least, we present experimental results which show both the capability and the performance of finding local outliers. We conclude that finding local outliers using LOF is meaningful and [...]

The paper is organized as follows. In section 2, we discuss related work on outlier detection and its drawbacks. In section 3 we discuss in detail the motivation of our notion of outliers, especially the advantage of a local instead of a global view on outliers. In section 4 we introduce LOF and define other auxiliary notions. In section 5 we thoroughly analyze the formal properties of LOF.

Since LOF requires the single parameter MinPts, in section 6 we analyze the impact of the parameter, and discuss ways to choose MinPts values for LOF computation. In section 7 we perform an extensive experimental evaluation.

2. RELATED WORK

Most of the previous studies on outlier detection were conducted in the field of statistics. These studies can be broadly classified into two categories. The first category is distribution-based, where a standard distribution (e.g. Normal, Poisson, etc.) is used to fit the data best. Outliers are defined based on the probability [...] one hundred tests of this category, called discordancy tests, have been developed for different scenarios (see [5]).
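As a rough illustration of the distribution-based idea, the sketch below fits a Normal distribution to one-dimensional data and flags values whose z-score exceeds a cutoff. It is not one of the specific discordancy tests catalogued in [5]; the function name, the sample values, and the threshold of 2.5 are assumptions chosen for this example.

```python
import statistics

def z_score_outliers(values, threshold=2.5):
    """Flag values lying more than `threshold` sample standard deviations
    from the sample mean (threshold is an illustrative assumption)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [abs(v - mean) / stdev > threshold for v in values]

# Hypothetical measurements: nine values near 10 and one far away.
values = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.0, 15.0]
flags = z_score_outliers(values)
print(flags)
```

Note that the answer is a yes/no flag per value, which is the binary view of outliers that the introduction contrasts with a degree of outlier-ness.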
