Transcription of “GrabCut” — Interactive Foreground Extraction using ...
1 grabcut Interactive Foreground Extraction using Iterated Graph CutsCarsten Rother Vladimir kolmogorov Microsoft Research Cambridge, UKAndrew Blake Figure 1:Three examples of grabcut .The user drags a rectangle loosely around an object. The object is then extracted problem of efficient, Interactive Foreground /background seg-mentation in still images is of great practical importance in im-age editing. Classical image segmentation tools use either texture(colour) information, Magic Wand, or edge (contrast) infor-mation, Intelligent Scissors. Recently, an approach based onoptimization by graph-cut has been developed which successfullycombines both types of information. In this paper we extend thegraph-cut approach in three respects. First, we have developed amore powerful, iterative version of the optimisation. Secondly, thepower of the iterative algorithm is used to simplify substantially theuser interaction needed for a given quality of result.
2 Thirdly, a ro-bust algorithm for border matting has been developed to estimatesimultaneously the alpha-matte around an object boundary and thecolours of Foreground pixels. We show that for moderately difficultexamples the proposed method outperforms competitive [Computer Graphics]: Picture/ImageGeneration Display algorithms; [Computer Graphics]:Methodology and Techniques Interaction techniques; [Im-age Processing and Computer Vision]: Segmentation Pixel clas-sification; partitioningKeywords: Interactive Image Segmentation, Graph Cuts, ImageEditing, Foreground Extraction , Alpha Matting1 IntroductionThis paper addresses the problem of efficient, Interactive extrac-tion of a Foreground object in a complex environment whose back-ground cannot be trivially subtracted. The resulting Foreground ob-ject is an alpha-matte which reflects the proportion of foregroundand background. The aim is to achieve high performance at thecost of only modest Interactive effort on the part of the user.
3 Highperformance in this task includes: accurate segmentation of objectfrom background; subjectively convincing alpha values, in responseto blur, mixed pixels and transparency; clean Foreground colour, e-mail: e-mail: e-mail: of colour bleeding from the source background. In general,degrees of Interactive effort range from editing individual pixels, atthe labour-intensive extreme, to merely touching Foreground and/orbackground in a few Previous approaches to Interactive mattingIn the following we describe briefly and compare several state ofthe art Interactive tools for segmentation: Magic Wand, IntelligentScissors, Graph Cut and Level Sets and for matting: Bayes Mattingand Knockout. Fig. 2 shows their results on a matting task, togetherwith degree of user interaction required to achieve those Wandstarts with a user-specified point or region to com-pute a region of connected pixels such that all the selected pixelsfall within some adjustable tolerance of the colour statistics of thespecified region.
4 While the user interface is straightforward, findingthe correct tolerance level is often cumbersome and sometimes im-possible. Fig. 2a shows the result using Magic Wand from AdobePhotoshop 7 [Adobe Systems Incorp. 2002]. Because the distri-bution in colour space of Foreground and background pixels have aconsiderable overlap, a satisfactory segmentation is not Scissors( Wire or Magnetic Lasso)[Mortensen and Barrett 1995] allows a user to choose a minimumcost contour by roughly tracing the object s boundary with themouse. As the mouse moves, the minimum cost path from the cur-sor position back to the last seed point is shown. If the computedpath deviates from the desired one, additional user-specified seed points are necessary. In fig. 2b the Magnetic Lasso of Photoshop 7was used. The main limitation of this tool is apparent: for highlytexture (or un-textured) regions many alternative minimal pathsexist. Therefore many user interactions (here 19) were necessary toobtain a satisfactory result.
5 Snakes or Active Contours are a relatedapproach for automatic refinement of a lasso [Kass et al. 1987].Bayes mattingmodels colour distributions probabilistically toachieve full alpha mattes [Chuang et al. 2001] which is based on[Ruzon and Tomasi 2000]. The user specifies a trimap T={TB,TU,TF}in which background and Foreground regionsTBandTFare marked, and alpha values are computed over the remain-ing regionTU. High quality mattes can often be obtained ( ), but only when theTUregion is not too large and the back-ground/ Foreground colour distributions are sufficiently well sepa-rated. A considerable degree of user interaction is required to con-struct an internal and an external 2[Corel Corporation 2002] is a proprietary plug-in forPhotoshop which is driven from a user-defined trimap, like Bayesmatting, and its results are sometimes similar (fig. 2d), sometimesof less quality according to [Chuang et al. 2001].Graph Cut[Boykov and Jolly 2001; Greig et al.]
6 1989] is a pow-erful optimisation technique that can be used in a setting similarto Bayes Matting, including trimaps and probabilistic colour mod-els, to achieve robust segmentation even in camouflage, when fore-ground and background colour distributions are not well system is explained in detail in section 2. Graph Cut techniquescan also be used for image synthesis, like in [Kwatra et al. 2003]where a cut corresponds to the optimal smooth seam between twoimages, source and target sets[Caselles et al. 1995] is a standard approach to imageand texture segmentation. It is a method for front propagation bysolving a corresponding partial differential equation, and is oftenused as an energy minimization tool. Its advantage is that almostany energy can be used. However, it computes only a local mini-mum which may depend on initialization. Therefore, in cases wherethe energy function can be minimized exactly via graph cuts, thelatter method should be preferable. One such case was identifiedby [Boykov and kolmogorov 2003] for computing geodesics andminimal surfaces in Riemannian Proposed system: GrabCutIdeally, a matting tool should be able to produce continuous alphavalues over the entire inference regionTUof the trimap, withoutany hard constraint that alpha values may only be 0 or 1.
7 In thatway, problems involving smoke, hair, trees etc., could be dealt withappropriately in an automatic way. However, in our experience,techniques designed for solving that general matting problem [Ru-zon and Tomasi 2000; Chuang et al. 2001] are effective when thereis sufficient separation of Foreground and background color distri-butions but tend to fail in camouflage. Indeed it may even be that thegeneral matting problem is not solvable in camouflage, in the sensethat humans would find it hard to perceive the full matte. This mo-tivates our study of a somewhat less ambitious but more achievableform of the we obtain a hard segmentation (sections 2 and 3) usingiterative graph cut . This is followed by border matting (section 4)in which alpha values are computed in a narrow strip around thehard segmentation boundary. Finally, full transparency, other thanat the border, is not dealt with by grabcut . It could be achievedhowever using the matting brush of [Chuang et al.]
8 2001] and, inour experience, this works well in areas that are sufficiently free novelty of our approach lies first in the handling of segmen-tation. We have made two enhancements to the graph cuts mech-anism: iterative estimation and incomplete labelling which to-gether allow a considerably reduced degree of user interaction for agiven quality of result (fig. 2f). This allows grabcut to put a lightload on the user, whose interaction consists simply of dragging arectangle around the desired object. In doing so, the user is indi-cating a region of background, and is free of any need to mark aforeground region. Secondly we have developed a new mechanismfor alpha computation, used for border matting, whereby alpha val-ues are regularised to reduce visible Image segmentation by graph cutFirst, the segmentation approach of Boykov and Jolly , the founda-tion on which grabcut is built, is described in some segmentationTheir paper [Boykov and Jolly 2001] addresses the segmentationof a monochrome image, given an initial trimapT.
9 The image isan arrayz= (z1, .. ,zn, .. ,zN)of grey values, indexed by the (sin-gle) indexn. The segmentation of the image is expressed as anarray of opacity values = ( 1, .. , N)at each pixel. Gener-ally 0 n 1, but for hard segmentation n {0,1}, with 0 forbackground and 1 for Foreground . The parameters describe imageforeground and background grey-level distributions, and consist ofhistograms of grey values: ={h(z; ), =0,1},(1)one for background and one for Foreground . The histograms areassembled directly from labelled pixels from the respective trimapregionsTB,TF. (Histograms are normalised to sum to 1 over thegrey-level range: zh(z; ) =1.)The segmentation task is to infer the unknown opacity variables from the given image datazand the model . by energy minimisationAn energy functionEis defined so that its minimum should cor-respond to a good segmentation, in the sense that it is guided bothby the observed Foreground and background grey-level histogramsand that the opacity is coherent , reflecting a tendency to solidityof objects.
10 This is captured by a Gibbs energy of the form:E( , ,z) =U( , ,z) +V( ,z).(2)The data termUevaluates the fit of the opacity distribution to thedataz, given the histogram model , and is defined to be:U( , ,z) = n logh(zn; n).(3)The smoothness term can be written asV( ,z) = (m,n) Cdis(m,n) 1[ n6= m]exp (zm zn)2,(4)where[ ]denotes the indicator function taking values 0,1 for apredicate ,Cis the set of pairs of neighboring pixels, and wheredis( )is the Euclidean distance of neighbouring pixels. This energyencourages coherence in regions of similar grey-level. In practice,good results are obtained by defining pixels to be neighbours if theyare adjacent either horizontally/vertically or diagonally (8-way con-nectivity). When the constant =0, the smoothness term is simplythe well-known Ising prior, encouraging smoothness everywhere, toa degree determined by the constant . It has been shown however[Boykov and Jolly 2001] that it is far more effective to set >0 asthis relaxes the tendency to smoothness in regions of high constant is chosen [Boykov and Jolly 2001] to be: =(2 (zm zn)2 ) 1,(5)where denotes expectation over an image sample.