
Person-Independent 3D Gaze Estimation Using Face Frontalization


László A. Jeni, Carnegie Mellon University, Pittsburgh, PA, USA, laszlojeni@cmu.edu
Jeffrey F. Cohn, University of Pittsburgh, Pittsburgh, PA, USA



Transcription of Person-Independent 3D Gaze Estimation Using Face Frontalization

Figure 1: From a 2D image of a person's face (a), a dense, part-based 3D deformable model is aligned (b) to reconstruct a partial frontal view of the face (c). Binary features are extracted around eye and pupil markers (d) for the 3D gaze calculation (e).

Abstract

Person-independent and pose-invariant estimation of eye gaze is important for situation analysis and for automated video annotation. We propose a fast cascade-regression-based method that first estimates the location of a dense set of markers and their visibility, then reconstructs face shape by fitting a part-based 3D model.

Next, the reconstructed 3D shape is used to estimate a canonical view of the eyes for 3D gaze estimation. The model operates in a feature space that naturally encodes local ordinal properties of pixel intensities, leading to photometrically invariant estimation of gaze. To evaluate the algorithm in comparison with alternative approaches, three publicly available databases were used: the Boston University Head Tracking, Multi-View Gaze, and CAVE Gaze datasets. Precision for head pose and gaze averaged 4 degrees or less for pitch, yaw, and roll. The algorithm outperformed alternative methods.

1. Introduction

Gaze and eye contact communicate interpersonal engagement and emotion, express intimacy, reveal attention and cognitive processes, signal objects or events of interest, and regulate social interaction [14, 22, 30, 31].

Automated tracking of gaze in highly constrained contexts, such as while seated in front of a computer monitor or when wearing specialized eyewear, is well developed using commercial software [43]. In less constrained contexts, such as during social interaction or in a car while driving, automated gaze tracking presents a challenging and vital problem. Efforts to detect gaze in social and automotive contexts are just beginning [25].

Gaze estimation methods can be categorized into model-based and appearance-based approaches [18]. Model-based 3D gaze estimation methods use 3D eyeball models and estimate gaze direction using geometric eye features [13, 17].

They typically use infrared light sources together with a high-resolution camera to locate the 3D eyeball position and its line of sight via personal calibration. Although this approach can accurately estimate gaze direction, it necessitates specialized hardware that limits its range of application. If the iris contour alone is used to detect line of sight, the need for specialized eyewear can be relaxed [12, 49, 20, 45]. The latter is effective for short-distance scenarios in which high-resolution observations are available; its effectiveness in mid-distance scenarios is limited.

Appearance-based methods compute non-geometric image features from input eye images to estimate gaze direction.

This approach frames the gaze estimation problem as one of learning a mapping function from the input eye images to the target gaze directions. The corresponding mapping can be learned using different regression techniques, including artificial neural networks [4], adaptive linear regression [26], interpolation [40], and Gaussian process regression [36, 46].

For appearance-based 3D gaze estimation, the 3D position of the eye must be found in order to estimate the gaze target in the camera-coordinate system. With the recent advancement of monocular 3D head pose tracking [29] and the increasing availability of depth cameras with head pose tracking capabilities [3], the means of capturing 3D head poses are becoming readily available.
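To make the mapping view described at the start of this section concrete, here is a minimal sketch (not one of the cited methods): ridge regression from flattened eye-image patches to yaw/pitch gaze angles. The arrays eye_patches and gaze_angles are synthetic placeholders standing in for a real annotated dataset.

```python
import numpy as np

# Minimal sketch of an appearance-based gaze mapping (illustrative,
# not any of the cited methods): ridge regression from flattened
# eye-image patches to (yaw, pitch) gaze angles. The training arrays
# below are synthetic placeholders for a real annotated dataset.
rng = np.random.default_rng(0)
eye_patches = rng.random((500, 15 * 9))       # 500 grayscale 15x9 eye crops, flattened
gaze_angles = rng.uniform(-30, 30, (500, 2))  # (yaw, pitch) labels in degrees

# Closed-form ridge regression: W = (X^T X + lam*I)^{-1} X^T Y
X = np.hstack([eye_patches, np.ones((len(eye_patches), 1))])  # append bias term
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ gaze_angles)

def predict_gaze(patch):
    """Map one flattened eye patch to (yaw, pitch) in degrees."""
    return np.append(patch, 1.0) @ W

print(predict_gaze(eye_patches[0]))
```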

Indeed, recent appearance-based 3D gaze estimation methods use 3D head poses obtained in this way as an additional input for gaze estimation [27, 16].

Appearance-based methods have the advantage of requiring only a single camera and natural illumination. They can use images with common or even low resolution. They typically regard a whole eye image as a high-dimensional input vector and learn the mapping between these vectors and the gaze directions. These methods, however, often lack robustness to head motion and illumination variation. Because appearance-based methods require precise alignment of the eye region, head pose variation is particularly problematic: without precise alignment, they are prone to error. Illumination conditions, such as in a driving scenario [20], also affect their performance.

Using active near-infrared imaging can alleviate this issue [45].

3D estimation from 2D video is a promising alternative. It is made possible in part by recent advances in 2D shape alignment that use discriminative shape regression methods [10, 38, 48, 32, 9]. These techniques predict a face shape in a cascade manner: they begin with an initial guess about the shape and then progressively refine that guess by regressing a shape increment step-by-step from a feature space. The feature space can be hand-designed and may utilize SIFT features [48], or it can be learned from the data [10, 6, 32]. Our approach exploits cascade shape regression for 3D gaze and head pose estimation.
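The cascade update loop itself is compact. Below is a minimal sketch under stated assumptions: extract_features and the per-stage regressor matrices are random placeholders standing in for the learned, shape-indexed components.

```python
import numpy as np

# Minimal sketch of cascade shape regression (illustrative, not the
# paper's trained model). Each stage maps image features extracted
# around the current shape estimate to a shape increment; stages are
# applied in sequence, progressively refining an initial mean shape.
NUM_STAGES, NUM_POINTS, FEAT_DIM = 5, 68, 128
rng = np.random.default_rng(0)

def extract_features(image, shape):
    """Placeholder for shape-indexed features (e.g., SIFT or learned
    binary features sampled around each current landmark)."""
    return rng.standard_normal(FEAT_DIM)

# One learned linear regressor per cascade stage (random placeholders).
stages = [rng.standard_normal((2 * NUM_POINTS, FEAT_DIM)) * 0.01
          for _ in range(NUM_STAGES)]

def align(image, mean_shape):
    shape = mean_shape.copy()          # start from the initial guess
    for R in stages:                   # each stage regresses an increment
        phi = extract_features(image, shape)
        shape += (R @ phi).reshape(NUM_POINTS, 2)
    return shape

landmarks = align(image=None, mean_shape=np.zeros((NUM_POINTS, 2)))
print(landmarks.shape)  # (68, 2)
```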

The method was made possible, in part, by training on the Multi-View Gaze (MVG) dataset [37], which contains 8,000 3D face meshes. The method was validated in a series of tests. We found that eye alignment and 3D absolute gaze estimation from 2D images effectively handle previously unseen faces that may span a variety of poses.

This paper advances two main novelties. First, the method estimates self-occluded markers and computes a canonical view using the estimated 3D head pose. This eliminates the need to learn a manifold of appearance features for gaze estimation. Second, the proposed method operates in a binary feature space that is robust to photometric variations and different scales.
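To illustrate why an ordinal binary feature space confers photometric invariance (a sketch of the general idea, not the paper's exact descriptor): a descriptor built from pairwise intensity comparisons depends only on the ordering of pixel values, so any monotonic illumination change leaves it unchanged.

```python
import numpy as np

# Sketch of a binary ordinal feature: pairwise intensity comparisons
# inside a patch around a landmark. Because only the *ordering* of
# pixel values matters, the code is invariant to monotonic changes in
# illumination (gain/offset). Patch size and pair count are arbitrary
# choices for this illustration.
rng = np.random.default_rng(0)
PAIRS = rng.integers(0, 11, size=(256, 2, 2))  # 256 random pixel pairs in an 11x11 patch

def binary_features(patch):
    """patch: 11x11 grayscale window centered on a landmark."""
    a = patch[PAIRS[:, 0, 0], PAIRS[:, 0, 1]]
    b = patch[PAIRS[:, 1, 0], PAIRS[:, 1, 1]]
    return (a < b).astype(np.uint8)     # 256-bit ordinal code

patch = rng.random((11, 11))
f1 = binary_features(patch)
f2 = binary_features(0.5 * patch + 0.2)   # brightness/contrast change
assert np.array_equal(f1, f2)             # ordinal code is unchanged
```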

We demonstrate that precise 3D gaze estimation is possible from this representation.

The paper is organized as follows: Section 2 details the cascade framework and the different regression approaches. Section 3 describes the 3D gaze dataset and manual annotations we used for training our system. The efficiency of our method is illustrated by numerical experiments in Section 4. Conclusions are drawn in Section 5.

Notation: vectors ($\mathbf{a}$) and matrices ($\mathbf{A}$) are denoted by bold letters. The Euclidean norm of a vector $\mathbf{u} \in \mathbb{R}^d$ is $\|\mathbf{u}\|_2 = \sqrt{\sum_{i=1}^{d} u_i^2}$. $\mathbf{D} = [\mathbf{A}_1; \ldots; \mathbf{A}_K] \in \mathbb{R}^{(d_1 + \ldots + d_K) \times N}$ denotes the concatenation of the matrices $\mathbf{A}_k \in \mathbb{R}^{d_k \times N}$.

2. Part-based Linear Face Models

We are interested in building a dense linear shape model. The shape model is defined by a 3D mesh and, in particular, by the 3D vertex locations of the mesh, called landmark points. We represent the 3D shape as the coordinates of the 3D vertices that make up the mesh:

$\mathbf{x} = [x_1; y_1; z_1; \ldots; x_M; y_M; z_M]$,  (1)

or $\mathbf{x} = [\mathbf{x}_1; \ldots; \mathbf{x}_M]$, where $\mathbf{x}_i = [x_i; y_i; z_i]$. We have $T$ samples: $\{\mathbf{x}^{(t)}\}_{t=1}^{T}$.

We assume that, apart from scale, rotation, and translation, all samples $\{\mathbf{x}^{(t)}\}_{t=1}^{T}$ can be approximated by means of a linear 3D point distribution model (PDM). A PDM describes non-rigid shape variations linearly and composes them with a global rigid transformation, placing the shape in the image frame:

$\mathbf{x}_i = \mathbf{x}_i(\mathbf{p}, \mathbf{q}) = s\mathbf{R}(\bar{\mathbf{x}}_i + \mathbf{\Phi}_i \mathbf{q}) + \mathbf{t} \quad (i = 1, \ldots, M)$,  (2)

where $\mathbf{x}_i(\mathbf{p}, \mathbf{q})$ denotes the 3D location of the $i$-th landmark and $\mathbf{p} = \{s, \alpha, \beta, \gamma, \mathbf{t}\}$ denotes the rigid parameters of the model, which consist of a global scaling $s$, angles of rotation in three dimensions ($\mathbf{R} = \mathbf{R}_1(\alpha)\mathbf{R}_2(\beta)\mathbf{R}_3(\gamma)$), and a translation $\mathbf{t}$.
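A minimal numeric sketch of Eq. (2) follows (illustrative placeholders, not the paper's trained model): the mean shape is deformed along the basis by the non-rigid parameters q, then scaled, rotated, and translated.

```python
import numpy as np

# Minimal sketch of the point distribution model in Eq. (2):
# x_i = s * R * (xbar_i + Phi_i q) + t. The mean shape `xbar` and
# basis `Phi` are random placeholders for a model learned from the
# {x^(t)} training meshes.
M, K = 512, 30                        # landmarks, non-rigid modes
rng = np.random.default_rng(0)
xbar = rng.standard_normal((M, 3))    # mean 3D shape (placeholder)
Phi = rng.standard_normal((M, 3, K))  # deformation basis (placeholder)

def rot(alpha, beta, gamma):
    """R = R1(alpha) R2(beta) R3(gamma): rotations about x, y, z."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    R1 = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    R2 = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    R3 = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return R1 @ R2 @ R3

def pdm(s, alpha, beta, gamma, t, q):
    """Instantiate all M landmarks from rigid (p) and non-rigid (q) params."""
    shape = xbar + Phi @ q            # non-rigid deformation, shape (M, 3)
    return s * shape @ rot(alpha, beta, gamma).T + t

x = pdm(s=1.2, alpha=0.1, beta=-0.05, gamma=0.0,
        t=np.array([0.0, 0.0, 50.0]), q=np.zeros(K))
print(x.shape)  # (512, 3)
```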

