Dense 3D Face Alignment from 2D Videos in Real-Time

László A. Jeni¹, Jeffrey F. Cohn¹,², and Takeo Kanade¹
¹ Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
² Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA

© 2015 IEEE. 11th IEEE International Conference on Automatic Face and Gesture Recognition, Ljubljana, Slovenia, May 2015 (accepted).

Fig. 1: From a 2D image of a person's face (a), a dense set of facial landmarks is estimated using a fast, consistent cascade regression framework (b); a part-based 3D deformable model is then applied (c) to reconstruct a dense 3D mesh of the face (d).

Abstract

To enable real-time, person-independent 3D registration from 2D video, we developed a 3D cascade regression approach in which facial landmarks remain invariant across pose over a range of approximately 60 degrees. From a single 2D image of a person's face, a dense 3D shape is registered in real time for each frame. The algorithm utilizes a fast cascade regression framework trained on high-resolution 3D face scans of posed and spontaneous emotion expression. The algorithm first estimates the location of a dense set of markers and their visibility, then reconstructs face shapes by fitting a part-based 3D model. Because no assumptions are required about illumination or surface properties, the method can be applied to a wide range of imaging conditions that include 2D video and uncalibrated multi-view video. The method has been validated in a battery of experiments that evaluate its precision of 3D reconstruction and extension to multi-view reconstruction. Experimental findings strongly support the validity of real-time 3D registration and reconstruction from 2D video. The software is available online.

I. INTRODUCTION

Face alignment is the problem of automatically locating detailed facial landmarks across different subjects, illuminations, and viewpoints. Previous methods can be divided into two broad categories. 2D-based methods locate a relatively small number of 2D fiducial points in real time, while 3D-based methods fit a high-resolution 3D model offline at a much higher computational cost and usually require manual initialization. 2D-based approaches include Active Appearance Models [11], [28], Constrained Local Models [12], [31], and shape-regression-based methods [16], [10], [37], [6], [30]. These approaches train a set of 2D models, each of which is intended to cope with shape or appearance variation within a small range of viewpoints. In contrast, 3D-based methods [5], [15], [41], [18] accommodate a wide range of views using a single 3D model. Recent 2D approaches enable person-independent initialization, which is not possible with 3D approaches. 3D approaches have an advantage with respect to representational power and robustness to illumination and pose but are not feasible for generic fitting and real-time use.

Seminal work by Blanz et al. [5] on 3D morphable models minimized the intensity difference between synthesized and source-video images. Dimitrijevic et al. [15] proposed a 3D morphable model similar to that of Blanz that discarded the texture component in order to reduce sensitivity to illumination. Zhang et al. [41] proposed an approach that deforms a 3D mesh model so that the 3D corner points reconstructed from a stereo pair lie on the surface of the model. Both [41] and [15] minimize shape differences instead of intensity differences, but rely on stereo correspondence. Single-view face reconstruction methods [23], [20] produce a detailed 3D representation, but do not estimate the deformations over time. Recently, Suwajanakorn et al. [33] proposed a 3D flow-based approach coupled with shape from shading to reconstruct a time-varying detailed 3D shape of a person's face from a video. Gu and Kanade [18] developed an approach for aligning a 3D deformable model to a single face image. The model consists of a set of sparse 3D points and the view-based patches associated with every point. These and other 3D-based methods require precise initialization, which typically involves manual labeling of the fiducial landmark points. The gain with 3D-based approaches is their far greater representational power, which is robust to illumination and viewpoint variation that would scuttle 2D-based approaches.

A key advantage of 2D-based approaches is their much lower computational cost and, more recently, the ability to forgo manual initialization. In the last few years in particular, 2D face alignment has reached a mature state with the emergence of discriminative shape regression methods [10], [6], [13], [16], [32], [36], [27], [37], [30], [39], [22], [2], [9]. These techniques predict a face shape in a cascade manner: they begin with an initial guess about shape and then progressively refine that guess by regressing a shape increment step-by-step from a feature space. The feature space can be either hand designed, such as SIFT features [37], or learned from data [10], [6], [30].

Most previous work has emphasized 2D face tracking and registration. Relatively neglected is the application of cascade regression in dense 3D face alignment. Only recently did Cao et al. [9] propose a method for regressing facial landmarks from 2D video. Pose and facial expression are recovered by fitting a user-specific blendshape model to them. This method was then extended to a person-independent case [8], where the estimated 2D markers were used to adapt the camera matrix and user identity to better match facial expression. Because this approach uses both 2D and 3D annotations, a correction step is needed to resolve inconsistency in the landmark positions across different poses and self-occlusions.

Our approach exploits 3D cascade regression, where the facial landmarks are consistent across all poses. To avoid the inconsistency in landmark positions encountered by Cao et al., the face is annotated completely in 3D by selecting a dense set of 3D points (shape). Binary feature descriptors (appearance) associated with a sparse subset of the landmarks are used to regress projections of 3D points. The method first estimates the location of a dense set of markers and their visibility, then reconstructs face shapes by fitting a part-based 3D model. The method was made possible in part by training on the BU-4DFE [38] and BP-4D-Spontaneous [40] datasets, which contain over 300,000 high-resolution 3D face scans. Because the algorithm makes no assumptions about illumination or surface properties, it can be applied to a wide range of imaging conditions. The method was validated in a series of tests. We found that 3D registration from 2D video effectively handles previously unseen faces with a variety of poses and illuminations.

This paper advances two main novelties:

Dense cascade-regression-based face alignment. Previous work on cascade-regression-based face alignment was limited to a small number of fiducial landmarks. We achieve a dense alignment with a manageable model size. We show that this is achievable by using a relatively small number of sparse measurements and a compressed representation [...]

[...] $\sqrt{\sum_{i=1}^{d} u_i^2}$. $\mathbf{B} = [\mathbf{A}_1; \dots; \mathbf{A}_K] \in \mathbb{R}^{(d_1 + \dots + d_K) \times N}$ denotes the concatenation of matrices $\mathbf{A}_k \in \mathbb{R}^{d_k \times N}$.

II. DENSE FACE MODEL BUILDING

In this section we detail the components of the dense 3D face model building process.

A. Linear Face Models

We are interested in building a dense linear shape model. A shape model is defined by a 3D mesh and, in particular, by the 3D vertex locations of the mesh, called landmark points. Consider the 3D shape as the coordinates of the 3D vertices that make up the mesh:

$$\mathbf{x} = [x_1; y_1; z_1; \dots; x_M; y_M; z_M], \tag{1}$$

or, $\mathbf{x} = [\mathbf{x}_1; \dots; \mathbf{x}_M]$, where $\mathbf{x}_i = [x_i; y_i; z_i]$. We have $T$ samples: $\{\mathbf{x}(t)\}_{t=1}^{T}$. We assume that, apart from scale, rotation, and translation, all samples $\{\mathbf{x}(t)\}_{t=1}^{T}$ can be approximated by means of a linear subspace.

The 3D point distribution model (PDM) describes non-rigid shape variations linearly and composes them with a global rigid transformation, placing the shape in the image frame:

$$\mathbf{x}_i = \mathbf{x}_i(\mathbf{p}) = s\mathbf{R}(\bar{\mathbf{x}}_i + \boldsymbol{\Phi}_i \mathbf{q}) + \mathbf{t} \quad (i = 1, \dots, M), \tag{2}$$

where $\mathbf{x}_i(\mathbf{p})$ denotes the 3D location of the $i$th landmark and $\mathbf{p} = \{s, \alpha, \beta, \gamma, \mathbf{q}, \mathbf{t}\}$ denotes the parameters of the model, which consist of a global scaling $s$, angles of rotation in three dimensions ($\mathbf{R} = \mathbf{R}_1(\alpha)\mathbf{R}_2(\beta)\mathbf{R}_3(\gamma)$), a translation $\mathbf{t}$, and a non-rigid transformation $\mathbf{q}$. Here $\bar{\mathbf{x}}_i$ denotes the mean location of the $i$th landmark ($\bar{\mathbf{x}}_i = [\bar{x}_i; \bar{y}_i; \bar{z}_i]$ and $\bar{\mathbf{x}} = [\bar{\mathbf{x}}_1; \dots; \bar{\mathbf{x}}_M]$). The $d$ pieces of $3M$-dimensional basis vectors are denoted with $\boldsymbol{\Phi} = [\boldsymbol{\Phi}_1; \dots; \boldsymbol{\Phi}_M] \in \mathbb{R}^{3M \times d}$. Vector $\mathbf{q}$ represents the 3D distortion of the face in the $3M \times d$ dimensional linear subspace. To build this model we used high-resolution 3D face scans. We describe this in the next subsection.
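To make the point distribution model of Eq. (2) concrete, the following NumPy sketch evaluates it for a given parameter set. It is an illustration under stated assumptions rather than the authors' implementation: the function names `rotation_matrix` and `pdm_shape` are hypothetical, the three elementary rotations are assumed to act about the x, y, and z axes in that order, and the 3M-dimensional basis is assumed to be stored landmark by landmark, matching the ordering of Eq. (1).

```python
import numpy as np


def rotation_matrix(alpha, beta, gamma):
    """Compose R = R1(alpha) @ R2(beta) @ R3(gamma).

    Assumption: R1, R2, R3 rotate about the x, y and z axes respectively.
    """
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    r1 = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    r2 = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    r3 = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return r1 @ r2 @ r3


def pdm_shape(mean_shape, basis, q, s, angles, t):
    """Evaluate x_i = s * R * (mean_i + basis_i @ q) + t for every landmark.

    mean_shape: (M, 3) mean landmark locations
    basis:      (3M, d) linear shape basis, rows ordered x1, y1, z1, x2, ...
    q:          (d,) non-rigid shape coefficients
    s:          global scale
    angles:     (alpha, beta, gamma) rotation angles in radians
    t:          (3,) translation
    Returns an (M, 3) array of transformed landmark locations.
    """
    m = mean_shape.shape[0]
    # Per-landmark non-rigid offsets, reshaped from the stacked 3M vector.
    offsets = (basis @ q).reshape(m, 3)
    rot = rotation_matrix(*angles)
    # Row-vector form of s * R * v + t applied to all landmarks at once.
    return s * (mean_shape + offsets) @ rot.T + t


# Toy usage: two landmarks, one basis vector that moves landmark 1 along z.
mean_shape = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
basis = np.zeros((6, 1))
basis[2, 0] = 1.0
shape = pdm_shape(mean_shape, basis, q=np.array([0.5]), s=2.0,
                  angles=(0.0, 0.0, np.pi / 2), t=np.array([1.0, 1.0, 1.0]))
```

Setting `q` to zero, `s` to one, and the angles and translation to zero returns the mean shape unchanged, which is a quick sanity check for any basis layout.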
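The cascade regression scheme summarized in the introduction (start from an initial shape guess, then repeatedly regress a shape increment from features computed at the current estimate) can be sketched as below. This is a toy illustration, not the method of this paper: each stage is a ridge-regularized linear regressor, and `toy_features` is a hypothetical stand-in for a real image descriptor such as SIFT or binary features. In the toy, the "image" is simply the true landmark vector and the feature is a saturating function of the current offset.

```python
import numpy as np


def toy_features(images, shapes):
    """Hypothetical feature extractor (assumption, for illustration only).

    A real system would sample image descriptors around the current landmark
    estimates; here the features are a saturating function of the offset.
    """
    return np.tanh(images - shapes)


def add_bias(features):
    """Append a constant column so each stage can learn an offset term."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def train_cascade(images, true_shapes, init_shapes, n_stages=4, ridge=1e-3):
    """Learn one linear regressor per stage from features to shape increments."""
    shapes = init_shapes.copy()
    stages = []
    for _ in range(n_stages):
        f = add_bias(toy_features(images, shapes))
        increments = true_shapes - shapes          # regression targets
        gram = f.T @ f + ridge * np.eye(f.shape[1])
        w = np.linalg.solve(gram, f.T @ increments)
        stages.append(w)
        shapes = shapes + f @ w                    # update training shapes
    return stages


def apply_cascade(stages, images, init_shapes):
    """Refine the initial shapes by applying each learned stage in order."""
    shapes = init_shapes.copy()
    for w in stages:
        shapes = shapes + add_bias(toy_features(images, shapes)) @ w
    return shapes


# Toy usage: 6-dimensional "shapes", initialized at the zero (mean) shape.
rng = np.random.default_rng(0)
train = rng.normal(size=(300, 6))
stages = train_cascade(train, train, np.zeros_like(train))
held_out = rng.normal(size=(100, 6))
refined = apply_cascade(stages, held_out, np.zeros_like(held_out))
```

Each stage is trained on the shapes produced by the previous stage, so later regressors specialize in the smaller residual errors that remain after earlier corrections.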
