Image Alignment and Stitching: A Tutorial

Image Alignment and Stitching: A Tutorial1 Richard SzeliskiLast updated, December 10, 2006 Technical ReportMSR-TR-2004-92 This Tutorial reviews Image Alignment and Image stitching algorithms. Image align-ment algorithms can discover the correspondence relationships among images withvarying degrees of overlap. They are ideally suited for applications such as videostabilization, summarization, and the creation of panoramic mosaics. Image stitch-ing algorithms take the Alignment estimates produced by such registration algorithmsand blend the images in a seamless manner, taking care to dealwith potential prob-lems such as blurring or ghosting caused by parallax and scene movement as well asvarying Image exposures. This Tutorial reviews the basic motion models underlyingalignment and stitching algorithms, describes effective direct (pixel-based) and fea-ture-based Alignment algorithms, and describes blending algorithms used to produceseamless mosaics.

It closes with a discussion of open research problems in the ResearchMicrosoft CorporationOne Microsoft WayRedmond, WA 98052 shorter version of this report appeared in Paragios, al., editors,Handbook of MathematicalModels in Computer Vision, pages 273 292, Springer, Introduction12 Motion (planar) motions .. transformations .. and spherical coordinates .. distortions .. 143 Direct (pixel-based) metrics .. motion estimation .. Alignment .. refinement .. motion .. 284 Feature-based detectors .. matching .. registration .. vs. feature-based Alignment .. 465 Global adjustment .. removal .. panoramas .. 536 a compositing surface .. selection and weighting ..647 Extensions and open issues68i1 IntroductionAlgorithms for aligning images and stitching them into seamless photo-mosaics are among theoldest and most widely used in computer vision.

Frame-rate Image Alignment is used in everycamcorder that has an Image stabilization feature. Imagestitching algorithms create the high-resolution photo-mosaics used to produce today s digital maps and satellite photos. They alsocome bundled with most digital cameras currently being sold, and can be used to create beautifulultra wide-angle early example of a widely-used Image registration algorithm is the patch-based translationalalignment (optical flow) technique developed by Lucas and Kanade (1981). Variants of this algo-rithm are used in almost all motion-compensated video compression schemes such as MPEG (Le Gall 1991). Similar parametric motion estimationalgorithms have found a wide varietyof applications, including video summarization (Bergenet , Teodosio and Bender 1993,Kumaret , Irani and Anandan 1998), video stabilization (Hansenet ), and videocompression (Iraniet , Leeet ). More sophisticated Image registration algorithmshave also been developed for medical imaging and remote sensing see (Brown 1992, Zitov aaand Flusser 2003, Goshtasby 2005) for some previous surveysof Image registration the photogrammetry community, more manually intensive methods based on surveyedgroundcontrol pointsor manually registeredtie pointshave long been used to register aerial photos intolarge-scale photo-mosaics (Slama 1980).

One of the key advances in this community was the de-velopment ofbundle adjustmentalgorithms that could simultaneously solve for the locations ofall of the camera positions, thus yielding globally consistent solutions (Triggset ). Oneof the recurring problems in creating photo-mosaics is the elimination of visible seams, for whicha variety of techniques have been developed over the years (Milgram 1975, Milgram 1977, Peleg1981, Davis 1998, Agarwalaet )In film photography, special cameras were developed at the turn of the century to take ultrawide angle panoramas, often by exposing the film through a vertical slit as the camera rotatedon its axis (Meehan 1990). In the mid-1990s, Image alignmenttechniques started being appliedto the construction of wide-angle seamless panoramas from regular hand-held cameras (Mannand Picard 1994, Szeliski 1994, Chen 1995, Szeliski 1996). More recent work in this area hasaddressed the need to compute globally consistent alignments (Szeliski and Shum 1997, Sawhneyand Kumar 1999, Shum and Szeliski 2000), the removal of ghosts due to parallax and objectmovement (Davis 1998, Shum and Szeliski 2000, Uyttendaeleet , Agarwalaet ),and dealing with varying exposures (Mann and Picard 1994, Uyttendaeleet , Levinet , Agarwalaet ).

(A collection of some of these papers can be found in (Benosmanand Kang 2001).) These techniques have spawned a large number of commercial stitching products(Chen 1995, Sawhneyet ), for which reviews and comparison can be found on the most of the above techniques work by directly minimizing pixel-to-pixel dissimilarities,a different class of algorithms works by extracting a sparseset offeaturesand then matching theseto each other (Zoghlamiet , Capel and Zisserman 1998, Cham and Cipolla 1998, , McLauchlan and Jaenicke 2002, Brown and Lowe 2003). Feature-based approaches havethe advantage of being more robust against scene movement and are potentially faster, if imple-mented the right way. Their biggest advantage, however, is the ability to recognize panoramas , , to automatically discover the adjacency (overlap) relationships among an unordered set of im-ages, which makes them ideally suited for fully automated stitching of panoramas taken by casualusers (Brown and Lowe 2003).

What, then, are the essential problems in Image Alignment and stitching ? For Image align-ment, we must first determine the appropriate mathematical model relating pixel coordinates inone Image to pixel coordinates in another. Section 2 reviewsthese basicmotion models. Next, wemust somehow estimate the correct alignments relating various pairs (or collections) of 3 discusses howdirectpixel-to-pixel comparisons combined with gradient descent (andother optimization techniques) can be used to estimate these parameters. Section 4 discusses howdistinctivefeaturescan be found in each Image and then efficiently matched to rapidly establishcorrespondences between pairs of images. When multiple images exist in a panorama, techniquesmust be developed to compute a globally consistent set of alignments and to efficiently discoverwhich images overlap one another. These issues are discussed in Section Image stitching , we must first choose a final compositing surface onto which to warp andplace all of the aligned images (Section 6).

We also need to develop algorithms to seamlessly blendoverlapping images, even in the presence of parallax, lens distortion, scene motion, and exposuredifferences (Section 6). In the last section of this survey,I discuss additional applications of imagestitching and open research Motion modelsBefore we can register and align images, we need to establishthe mathematical relationships thatmap pixel coordinates from one Image to another. A variety ofsuchparametric motion modelsare possible, from simple 2D transforms, to planar perspective models, 3D camera rotations, lensdistortions, and the mapping to non-planar ( , cylindrical) surfaces (Szeliski 1996).To facilitate working with images at different resolutions, we adopt a variant of thenormalizeddevice coordinatesused in computer graphics (Watt 1995, OpenGL-ARB 1997). Fora typical(rectangular) Image or video frame, we let the pixel coordinates range from[ 1,1]along thelonger axis, and[ a,a]along the shorter, whereais the inverse of theaspect ratio, as shown2WH 11 aaxyxyHW aa 11xyxy Figure 1:Mapping from pixel coordinates to normalized device coordinatesin Figure an Image with widthWand heightH, the equations mapping integer pixelcoordinatesx= (x,y)to normalized device coordinatesx= (x,y)arex=2x WSandy=2y HSwhereS= max(W,H).

(1)Note that if we work with images in apyramid, we need to halve theSvalue after each decimationstep rather than recomputing it frommax(W,H), since the(W,H)values may get rounded ortruncated in an unpredictable that for the rest of this paper, we use normalizeddevice coordinates when referring to pixel 2D (planar) motionsHaving defined our coordinate system, we can now describe howcoordinates are transformed. Thesimplest transformations occur in the 2D plane and are illustrated in Figure translations can be written asx =x+torx =hI ti x(2)whereIis the (2 2) identity matrix and x= (x,y,1)is + transformation is also known as2D rigid body motionor the2 DEuclidean transformation(since Euclidean distances are preserved). It can be written asx =Rx+torx =hR ti x(3)1In computer graphics, it is usual to have both axes range from[ 1,1], but this requires the use of two differentfocal lengths for the vertical and horizontal dimensions, and makes it more awkward to handle mixed portrait andlandscape mode 2:Basic set of 2D planar transformationswhereR= cos sin sin cos (4)is an orthonormal rotation matrix withRRT=Iand|R|= known as thesimilarity transform, this transform can be expressed asx =sRx+twheresis an arbitrary scale factor.

It can also be written asx =hsR ti x= a b txb a ty x,(5)where we no longer require thata2+b2= 1. The similarity transform preserves angles affine transform is written asx =A x, whereAis an arbitrary2 3matrix, ,x = a00a01a02a10a11a12 x.(6)Parallel lines remain parallel under affine transform, also known as aperspective transformorhomography, operates onhomogeneous coordinates xand x , x H x,(7)where denotes equality up to scale and His an arbitrary3 3matrix. Note that His itselfhomogeneous, , it is only defined up to a scale. The resulting homogeneous coordinate x mustbe normalized in order to obtain an inhomogeneous resultx , ,x =h00x+h01y+h02h20x+h21y+h22andy =h10x+h11y+h12h20x+h21y+h22.(8)Perspecti ve transformations preserve straight # :IcontranslationhIti2 32orientation+ rigid (Euclidean)hRti2 33lengths+ SSSS similarityhsRti2 34angles+ SSaffinehAi2 36parallelism+ projectiveh Hi3 38straight lines`` Table 1:Hierarchy of 2D coordinate transformations.

The2 3matrices are extended with a third[0T1]row to form a full3 3matrix for homogeneous coordinate of 2D transformationsThe preceding set of transformations are illustrated in Fig-ure 2 and summarized in Table 1. The easiest way to think of these is as a set of (potentiallyrestricted)3 3matrices operating on 2D homogeneous coordinate vectors. Hartley and Zisser-man (2004) contains a more detailed description of the hierarchy of 2D planar above transformations form a nested set ofgroups, , they are closed under compositionand have an inverse that is a member of the same group. Each (simpler) group is a subset of themore complex group below 3D transformationsA similar nested hierarchy exists for 3D coordinate transformations that can be denoted using4 4transformation matrices, with 3D equivalents to translation, rigid body (Euclidean) andaffine transformations, and homographies (sometimes called collineations) (Hartley and Zisserman2004).

Image Alignment and Stitching: A Tutorial

Tags:

Information

Advertisement

Transcription of Image Alignment and Stitching: A Tutorial

Related search queries

Image Alignment and Stitching: A Tutorial

Tags:

Information

Advertisement

Documents from same domain

Related documents

Related search queries