A Comparison and Evaluation of Multi-View Stereo ...

A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms

Steven M. Seitz and Brian Curless, University of Washington
James Diebel, Stanford University
Daniel Scharstein, Middlebury College
Richard Szeliski, Microsoft Research

Abstract

This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multi-view image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms. The datasets, evaluation details, and instructions for submitting new models are available online.

Introduction

The goal of multi-view stereo is to reconstruct a complete 3D object model from a collection of images taken from known camera viewpoints. Over the last few years, a number of high-quality algorithms have been developed, and the state of the art is improving rapidly.

Unfortunately, the lack of benchmark datasets makes it difficult to quantitatively compare the performance of these algorithms and therefore to focus research on the most needed areas of development. The situation in binocular stereo, where the goal is to produce a dense depth map from a pair of images, was until recently similar. Here, however, a database of images with ground-truth results has made the comparison of algorithms possible and hence stimulated an even faster increase in algorithm performance [1].

In this paper, we aim to rectify this imbalance by providing, for the first time, a collection of high-quality calibrated multi-view stereo images registered with ground-truth 3D models and an evaluation methodology for comparing multi-view algorithms. Our paper's contributions include a taxonomy of multi-view stereo reconstruction algorithms inspired by [1] (Section 2), the acquisition and dissemination of a set of calibrated multi-view image datasets with high-accuracy ground-truth 3D surface models (Section 3), an evaluation methodology that measures reconstruction accuracy and completeness (Section 4), and a quantitative evaluation of some of the currently best-performing algorithms (Section 5).

While the current evaluation only includes methods whose authors were able to provide us their results by CVPR final submission time, our datasets and evaluation results are publicly available [2] and open to the general community. We plan to regularly update the results and to publish a more comprehensive comparative evaluation.

We limit the scope of this paper to algorithms that reconstruct dense object models from calibrated views. Our evaluation therefore does not include traditional binocular, trinocular, and multi-baseline stereo methods, which seek to reconstruct a single depth map, or structure-from-motion and sparse stereo methods that compute a sparse set of feature points. Furthermore, we restrict the current evaluation to objects that are nearly Lambertian. We also captured and plan to provide datasets of specular scenes, and plan to extend our study to include such scenes in the future.

We are not the first to survey multi-view stereo algorithms; we refer readers to nice surveys by Dyer [3] and Slabaugh et al. [4] of earlier algorithms. The state of the art, however, has changed dramatically in the last five years, warranting a new overview of the field. In addition, this paper provides the first quantitative evaluation of a broad range of multi-view stereo algorithms.

A multi-view stereo taxonomy

One of the challenges in comparing and evaluating multi-view stereo algorithms is that existing techniques vary significantly in their underlying assumptions, operating ranges, and behavior.

Similar in spirit to the binocular stereo taxonomy [1], we categorize existing methods according to six fundamental properties that differentiate the major algorithms, including the scene representation, photo-consistency measure, visibility model, shape prior, and reconstruction algorithm.

Scene representation

The geometry of an object or scene can be represented in numerous ways; the vast majority of multi-view algorithms use voxels, level-sets, polygon meshes, or depth maps. While some algorithms adopt a single representation, others employ different representations for various steps in the reconstruction. In this section we give a very brief overview of these representations and later discuss how they are used in the reconstruction process.

Many techniques represent geometry on a regularly sampled 3D grid (volume), either as a discrete occupancy function (e.g., voxels [5–19]), or as a function encoding distance to the closest surface (e.g., level-sets [20–26]). 3D grids are popular for their simplicity, uniformity, and ability to approximate any surface.

Polygon meshes represent a surface as a set of connected planar facets. They are efficient to store and render and are therefore a popular output format for multi-view algorithms. Meshes are also particularly well-suited for visibility computations and are also used as the central representation in some algorithms [27–32].

Some methods represent the scene as a set of depth maps, one for each input view [33–38]. This multi-depth-map representation avoids resampling the geometry on a 3D domain, and the 2D representation is particularly convenient. An alternative is to define the depth maps relative to scene surfaces to form a relief surface [39, 40].

Photo-consistency measure

Numerous measures have been proposed for evaluating the visual compatibility of a reconstruction with a set of input images. The vast majority of these measures operate by comparing pixels in one image to pixels in other images to see how well they agree. For this reason, they are often called photo-consistency measures [11]. The choice of measure is not necessarily intrinsic to a particular algorithm; it is often possible to take a measure from one method and substitute it in another. We categorize photo-consistency measures based on whether they are defined in scene space or image space [22].

Scene space measures work by taking a point, patch, or volume of geometry, projecting it into the input images, and evaluating the amount of mutual agreement between those projections. A simple measure of agreement is the variance of the projected pixels in the input images [8, 11].
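As a rough illustration of the variance measure described above, the following Python sketch scores one scene point from the colors it projects to in each view. The function name, the RGB tuple format, and the sample values are assumptions made for this example and are not taken from [8, 11]; projection and visibility are assumed to have been handled already.

```python
from statistics import pvariance


def variance_photo_consistency(samples):
    """Sum of per-channel variances of the colors a scene point projects to.

    samples: list of (r, g, b) tuples, one per view that sees the point
    (hypothetical input format). Lower values mean better agreement.
    """
    if len(samples) < 2:
        raise ValueError("need at least two views to measure agreement")
    channels = zip(*samples)  # regroup as (all r), (all g), (all b)
    return sum(pvariance(channel) for channel in channels)


# A point on the true surface projects to similar colors in every view...
consistent = [(200, 100, 50), (202, 98, 51), (199, 101, 49)]
# ...while a point off the surface picks up unrelated colors.
inconsistent = [(200, 100, 50), (20, 30, 40), (90, 200, 210)]

print(variance_photo_consistency(consistent))
print(variance_photo_consistency(inconsistent))
```

A reconstruction method would typically compare such a score against a threshold or feed it into a cost function; the appropriate scale depends on image noise and is not specified here.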

Other methods compare images two at a time, and use window-matching metrics such as sum of squared differences or normalized cross correlation [20, 23, 31]. An interesting feature of scene-space window-based methods is that the current estimate of the geometry can inform the size and shape of the window [20]. A number of other photo-consistency measures have been proposed to provide robustness to small shifts and other effects [12, 18].

Image space methods use an estimate of scene geometry to warp an image from one viewpoint to predict a different view. Comparing the predicted and measured images yields a photo-consistency measure known as prediction error [26, 41]. While prediction error is conceptually very similar to scene space measures, an important difference is the domain of integration. Scene space measures are integrated over a surface and thus often tend to prefer smaller surfaces, whereas prediction error is integrated over the set of images of a scene and thus ascribes more weight to parts of the scene that appear frequently or occupy a large image area.

While stereo algorithms have traditionally assumed approximately view-independent intensities, i.e., Lambertian scenes, a number of new photo-consistency metrics have been devised that seek to model more general reflection functions (BRDFs) [15–17, 22, 23, 32].
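The two window-matching metrics named above, sum of squared differences (SSD) and normalized cross correlation (NCC), can be sketched in a few lines of Python. This is a minimal illustration that assumes each window has already been sampled into a flat list of grayscale intensities; the function names and the convention of returning 0.0 for a textureless window are choices made for this example, not something prescribed by [20, 23, 31].

```python
import math


def ssd(a, b):
    """Sum of squared differences between two equally sized windows."""
    if len(a) != len(b):
        raise ValueError("windows must have the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Normalized cross correlation in [-1, 1]; 1 means a perfect match."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        # Textureless window: NCC is undefined. Returning 0.0 is an
        # assumption made for this sketch.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


window = [10, 20, 30, 40]
brighter = [2 * x + 5 for x in window]  # same pattern, different gain and offset

print(ssd(window, brighter))  # large, because raw intensities differ
print(ncc(window, brighter))  # close to 1, because NCC ignores gain and offset
```

The example shows why the choice of metric matters: SSD penalizes a change in exposure between views, whereas NCC is invariant to an affine change in intensity.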

Some methods also utilize silhouettes [27, 30, 31] or shadows [17, 42].

Visibility model

Visibility models specify which views to consider when evaluating photo-consistency measures. Because scene visibility can change dramatically with viewpoint, almost all modern multi-view stereo algorithms account for occlusions in some way or another. Early algorithms that did not model visibility [6, 27, 43] have trouble scaling to large distributions of viewpoints. Techniques for handling visibility include geometric, quasi-geometric, and outlier-based approaches.

Geometric techniques seek to explicitly model the image formation process and the shape of the scene to determine which scene structures are visible in which images. A common approach in surface evolution methods is to use the current estimate of the geometry to predict visibility for every point on that surface [5, 11, 12, 19, 20, 29, 30, 40]. Furthermore, if the surface evolution begins with a surface that encloses the scene volume and evolves by carving away that volume, this visibility approach can be shown to be conservative [11, 18]; i.e., the set of cameras for which a scene point is predicted to be visible is a subset of the set of cameras in which that point is truly visible.

Visibility computations can be simplified by constraining the allowable distribution of camera viewpoints. If the scene lies outside the convex hull of the camera centers, the occlusion ordering of points in the scene is the same for all cameras [8], enabling a number of more efficient algorithms [8, 10, 13, 35, 44].

Quasi-geometric techniques use approximate geometric reasoning to infer visibility. For example, a popular heuristic for minimizing the effects of occlusions is to limit the photo-consistency analysis to clusters of nearby cameras [31, 45]. This approach is often used in combination with other forms of geometric reasoning to avoid oblique views and to minimize computations [5, 11, 26]. Another common quasi-geometric technique is to use a rough estimate of the surface such as the visual hull [46] to guess visibility for neighboring points [19, 47, 48].

The third type of method is to avoid explicit geometric reasoning and instead treat occlusions as outliers [31, 34, 37, 38]. Especially in cases where scene points are visible more often than they are occluded, simple outlier rejection techniques [49] can be used to select the good views. A heuristic often used in tandem with outlier rejection is to avoid comparing views that are far apart, thereby increasing the likely percentage of inliers [31, 34, 37, 38].

Shape prior

Photo-consistency measures alone are not always sufficient to recover precise geometry, particularly in low-textured scene regions [11, 50].

It can therefore be helpful to impose shape priors that bias the reconstruction toward desired characteristics. While priors are essential for binocular stereo, they play a less important role in multi-view stereo, where the constraints from many views are stronger.

Methods that minimize scene-space photo-consistency measures naturally seek minimal surfaces with small overall surface area. This bias is what enables many level-set algorithms to converge from a gross initial shape [20]. The preference for minimal surfaces can also result in a tendency to smooth over points of high curvature (see [51, 52] for ways to address this problem). Recent approaches based on volumetric min-cut [19, 47] also have a bias for minimal surfaces. A number of mesh-based algorithms incorporate terms that cause triangles to shrink [29, 31] or prefer reference shapes such as a sphere or a plane [27].

Many methods based on voxel coloring and space carving [5, 8, 9, 11, 12, 16, 18, 53] instead prefer maximal surfaces. Since these methods operate by removing voxels only when they are not photo-consistent, they produce the largest photo-consistent scene reconstruction, known as the photo hull. Because they do not assume that the surface is smooth, these techniques are good at reconstructing high-curvature or thin structures. However, the surface tends to bulge out in regions of low surface texture [8, 11].
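The "remove only what is inconsistent" behavior behind the photo hull can be illustrated with a deliberately simplified Python sketch. It is not the space carving algorithm of the cited papers: the scene is a toy row of integer voxels, and the `is_consistent` callback, the exposure test, and the set of voxels that "look consistent" are all hypothetical stand-ins for real image-based tests.

```python
def carve(initial_voxels, is_consistent):
    """Remove inconsistent voxels until none remain to remove.

    is_consistent(voxel, solid) is a caller-supplied test (hypothetical
    interface) that may inspect the current solid set, since removing a
    voxel can expose others that were previously hidden.
    """
    solid = set(initial_voxels)
    changed = True
    while changed:
        changed = False
        for voxel in sorted(solid):  # sorted() copies, so removal is safe
            if not is_consistent(voxel, solid):
                solid.remove(voxel)
                changed = True
    return solid


# Toy 1D scene: the true object occupies voxels 3..6, but low texture
# makes voxels 2 and 7 look consistent as well.
looks_consistent = {2, 3, 4, 5, 6, 7}


def toy_test(voxel, solid):
    exposed = (voxel - 1) not in solid or (voxel + 1) not in solid
    if not exposed:
        return True  # hidden voxels cannot be tested, so they are kept
    return voxel in looks_consistent


print(sorted(carve(range(10), toy_test)))
```

In this toy setup the result contains the true object plus the ambiguous voxels 2 and 7, which mirrors the bulging in low-texture regions noted above: carving stops at the first voxels that cannot be shown to be inconsistent.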

Rather than impose global priors on the overall size of the surface, other methods employ shape priors that encourage local smoothness. Approaches that represent the scene with depth maps typically optimize an image-based smoothness term [33–37, 45] that seeks to give neighboring pixels the same depth value. This kind of prior fits nicely into a 2D Markov Random Field (MRF) framework, and can therefore take advantage of efficient MRF solvers [35]. A disadvantage is that there is a bias toward fronto-parallel surfaces. This bias can be avoided by enforcing surface-based priors, as in [27, 29–32, 40, 47, 48].

Reconstruction algorithm

Multi-view stereo algorithms can be roughly categorized into four classes.

The first class operates by first computing a cost function on a 3D volume, and then extracting a surface from this volume. A simple example of this approach is the voxel coloring algorithm and its variants [8, 17], which make a single sweep through the volume, computing costs and reconstructing voxels with costs below a threshold in the same pass (note that [13] avoids the need for a threshold). Other algorithms differ in the definition of the cost function and the surface extraction method. A number of methods define a volumetric MRF and use max-flow [6, 19, 47, 48] or multi-way graph cut [35] to extract an optimal surface.

The second class of techniques works by iteratively evolving a surface to decrease or minimize a cost function. This class includes methods based on voxels, level sets, and surface meshes. Space carving [5, 11] and its variants [9, 11, 12, 14, 18, 40, 53] progressively remove inconsistent voxels from an initial volume. Other variants of this approach enable adding as well as deleting voxels to minimize an energy function [15, 54].
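The single-sweep strategy of the first class can be sketched as follows. This is a simplified illustration and not the voxel coloring algorithm of [8, 17] itself: the layer ordering, the `cost` callback, and the threshold value are hypothetical, and a real implementation would compute each cost from the input images, using the voxels already reconstructed in nearer layers to decide visibility.

```python
def sweep_volume(layers, cost, threshold):
    """Visit voxels once, layer by layer, keeping those with a low cost.

    layers: voxel lists ordered from nearest to farthest from the cameras
    (assumed ordering). cost(voxel, kept) returns a photo-consistency
    cost and may use the voxels kept so far to reason about occlusion.
    """
    kept = []
    for layer in layers:
        for voxel in layer:
            if cost(voxel, kept) < threshold:
                kept.append(voxel)
    return kept


# Toy costs standing in for image-based photo-consistency scores.
toy_costs = {(0, 0, 0): 0.1, (1, 0, 0): 0.9, (0, 0, 1): 0.2}
layers = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 1)]]

result = sweep_volume(layers, lambda v, kept: toy_costs[v], threshold=0.5)
print(result)
```

Because every voxel is visited exactly once, the cost of the sweep is linear in the number of voxels; the price is the dependence on a threshold, which is the limitation the text notes [13] avoids.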
