Computer Vision

ComputerVisionComputerScienceTripos:16 Lecturesby J G Daugman1. computervision;why theyareso di Imagesensing,pixelarrays, Biologicalvisualmechanisms,fromretinato Edgedetectionoperators; Higherbrainvisualmechanisms;streaming; Texture,colour,stereo, ; ers; a setof :facedetectionandrecognition; thiscoursearetointroducetheprinciples,mo delsandapplicationsof com-putervision,as wellas somemechanismsusedin biologicalvisualsystemsthatmay inspiredesignof arti :imageformation,structure,andcoding;edge andfeaturedetection;neuraloperatorsforim ageanalysis;texture,colour,stereo,motion ;waveletmethods forvisualcodingandanalysis;interpretatio nof surfaces,solids,andshapes;datafusion;pro babilisticclassi ers;visualinferenceandlearning; Goalsof computervision.

Why theyareso di imagesareformed,andtheill-posedproblemof making3 Dinferencesfromthemaboutobjectsandtheirp roperties. Imagesensing,pixelarrays, Biologicalvisualmechanismsfromretinato ;receptive eldpro les;spike trains; ;convolution; Edgedetectionoperators;theinformationrev ealedby 'sTheorem. visualprimitives. Higherlevel visualoperationsin ;stream-inganddivisionsof labour;reciprocalfeedback throughthevisualsystem. Texture,colour,stereo, ofinvariances. Dis-countingtheilluminant wheninfering3 Dstructureandsurfaceproperties. Shape fromshading;surfacegeometry. Boundarydescriptors;codons;superquadrics andthe\ "sketch. see. Lessonsfromneurologicaltraumaandvisualde implyabouthow visionworks. Bayesianinferencein Vision ;knowledge-driven in Vision .

Visionasa setof inverseproblems;mathematicalmethodsforso lvingthem:energyminimisation,relaxation, regularisation. Approachesto facedetection,facerecognition, theendof thecoursestudents should: understandvisualprocessingfromboth\botto m-up"(dataoriented)and\top-down"(goalsor iented)perspectives be ableto decomposevisualtasksinto sequencesof imageanalysisoperations,represen-tations ,speci calgorithms,andinferenceprinciples understandtherolesof imagetransformationsandtheirinvariancesi n patternrecogni-tionandclassi cation be ableto analysetherobustness,brittleness,general isability, andperformanceof dif-ferent approachesin computervision be ableto describe keyaspectsof how biologicalvisualsystemsencode,analyse,an drepresent visualinformation be ableto thinkof ways in which biologicalvisualstrategiesmight be implementedinmachinevision,despitetheeno rmousdi erencesin hardware understandin depthat leastoneimportant applicationdomain,such as facerecognition,detection,or interpretationRecommendedbookShapiro,L.

& Stockman,G.(2001).ComputerVision. : \TheComputerVisionHomepage"(CarnegieMell onUniversity): : computervision;why theyareso di generateintelligent andusefuldescriptionsof visualscenesandsequences,andof theobjectsthatpopulatethem,by performingoperationsonthesignalsreceived computervisionapplicationsandgoals: automaticfacerecognition,andinterpretati onof expression visualguidanceof autonomousvehicles automatedmedicalimageanalysis,interpreta tion,anddiagnosis roboticmanufacturing:manipulation,gradin g,andassemblyof parts OCR:recognitionof printedor handwrittencharactersandwords agriculturalrobots:visualgradingandharve stingof produce smarto ces:trackingof personsandobjects;understandinggestures biometric-basedvisualidenti cationof persons visuallyendowedrobotichelpers security monitoringandalerting;detectionof anomaly intelligent interpretive prosthesesfortheblind trackingof movingobjects;collisionavoidance.

Stereoscopicdepth object-based(model-based)compressionof videostreams generalsceneunderstandingIn many respects,computervisionis an\AI-complete"problem:buildinggeneral-p urposevisionmachineswouldentail,orrequir e,solutionstomostof thegeneralgoalsof arti wouldrequire ndingways ofbuilding exibleandrobustvisualrepresentationsof theworld,maintainingandupdatingthem,andi nterfacingthemwithattention, otherproblemsin AI,thechallengeof visioncanbe described in termsofbuildingasignal-to-symbol itselfonlyas physicalsignalsonsensorysurfaces(such as videocamera,retina, ),which explicitlyexpressverylittleof theinformationrequiredforintelligent understandingof theenvironment. Thesesignalsmustbe convertedultimatelyinto symbolicrepresentationswhosemanipulation allowsthema-chineor organismto such an e ortlessandimmediatefaculty forhumansandotheranimals,it hasprovenexceedinglydi cultto :1.

Animageis a two-dimensionalopticalprojection,butthew orldwe wishto make senseof visuallyis thisrespect,visionis\inverseoptics:"we needto invertthe3D !2 Dprojectionin ordertorecover worldproperties(objectpropertiesin space);butthe2D !3 Dinversionof such a projectionis, strictly, anotherrespect,visionis\inversegraphics: "graphicsbeginswitha 3 Dworlddescription(intermsof objectandilluminant properties,viewpoint,etc.),and\merely"co mputestheresulting2 Dimage,withitsoccludedsurfaces,shadingan dshadows,gradients,perspective, thisprocess!A classicalandcentralproblemin computervisionis ortlessly, rapidly, reliably, andunconsciously.(We don'teven know quitehow we doit; like so many tasksforwhich ourneuralresourcesareso formidable,we have little\cognitive penetrance"or understandingof how we actuallyperformfacerecognition.)

Considerthesethreefacialimages(fromPawan Sinha,MIT,2002):Which two picturesshow thesameperson?Mostalgorithmsforcomputerv isionselect1 and2 as thesameperson,sincethoseimagesaremoresim ilarthan1 Veryfewvisualtaskscanbe successfullyperformedin a purelydata-drivenway (\bottom-up"imageanalysis).Considerthene xtimageexample:thefoxesarewellcamou agedby theirtexturedbackgrounds;thefoxesocclude each other;theyappearin severaldi erent posesandperspectiveangles; cantherepossiblyexistmathematicaloperato rsforsuchanimagethatcan: performthe gure-groundsegmentationof thescene(into itsobjectsandbackground) inferthe3 Darrangements of objectsfromtheirmutualocclusions infersurfaceproperties(texture,colour)fr omthe2 Dimagestatistics infervolumetricobjectpropertiesfromtheir 2 Dimageprojections anddoallof thisin \realtime?

"(Thismattersquitea lotin thenaturalworld\redin toothandclaw,"sincesurvival dependsonit.)5 Considernow theactualimagedataof a face,shownasa pixelarraywithluminanceplottedas a functionof (X,Y) thisimage,or evensegment thefacefromitsbackground,letalonerecogni zetheface?Inthisform,theimagerevealsboth thecomplexity of theproblemandthepoverty of \counselof despair"canbe givena moreformalstatement:Mostof theproblemswe needto solve in visionareill-posed,in Hadamard'ssensethatawell-posedproblemmus thave thefollowingsetof properties: itssolutionexists; itssolutionis unique; , fewof thetaskswe needto solve in visionarewell-posedproblemsinHadamard' : inferingdepthpropertiesfromanimage inferingsurfacepropertiesfromimageproper ties inferingcoloursin anilluminant-invariant manner inferingstructurefrommotion,shading,text ure,shadows.

6 inferinga 3 Dshape unambiguouslyfroma 2 Dlinedrawing: interpretingthemutualocclusionsof objects,andstereodisparity recognizinga 3 Dobjectregardlessof itsrotationsaboutitsthreeaxesinspace( chairseenfrommany di erent angles) understandinganobjectthathasnever beenseenbefore: ,pixelarrays,CCDcameras, CCDvideocameracontainsa densearray of independent sensors,whichconvertincident photonsfocusedby thelensonto each point into a chargeproportionalto thelight \coupled"(henceCCD)capacitivelyto allow a voltage(V=Q/C)to be readoutin a sequencescanningthearray. Thenumber of pixels(pictureelements)rangesfromafew100 ,000to many millions( MegaPixel)in animagingarray thatisabout1 cm2in size,so each pixelsensingelement is onlyabout3 uxinto such smallcatchment areasis a factorlimitingfurtherincreasesin resolutionby micronsis only6 timeslargerthanthewavelengthof a photonoflight in thevisiblespectrum(yellow 500nanometersor nm).

Spatialresolutionof theimageis thus determinedbothby thedensity of el-ements in theCCDarray, andby thepropertiesof thelenswhich is (thenumber of distinguishablegreylevels)is determinedby thenumber of bitsper pixelresolved by thedigitizer,andbytheinherent signal-to-noiseratioof (conceptuallyif notliterally)fromthreeseparateCCDarrays precededby di erent colour lters,or mutuallyembeddedas sub-populationswithina singleCCDarray. Inthecaseof composite(analog)video,colouris encodedeitheras a high-frequency\chrominanceburst"(tobe separatelydemodulatedanddecoded);or elseputona separatechannel(\luma"and\chroma"portion sof anS signal);or elseprovidedas threesep-arateRGBcolourchannels(red,gree n,blue).Colourinformationrequiresmuch lessresolutionthanluminance, framegrabberor a strobedsamplingblock in a digitalcameracontainsahigh-speedanalogue -to-digitalconverterwhich discretizesthisvideosignalintoa (NorthAmericanstandard):640 480pixels,at 30 frames/second(actuallythereis aninterlaceof alternatelinesscannedoutat 60 \ elds"per second);andPAL(European,UKstandard):768 576pixels,at 25 vast ood of datais a videostream:768 576pixels/frame 25 frames/sec= 11 pixelmay be resolved to 8 bitsineach of thethreecolourplanes,hence24 11million= 264 canwe possiblycope withthisdata ux,letaloneunderstandtheobjectsandevents creatingsuch animagestream?

Computer Vision

Tags:

Information

Advertisement

Transcription of Computer Vision

Computer Vision

Tags:

Information

Advertisement

Documents from same domain