
OCR of MathExpressions - BGU



Transcription of OCR of MathExpressions - BGU

Ben-Gurion University

OCR of Math Expressions
Optical Character Recognition System of Math Expressions & Testing Framework

Yakir Dahan, Vitali Sepetnitsky
Students at the Department of Software Engineering, Ben-Gurion University, Beer-Sheba, Israel

Work Report, February 17, 2011

Abstract: In this project we present an application for performing OCR of mathematical expressions, based on segmentation and correspondence finding, which is in turn based on the correlation coefficient [...]

Part I: Project Workflow Description

1 Introduction

[...] been looking at a printed image of a mathematical expression, or even at an expression you typed using a formula editor (like the one in Microsoft Word), and were interested to know its de[...]? [...] you don't want or don't have the time to perform the exhausting process [...]. [...] this process requires you to switch between the engine's window and the image, which is a very uncomfortable process [...]. [...] effort, time and maybe nerves :) would be saved if you could take your mobile phone, capture the image and automatically get its analysis [...] (and you must agree if you ever took a calculus course :-)) then you have come to the right place!

The motivation of this project is to make the first steps towards the wonderful application described above. [...] project we take a mathematical expression given as an image, perform an OCR and a further analysis process, [...]. [...] project combines a computer vision approach, performing OCR of single characters and symbols, with some kind of artificial intelligence approach, which is applied after the basic OCR process [...]. [...] we perform OCR by using the same approach used to solve the correspondence problem when performing stereopsis: by maximizing a correlation similarity measure computed using a set of pre-defined [...].

[...] propose here can be useful for:

- People working with mathematical expressions, especially students.
- Finding errors and inconsistencies in a typed mathematical expression.
- Rewriting and redesigning the typed expression.
- Transferring the expression between different applications and text [...]

[...] perform translation of scanned/captured images of text (which can be handwritten or typewritten) to a standard, machine-encoded [...]. The purpose of an OCR system is to classify optical patterns, contained in the image, to corresponding alphanumeric or other characters. After performing OCR, further processing can be applied to the text, such as text-to-speech [...]. [...] field of research in computer vision and artificial [...]

Important dates

1929: A first patent on OCR was obtained in Germany by Gustav Tauschek, for a mechanical device which performed OCR using a photo[...]. [...] first commercial OCR application was invented in the USA, [...] corporation was founded and named Intelligent Machines Research (IMR).

It delivered machines which converted printed texts, especially credit card numbers and teletype typewritten messages, [...]. [...] first development of an omni-font OCR system [...]. [...] (ISRI) conducted the Annual Test of OCR Accuracy in order to encourage the development of OCR systems for understanding machine-printed documents [...]. Today, [...] newspaper pages concluded that character-by-character OCR accuracy for commercial OCR software varies from 71% to 98%. Other areas in OCR, including recognition of hand printing, cursive handwriting, and printed text in scripts other than Latin, are still a subject [...].

[...] define the problem formally, we decided on an initial set of characters and mathematical symbols [...]:

- Capital English letters (A, .., Z) and small English letters (a, .., z).
- [...]: [...]
- Digits (0, .., 9).
- [...] operators: +, [...], and /.
- Boolean operators: < and >.
- Parentheses: left parenthesis "(" and right parenthesis ")".
- Dot character ([...]).
- [...] (=)
- Sum (Σ) and product (Π) symbols.
- Integral (∫) and derivative ([...]) symbols.
- Infinity symbol (∞).

As was said above, the goal of this project was to combine a computer vision approach, which allows us to perform OCR of single symbols, [...], thus leading us to an application based on some kind of artificial [...]. [...] contains only characters from the set given above, [...]. [...] the following expressions aren't acceptable: +6, e and k.

The second assumption is especially useful for the op[...]. [...] we assume that the intensity of the characters seen on the image is clearly different [...]. [...] be seen from the next section, our algorithm can deal with noisy images, but it is limited to images in which the characters can easily be [...].

[...] we perform a simple image processing of the image using the following operations:

(a) Converting the image to grayscale [...]. The intensity of the expression's characters on the image should be strictly lower than the intensity of the background, and therefore converting the image to grayscale shouldn't drastically change the number of the expression's pixels.
(b) Converting the gray-scaled image to black & white, by using Otsu's method [...]. [...] we should get a white background with characters and mathematical symbols lying on it.
(c) Removing all connected groups of pixels with fewer than 9 pixels, in order to reduce noise (but this may cause problems in handling small fonts).
(d) Cropping the resulting image to the size of the minimal b[...].

[...] about the way we read a mathematical expression constructed from the symbols given above [...]. [...] the number of exceptional expressions is limited (according to the set of symbols given above) and refers only to the second rule:

(a) The expression $\sum_{k=1}^{10} x$ is read as sum(x, k = 1 to 10). Similarly, the expression $\prod_{k=1}^{10} x$ is read as product(x, k = 1 to 10). In other words, these expressions are read left-to-right but bottom-to-top (regarding the k = 1 and 10).
(b) The integral expression [...] x2 [...] is read as integral from [...] to [...]; again, the expression is read left-to-right but bottom-[...].

[...] we assumed that OCR can be performed on the expression's characters in left-to-right and top-to-bottom [...]. [...] process of inverting the order of characters in the special cases is done during the construction of the full expression from the single characters read by the OCR process [...]. [...] symbols was done as described [...]. [...] symbol we got from the division described above, we have to perform the process of finding the most corresponding [...]. [...] interpretation of the image, we first resize it to a standard size, which is the size of all the templates supplied ([...]). [...] above stages a pseudo-result, since it isn't the final result of the algorithm: the final result is retrieved after performing a conversion of the pseudo-result to a mathematical expression, using syntactic clues of expressions of this type [...].

Columns, rows and clumps

Given a binary (black & white) image, let's define some de[...]:

A valid column is a part of the image such that all its vertical, unit-width columns contain at least one black pixel. A maximal valid column is a valid column such that if another unit-pixel column of the image is added to it (on the left or right), it becomes invalid (the new column doesn't contain any black pixels).

A valid row is a part of the image such that all its horizontal, unit-width rows contain at least one black pixel. A maximal valid row is a valid row such that if another unit-pixel row of the image is added to it (on the top or bottom), it becomes invalid (the new row doesn't contain any black pixels).

A connected component is a set of pixels in which any pixel has a neighbor from the set in one of the 8 directions (N, NW, W, SW, S, SE, E, NE). A clump is a connected component [...], external to the clump, [...].

[...] a given (black & white) image representing a mathematical expression, if we want to divide it into single symbols while preserving the order of symbols in the expression, we can use the following way:

Algorithm 1: Division of the image into single symbols
1. Divide the image into a set, C, of maximal valid columns.
2. For each column c ∈ C:
   (a) Divide c into a set, R, of maximal valid rows (c is treated as a standalone image).
   (b) For each valid row r ∈ R, divide r into a set of clumps, S; each clump s ∈ S is assumed to be a single symbol.

[...] above algorithm, we present its operation on the mathematical expression given below:

Figure 1: The mathematical expression on which Algorithm 1 is applied

After computing the maximal valid columns, we obtain the following set C of columns (the columns are highlighted):

Figure 2: Retrieving columns from the image

Here, we present the maximal valid lines of the two central columns from C:

Figure 3: Retrieving lines from two central columns

As can clearly be seen, [...]. [...] we divide each row into clumps, which are numbered according to an "m, n" pattern, where m is the number of the row (from top to bottom) and n is the number of the clump (left to right, top to bottom).
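The division procedure described above (columns, then rows inside each column, then clumps inside each row) can be sketched in a few lines of Python. This is only an illustration, not the authors' MATLAB code: the function names (`runs`, `valid_columns`, `valid_rows`, `clumps`, `split_symbols`) are hypothetical, the image is assumed to be a list of rows in which 1 marks a black pixel and 0 a white one, and a clump is simplified here to a plain 8-connected component.

```python
from collections import deque

def runs(flags):
    """Return (start, end) index pairs (end exclusive) of maximal runs of True."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out

def valid_columns(img):
    """Maximal runs of unit-width columns that each contain a black pixel."""
    width = len(img[0]) if img else 0
    return runs([any(row[x] for row in img) for x in range(width)])

def valid_rows(img):
    """Maximal runs of unit-width rows that each contain a black pixel."""
    return runs([any(row) for row in img])

def clumps(img):
    """Bounding boxes (top, left, bottom, right) of 8-connected components,
    ordered left to right, then top to bottom. Bottom/right are exclusive."""
    h = len(img)
    w = len(img[0]) if img else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill over the 8 neighbors
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom + 1, right + 1))
    return sorted(boxes, key=lambda b: (b[1], b[0]))

def split_symbols(img):
    """Symbol boxes in image coordinates: columns, then rows, then clumps."""
    symbols = []
    for c0, c1 in valid_columns(img):
        column = [row[c0:c1] for row in img]  # treated as a standalone image
        for r0, r1 in valid_rows(column):
            cell = column[r0:r1]
            for top, left, bottom, right in clumps(cell):
                symbols.append((r0 + top, c0 + left, r0 + bottom, c0 + right))
    return symbols

image = [
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
]
print(split_symbols(image))
# [(0, 0, 3, 1), (0, 2, 1, 4), (2, 2, 3, 4)]
```

In the toy image, the blank second column separates two maximal valid columns, and the blank middle row splits the right-hand column into an upper and a lower symbol, which mirrors how a numerator and a denominator would be separated.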

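Here is the result we should get:

Figure 4 [...]

[...] project is taking an image, representing a single character or mathematical symbol, and giving it the interpretation of this character/symbol. [...] by establishing correspondence [...] described below:

Given two matrices A and B of the same size, the 2-D correlation coefficient r is calculated in the following way ($\bar{A}$ and $\bar{B}$ represent the mean values of A and B accordingly):

$$
r = \frac{\sum_m \sum_n (A_{mn} - \bar{A})(B_{mn} - \bar{B})}{\sqrt{\left(\sum_m \sum_n (A_{mn} - \bar{A})^2\right)\left(\sum_m \sum_n (B_{mn} - \bar{B})^2\right)}}
$$

The correlation coefficient ranges from −1 to 1. [...] describes perfectly the relationship between A and B, with all data points [...]. The product $(A_{mn} - \bar{A})(B_{mn} - \bar{B})$ is positive if and only if $A_{mn}$ and $B_{mn}$ lie on the same side of their respective means. [...] coefficient is positive if $A_{mn}$ and $B_{mn}$ tend to be simultaneously greater than, or simultaneously less than, their respective means. [...], which represents a character or a mathematical symbol. [...] in our case A and B are binary matrices, since we convert the input image to black & white and the templates are stored as black & white [...]. [...] a high value of the correlation coefficient means that there are a lot of pixels lying at the same places in A and B with the same values, and therefore A can get the interpretation of B (which is known).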
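To make the correspondence step concrete, here is a minimal Python sketch of the 2-D correlation coefficient defined above, followed by a selection of the best-matching template. The names `corr2` and `best_match`, the tiny 3x3 templates, and the choice to return 0 for a constant matrix are assumptions made for illustration only; the report's own MATLAB implementation is not reproduced here.

```python
import math

def corr2(a, b):
    """2-D correlation coefficient r of two equally sized matrices."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("matrices must be non-empty and of the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in flat_a)
                    * sum((y - mean_b) ** 2 for y in flat_b))
    # r is undefined for a constant matrix; returning 0 is an assumption
    return num / den if den else 0.0

def best_match(symbol, templates):
    """Return (label, r) for the template with the highest correlation."""
    scored = ((label, corr2(symbol, tpl)) for label, tpl in templates.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical 3x3 binary templates (1 = black pixel), far smaller than real ones
templates = {
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}
noisy_minus = [[0, 0, 0], [1, 1, 1], [0, 0, 1]]  # one stray black pixel

label, r = best_match(noisy_minus, templates)
print(label)  # -
```

The stray pixel lowers r for the "-" template below 1 but leaves it clearly above the value for "1", which is the behavior the text relies on: the symbol takes the interpretation of the template that maximizes r.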

In order to use this method, a set of template images should be [...]. [...] purpose, we created a set of templates which is supplied with our application and can be [...].

[...], assumed to be [...]:

- GrayScaleI ← image to grayscale(I)
- Black&WhiteI ← image to black and white(GrayScaleI)
- pseudo_result ← [...]
- [...] columns, C, according to the method specified [...].
- For each column c ∈ C do:
  (a) Divide c into a set of lines, L, according to the method specified [...].
  (b) For each line l ∈ L: [...] clumps, S, according to the method specified [...].
  [...] s ∈ S, assumed to be a single character (or a part of a single character):
    - [...] method described above and obtain a character c.
    - [...] being the start or end of a power:
      [...] power: pseudo_result ← pseudo_result (c.
      [...], in case c is an end of a power or a square root: pseudo_result ← pseudo_result c).
    - [...] symbol: pseudo_result ← pseudo_result [...]
- [...] final mathematical expression from the pseudo-result, using statistical clues of mathematical expressions: return ComputeFinalResult(pseudo_result).

5 The Application (For a complete user-manual see part 2)

For the purpose [...]: the OCR system and the GUI framework [...].

[...] first part of the system, which performs the OCR of the expression by using the first 3 stages of the algorithm described above, [...]:

[...]: This file contains one main function named OCR(), which performs the OCR of an image stored in the current folder of MATLAB, by using the auxiliary functions described below [...] file and is used by the second part of the implementation (see below) in order to [...]. [...] performed, the name written in line 11 of the script can be [...]. [...], the name of the file into which the pseudo-result is written can be [...].

[...]: Because of the way we implemented the testing framework, neither the name of the image file nor the name of the output text file should be changed in order to perform OCR on different [...].

[...]() and lines(), which are stored accordingly in the [...], on which the OCR is performed, into columns and lines accordingly, as described [...].

[...](), which is stored in the [...], [...] performing OCR by the read_letter() function (see below).

[...](), which is the function that practically performs [...]. [...] finds the correspondence by maximizing the correlation coefficient, calculated on each of the templates of the templates dataset, as described [...].

[...] responsible for taking the pseudo-result returned by MATLAB and performing its finalization and conversion to a valid mathematical expression, which is then sent to the Wolfram Alpha computational engine by opening [...].

[...] (while getting closer to the application goals described above) we have implemented a GUI-based system in Java, using the Swing GUI toolkit [...]. [...], we have used the JMF (Java Media Framework) library, which enables audio, video and other time-based media to be [...]. [...], in order to allow our application to connect with the MATLAB computation engine, we used an open source Java RMI-based tool called JAMAL (Java Matlab Linking), which made it possible to call the MATLAB functions performing OCR from the Java application, without requiring the user to actually open [...].

[...] block diagram describing our application:

Figure 5: Block Diagram of the Algorithm

6 Experiments and Results

In order to test the proposed algorithm we had to de[...]. [...] be measured in several ways, and how they are measured can greatly affect the reported [...]. [...], if the structure of mathematical expressions (the context) is not used to correct the resulting expression, a character error rate of 5% (95% accuracy) may result in an error rate of 9% (91% accuracy) or worse if the measurement is based on whether each comp[...]. [...] both parts of the algorithm: the OCR recognizer and the [...]. [...] the two types [...]. [...] number of these mistakes was denoted as [...]. [...] number of these characters was denoted as [...] (less than the first type).

The accuracy rate was measured using the formula: 1 [...] 100 ( [...] + [...] ) [...]. In this formula, [...] represents the total number of characters and symbols [...].

[...] experiments were conducted on a set of 20 mathematical expressions with different degrees of complexity, [...] (the mistakes are shown in bold red):

- 100% accuracy. Figure 6: Test 1. The result is: (x+a)^(n)=sum(C(n,k)x^(k)a^(n-k),k=0 to n)
- 100% accuracy. Figure 7: Test 2. The result is: (1+x)^(n)=1+((nx)/(1!))+((n(n-1)x^(2))/(2!))+
- 97% accuracy. Figure 8: Test 3. The result is: cosa+cosbeta=2cos((1)/(2))(alpha+beta)cos((1)/(2))(alpha-beta)
- 94% accuracy. Figure 9: Test 4. The result is: x=((-b+sqrt(b^(2)-4aC))/(2a))
- 100% accuracy. Figure 10: Test 5. The result is: A=P(1+((r)/(n)))^(nt)
- 93% accuracy. Figure 11: Test 6. The result is: integrate(e^(-x2),-infinity,infinity)=(integrate(e^(-x2),-infinity,infinity)*integrate(e^(-y2),-infinity,infinity))^(1/2)=sqrt(pi)
- 96% accuracy. Figure 12: Test 7. The result is: ((1)/(2pi))2piL0((dtheta)/(a+bsintheta))=((1)/(!!a^(2)-b^(2)))
- 87% accuracy. Figure 13: Test 8. The result is: a(b+C)=ab+aC
- 94% accuracy. Figure 14: Test 9. The result is: ((a)/(b))+((C)/(d))=((ad+bc)/(bd))
- 86% accuracy. Figure 15: Test 10. The result is: y!

