
High-Order Contrasts for Independent Component Analysis




LETTER Communicated by Anthony Bell. Neural Computation 11, 157-192 (1999), © 1999 Massachusetts Institute of Technology.

Jean-François Cardoso, École Nationale Supérieure des Télécommunications, 75634 Paris Cedex 13, France

Abstract (fragment): … we compare the proposed approaches with gradient-based techniques from the algorithmic point of view and also on a set of biomedical data.

1 Introduction

Given an n × 1 random vector X, independent component analysis (ICA) consists of finding a basis of R^n on which the coefficients of X are as independent as possible (in some appropriate sense). The change of basis can be represented by an n × n matrix B and the new coefficients given by the entries of the vector Y = BX. When the observation vector X is modeled as a linear superposition of source signals, the matrix B is understood as a separating matrix, and Y = BX is a vector of source estimates. The key issues of ICA are the definition of a measure of independence and the design of algorithms to find the change of basis (or separating matrix) optimizing such a measure.

[The neural network literature] describes stochastic gradient algorithms involving, as an essential device in their learning rule, a [component-wise nonlinear function]. Other approaches, most of them found in the signal processing literature, exploit the algebraic structure of the high-order moments of the data; these are often regarded as being unreliable, inaccurate, and slowly convergent. As a matter of fact, it is largely ignored by the researchers of [the neural network community that several of these techniques are] based on fourth-order correlations between the entries of Y.

As a benefit, these algorithms evade the curse of gradient descent: they can move in [macroscopic steps ...], which are discussed in the article and summarized in a [concluding section. Before] outlining the content of this article, we briefly review some gradient-based ICA methods and the notion of [contrast function. Among these methods], a specific class can be singled out: algorithms based on a multiplicative update of an estimate B(t) of B. These algorithms update a separating matrix B(t) on reception of a new sample x(t) according to the learning rule

$$y(t) = B(t)\,x(t), \qquad B(t+1) = \bigl(I - \mu_t H(y(t))\bigr)\,B(t),$$

where I denotes the n × n identity matrix, {μ_t} is a scalar sequence of positive learning steps, and H : R^n → R^{n×n} is a [matrix-valued function. The stationary points of] such algorithms are characterized by the condition that the update has zero mean, that is, by the condition

$$\mathrm{E}\,H(Y) = 0.$$

The online scheme above can be (and often is) implemented in an off-line manner.
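To make the multiplicative update concrete, here is a minimal NumPy sketch (not from the paper). The nonlinearity ψ(y) = tanh(y), the fixed step size μ, and the helper names are assumptions of the sketch; ψ would ideally be minus the log-derivative of the source densities, as explained below.

```python
import numpy as np

def psi(y):
    # Component-wise nonlinearity; tanh is a common heuristic stand-in for
    # minus the log-derivative of a heavy-tailed source density.
    return np.tanh(y)

def field(y):
    # Relative-gradient field of the infomax/ML contrast: H(y) = psi(y) y^T - I.
    return np.outer(psi(y), y) - np.eye(y.size)

def online_ica(X, mu=0.01):
    """One pass of the multiplicative learning rule over the samples.

    X: (T, n) array, one observation x(t) per row.
    Returns the running estimate B of the separating matrix.
    """
    n = X.shape[1]
    B = np.eye(n)
    for x in X:
        y = B @ x                            # y(t) = B(t) x(t)
        B = (np.eye(n) - mu * field(y)) @ B  # B(t+1) = (I - mu_t H(y(t))) B(t)
    return B
```

With a small fixed μ (or a decreasing sequence μ_t), B drifts toward a point where the average field vanishes, which is exactly the estimating-equation characterization given next.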

Using T samples x(1), …, x(T), one goes through the following iterations, where the field H is [estimated by a sample average]:

1. Set y(t) = x(t) for t = 1, …, T.
2. Compute the average field H̄ = (1/T) Σ_{t=1}^T H(y(t)).
3. If H̄ is small enough, stop; else update each data point y(t) by y(t) ← (I − μH̄) y(t) and go to step 2.

[Such a procedure drives the data to an] (arbitrarily) small value of the average field: it solves the estimating equation

$$\frac{1}{T}\sum_{t=1}^{T} H(y(t)) = 0,$$

which is [the empirical counterpart of the stationarity condition. For] gradient algorithms, the mapping H(·) can be obtained as the gradient (the relative gradient (Cardoso & Laheld, 1996) or Amari's natural gradient (1996)) of some contrast function, that is, a real-valued measure of how far the distribution of Y is from some ideal distribution, typically a [distribution of independent components]. In particular, the gradient of the infomax/maximum likelihood (ML) contrast yields a function H(·) in the form

$$H(y) = \psi(y)\,y^T - I,$$

where ψ(y) is an n × 1 vector of component-wise nonlinear functions with ψ_i(·) taken to be minus the log derivative of the density of the i-th component (see Amari, Cichocki, & Yang, 1996, for the online version and Pham & Garat, 1997, for a batch technique).
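The three-step batch iteration admits an equally short sketch, under the same tanh and step-size assumptions; it simply drives the empirical average field toward zero.

```python
import numpy as np

def batch_ica(X, mu=0.2, tol=1e-6, max_iter=1000):
    """Data-based off-line iteration: transform the data set until the
    average field (1/T) sum_t H(y(t)) is (arbitrarily) small, i.e., until
    the empirical estimating equation is solved.

    X: (T, n) array of samples. Returns the transformed data Y.
    """
    Y = X.copy()                        # step 1: y(t) = x(t)
    n = X.shape[1]
    for _ in range(max_iter):
        # step 2: average field (1/T) sum_t [psi(y(t)) y(t)^T] - I
        Hbar = (np.tanh(Y).T @ Y) / len(Y) - np.eye(n)
        if np.linalg.norm(Hbar) < tol:  # step 3: stop when the field is small
            break
        Y = Y @ (np.eye(n) - mu * Hbar).T  # y(t) <- (I - mu Hbar) y(t)
    return Y
```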

[Alternatively], one may decide, as in principal component analysis (PCA), to request exact decorrelation (second-order independence) of the components: matrix B should be such that Y = BX is spatially white, that is, its covariance matrix is [the identity. It] must be stressed that components that are as independent as possible according to some measure of independence are not necessarily uncorrelated, because exact independence cannot be achieved in [general. Therefore], if decorrelation is desired, it must be enforced explicitly; the algorithms described below optimize, under the whiteness constraint, approximations of the mutual information and of other contrast functions (possibly designed to take advantage of the whiteness constraint). One practical reason for considering the orthogonal approach is that off-line contrast optimization may be simplified by a two-step procedure[. First], a whitening (or sphering) matrix W is [determined such that WX is] spatially white[. Since] one is also looking for a white vector Y, the latter can be obtained only by an orthonormal transformation V of [WX. Thus], in such a scheme, the separating matrix B is found as a product B = VW.
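A minimal sketch of the whitening step follows, assuming samples are stored as rows; the function name and the toy mixture are mine, and the rotation V is deliberately left to the later contrast-optimization stage.

```python
import numpy as np

def whitening_matrix(X):
    """Return W such that Z = X W^T has identity sample covariance.

    W is chosen as the symmetric square root C^{-1/2} of the inverse
    sample covariance C of X; any other matrix square root works too.
    """
    C = np.cov(X, rowvar=False)
    d, E = np.linalg.eigh(C)             # C = E diag(d) E^T
    return E @ np.diag(d ** -0.5) @ E.T  # C^{-1/2}

# Toy check: whiten a random linear mixture of heavy-tailed sources.
rng = np.random.default_rng(0)
S = rng.standard_t(df=5, size=(2000, 3))  # hypothetical sources
X = S @ rng.normal(size=(3, 3)).T         # X = A S, unknown mixture A
W = whitening_matrix(X)
Z = X @ W.T                               # spatially white, not yet separated
assert np.allclose(np.cov(Z, rowvar=False), np.eye(3), atol=1e-8)
# The remaining degree of freedom is an orthonormal V, so that B = V W.
```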

This approach leads to interesting implementations because the whitening matrix can be obtained straightforwardly as any matrix square root of the inverse covariance matrix of X, and the optimization of a contrast function with respect to [an orthonormal matrix V can be performed by a] two-stage Jacobi-based procedure; it also exists as a one-stage gradient algorithm (see also Cardoso & Laheld, 1996). Assume that the relative/natural gradient of some contrast function leads to a particular function H(·) for the update rule[. Then the stationary points of] the same contrast function with respect to orthonormal transformations are characterized by E{H(Y) − H(Y)^T} = 0, where the superscript T [denotes transposition. Moreover], for zero-mean variables, the whiteness constraint is E YY^T = I, which we can also write as E{YY^T − I} = 0. Because E{YY^T − I} is a symmetric matrix while E{H(Y) − H(Y)^T} is a skew-symmetric matrix, the whiteness condition and the stationarity condition can be combined into a [single condition,]

$$\mathrm{E}\{\,YY^T - I + H(Y) - H(Y)^T\,\} = 0.$$

When it holds true, both the symmetric part and the skew-symmetric part cancel; the former expresses that Y is white, [the latter that the contrast is stationary under orthonormal transformations. Hence], if the algorithm in [the update rule optimizes a given contrast function with field H], then the same algorithm optimizes the same contrast function under the whiteness constraint with H given by

$$H(y) = yy^T - I + \psi(y)\,y^T - y\,\psi(y)^T.$$

It is thus simple to implement orthogonal versions of gradient algorithms once a regular version is [available. Comon] (1994) compares the data-based option and the statistic-based option for computing off-line an ICA of a batch x(1), …, x(T) of T samples; this article will also introduce a mixed strategy. In the data-based option, successive linear transformations are applied to the data set until some criterion of independence is [maximized. It] is not necessary to update explicitly a separating matrix B in this scheme (although one may decide to do so in a particular implementation); the data themselves are updated until the average field (1/T) Σ_{t=1}^T H(y(t)) is small enough; the transform B is [then implicitly determined by the accumulated updates. The statistic-based option is] to summarize the data set into a smaller set of statistics computed once and for all from the data set.
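In code, only the field changes in the orthogonal version. A short sketch, with the same assumed tanh nonlinearity as before, also checks the symmetric/skew-symmetric split used in the argument above.

```python
import numpy as np

def psi(y):
    return np.tanh(y)  # assumed nonlinearity, as in the earlier sketches

def field_orth(y):
    """Field of the orthogonal version: yy^T - I enforces whiteness
    (symmetric part); psi(y)y^T - y psi(y)^T drives the contrast
    (skew-symmetric part)."""
    G = np.outer(psi(y), y)
    return np.outer(y, y) - np.eye(y.size) + G - G.T

# The two parts are indeed symmetric and skew-symmetric:
y = np.random.default_rng(1).normal(size=4)
S = np.outer(y, y) - np.eye(4)
K = np.outer(psi(y), y) - np.outer(y, psi(y))
assert np.allclose(S, S.T) and np.allclose(K, -K.T)
```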

The algorithm then estimates a separating matrix as a [function of these statistics. Examples include] cumulant-based algebraic techniques, where the statistics are [cumulants of the data. In section 2], the ICA problem is recast in the framework of (blind) identification, showing how entropic contrasts readily stem from the maximum likelihood (ML) [principle. In section 3], high-order approximations to the entropic contrasts are given, and their algebraic structure is [exhibited. Section 4] describes different flavors of [Jacobi techniques. A] comparison between Jacobi techniques and a gradient-based algorithm is given in section 5, based on a real data set of electroencephalogram (EEG) [recordings].

2 Contrast Functions and Maximum Likelihood Identification

Implicitly or explicitly, ICA tries to fit a model for the distribution of X that is a model of independent components: X = AS, where A is an invertible n × n matrix and S is an n × 1 [random vector with independent entries; the] separating matrix [is] B = A^{-1}. Even if the model X = AS is not expected to hold exactly for many real data sets, one can still use it to [derive contrast functions] (a more detailed exposition can be found in Cardoso, 1998).

Blind separation based on ML was first considered by Gaeta and Lacoume (1990) (but the authors used cumulant approximations such as those described in section 3), Pham and Garat (1997), and Amari et al. (1996). [Assume that each source S_i has a probability] density r_i(·).¹ Then, the distribution P_S of the random vector S has a density r(·) of the form

$$r(s) = \prod_{i=1}^{n} r_i(s_i),$$

and the density of X for a given mixture A and a given probability density r(·) is

$$p(x; A, r) = |\det A|^{-1}\, r(A^{-1}x),$$

so that the (normalized) log-likelihood L_T(A, r) of T independent samples x(1), …, x(T) of X is

$$L_T(A, r) \stackrel{\mathrm{def}}{=} \frac{1}{T}\sum_{t=1}^{T} \log p(x(t); A, r) = \frac{1}{T}\sum_{t=1}^{T} \log r(A^{-1}x(t)) - \log|\det A|.$$

Depending on the assumptions made about the densities r_1, …, r_n, [different contrast functions are obtained. First note that] the normalized log-likelihood L_T(A, r), which is a sample average, converges for large T to its ensemble average by the law of large numbers:

$$L_T(A, r) = \frac{1}{T}\sum_{t=1}^{T}\log r(A^{-1}x(t)) - \log|\det A| \;\xrightarrow[T\to\infty]{}\; \mathrm{E}\log r(A^{-1}x) - \log|\det A|,$$

which simple manipulations (Cardoso, 1997) show to be equal to −H(P_X) − K(P_Y|P_S).

¹ All densities considered in this article are with respect to the Lebesgue measure.
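As an illustration of L_T, the following sketch evaluates the normalized log-likelihood under a logistic source density r_i(s) = 1/(4 cosh²(s/2)), a standard hypothesis in this literature; that density choice and the function names are assumptions of the sketch, not prescribed by the text.

```python
import numpy as np

def log_r_logistic(s):
    # log of the logistic density r(s) = 1 / (4 cosh^2(s/2)), applied
    # component-wise; the joint log-density is the sum over components.
    return -2.0 * np.log(np.cosh(s / 2.0)) - np.log(4.0)

def normalized_log_likelihood(A, X):
    """L_T(A, r) = (1/T) sum_t log r(A^{-1} x(t)) - log |det A|.

    A: (n, n) candidate mixing matrix; X: (T, n) array of samples.
    """
    S = np.linalg.solve(A, X.T)   # columns are A^{-1} x(t)
    T = X.shape[0]
    return log_r_logistic(S).sum() / T - np.log(abs(np.linalg.det(A)))
```

Maximizing this quantity over A is exactly ML identification; the gradient algorithms above can be read as solving its stationarity conditions.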

Here and in the following, H(·) and K(·|·) denote, respectively, [the differential entropy and the Kullback divergence. Because] H(P_X) does not depend on the model parameters, the limit for large T of L_T(A, r) is, up to a constant, equal to −φ_ML(Y), where

$$\phi_{\mathrm{ML}}(Y) \stackrel{\mathrm{def}}{=} K(P_Y\,|\,P_S).$$

Therefore, the principle of ML coincides with the minimization of a specific contrast function, which is nothing but the (Kullback) divergence K(P_Y|P_S) between the distribution P_Y of the output and a [hypothesized model distribution P_S. Different contrasts are obtained] depending on two options: (1) trying or not to estimate P_S from the data, and (2) […]. [The simplest choice is] to select fixed densities r_1, …, r_n for each component, possibly on the basis of [prior knowledge. This amounts to] a fixed distributional assumption, and the minimization of φ_ML(Y) is performed only over P_Y via Y = BX. This can be rephrased: choose B such that Y = BX is as close as possible in distribution to the hypothesized model distribution P_S. [This is] also the contrast function derived from the infomax principle by Bell and Sejnowski (1995). The connection between infomax and ML was noted in Cardoso (1997), MacKay (1996), and Pearlmutter and Parra (1996).
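The "simple manipulations" invoked above can be spelled out; the following standard derivation (paraphrased, not verbatim from the paper) justifies the claim:

$$
\mathrm{E}\,\log p(x; A, r)
= \mathrm{E}\,\log p_X(x) - \mathrm{E}\,\log\frac{p_X(x)}{p(x; A, r)}
= -H(P_X) - K(P_X \,|\, P_{AS}),
$$

where $P_{AS}$ denotes the model distribution of X, with density $p(\cdot\,; A, r)$. The Kullback divergence is invariant under an invertible transformation applied to both of its arguments; applying $B = A^{-1}$ turns $P_X$ into $P_Y$ and $P_{AS}$ into $P_S$, so $K(P_X|P_{AS}) = K(P_Y|P_S)$, which gives the stated limit up to the constant $-H(P_X)$.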

[In the second option], the Kullback mismatch K(P_Y|P_S) should be minimized not only by optimizing over B to change the distribution of Y = BX but also with respect to P_S. For each fixed B, that is, for each fixed distribution P_Y, the result of this minimization is theoretically very simple: the minimum is reached when $P_S = \bar P_Y$, which denotes the distribution of independent components with each marginal distribution equal to the corresponding marginal distribution of Y. This stems from the property that

$$K(P_Y\,|\,P_S) = K(P_Y\,|\,\bar P_Y) + K(\bar P_Y\,|\,P_S)$$

for any distribution P_S with independent components (Cover & Thomas, 1991). Therefore, the minimum in P_S of K(P_Y|P_S) is reached by taking $P_S = \bar P_Y$, since this choice ensures $K(\bar P_Y|P_S) = 0$. The value of φ_ML at this point then is

$$\phi_{\mathrm{MI}}(Y) \stackrel{\mathrm{def}}{=} \min_{P_S} K(P_Y\,|\,P_S) = K(P_Y\,|\,\bar P_Y).$$

We use the index MI since this quantity is well known as the mutual information between the entries of Y.
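The decomposition can be checked directly from the definitions. Writing $p$, $\bar p$, and $r$ for the densities of $P_Y$, $\bar P_Y$, and $P_S$:

$$
K(P_Y\,|\,P_S)
= \mathrm{E}\,\log\frac{p(Y)}{r(Y)}
= \mathrm{E}\,\log\frac{p(Y)}{\bar p(Y)} + \mathrm{E}\,\log\frac{\bar p(Y)}{r(Y)} .
$$

The first term is $K(P_Y|\bar P_Y)$. In the second term, both $\bar p$ and $r$ factor into marginal densities, so the log-ratio is a sum of functions of the individual entries $Y_i$; its expectation therefore depends only on the marginals of Y, which are identical under $P_Y$ and $\bar P_Y$. The second term thus equals $K(\bar P_Y|P_S)$, which proves the identity and shows why it requires $P_S$ to have independent components.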

