Matrix Calculus: Derivation and Simple Application

HU, Pili

March 30, 2012

Abstract

Matrix calculus [3] is a very useful tool in many engineering problems. The basic rules of matrix calculus are nothing more than the ordinary calculus rules covered in undergraduate courses. However, using matrix calculus, the derivation process is more compact. This document is adapted from the notes of a course the author recently attended. It builds matrix calculus from scratch. The only prerequisites are basic calculus notions and linear algebra. For a quick executive guide, please refer to the cheat sheet in section(4). To see how matrix calculus simplifies the process of derivation, please refer to the applications in section(3).

Hupili [at] ie [dot] cuhk [dot] edu [dot] hk. Last compile: April 24, 2012.

Contents

1 Introductory Example
2 Derivation
    2.1 Organization of Elements
    2.2 Deal with Inner Product
    2.3 Properties of Trace
    2.4 Deal with Generalized Inner Product
    2.5 Define Matrix Differential
    2.6 Matrix Differential Properties
    2.7 Schema of Handling Scalar Function
    2.8 Determinant
    2.9 Vector Function and Vector Variable
    2.10 Vector Function Differential
    2.11 Chain Rule
3 Application
    3.1 The 2nd Induced Norm of Matrix
    3.2 General Multivariate Gaussian Distribution
    3.3 Maximum Likelihood Estimation of Gaussian
    3.4 Least Square Error Inference: a Comparison
4 Cheat Sheet
    4.1 Definition
    4.2 Schema for Scalar Function
    4.3 Schema for Vector Function
    4.4 Properties
    4.5 Frequently Used Formula
    4.6 Chain Rule
Acknowledgements
References
Appendix

1 Introductory Example

We start with a one-variable linear function:

$$f(x) = ax \tag{1}$$

To be coherent, we abuse the partial derivative notation:

$$\frac{\partial f}{\partial x} = a \tag{2}$$

Extending this function to be multivariate, we have:

$$f(x) = \sum_i a_i x_i = a^T x \tag{3}$$

where $a = [a_1, a_2, \ldots, a_n]^T$ and $x = [x_1, x_2, \ldots, x_n]^T$. We first compute the partial derivatives directly:

$$\frac{\partial f}{\partial x_k} = \frac{\partial \left(\sum_i a_i x_i\right)}{\partial x_k} = a_k \tag{4}$$

for all $k = 1, 2, \ldots, n$. Then we organize the $n$ partial derivatives in the following way:

$$\frac{\partial f}{\partial x} =
\begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\frac{\partial f}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix}
=
\begin{bmatrix}
a_1 \\ a_2 \\ \vdots \\ a_n
\end{bmatrix}
= a \tag{5}$$

The first equality is by proper definition and the rest follows from ordinary calculus. Eqn(5) is analogous to eqn(2), except that the variable changes from a scalar to a vector. Thus we want to claim the result of eqn(5) directly, without the intermediate steps of solving for the partial derivatives separately. Actually, we'll see soon that eqn(5) plays a core role in matrix calculus.

The following sections are organized as follows. Section(2) builds commonly used matrix calculus rules from ordinary calculus and linear algebra.

Necessary and important properties of linear algebra are also proved along the way. This section is not organized afterhand: all results are proved when we need them. Section(3) shows some applications using matrix calculus. Table(1) shows the relation between Section(2) and Section(3). Section(4) concludes with a cheat sheet of matrix calculus. Note that this cheat sheet may be different from others. Users need to figure out some basic definitions before applying it.

Table 1: Derivation and Application

2 Derivation

2.1 Organization of Elements

From the introductory example, we already see that matrix calculus does not differ from ordinary calculus in its fundamental rules.

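However, with better organization of elements and by proving useful properties, we can simplify the derivation process in real problems.

The author would like to adopt the following definition:

Definition 1. For a scalar valued function $f(x)$, the result $\frac{\partial f}{\partial x}$ has the same size as $x$. That is,

$$\frac{\partial f}{\partial x} =
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{12}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\
\frac{\partial f}{\partial x_{21}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f}{\partial x_{m1}} & \frac{\partial f}{\partial x_{m2}} & \cdots & \frac{\partial f}{\partial x_{mn}}
\end{bmatrix} \tag{6}$$

In eqn(2), $x$ is a 1-by-1 matrix and the result $\frac{\partial f}{\partial x} = a$ is also a 1-by-1 matrix. In eqn(5), $x$ is a column vector (known as an n-by-1 matrix) and the result $\frac{\partial f}{\partial x} = a$ has the same size.

Example 1. With this definition, we have:

$$\frac{\partial f}{\partial x^T} = \left(\frac{\partial f}{\partial x}\right)^T = a^T \tag{7}$$

Note that we only use the organization definition in this example. Later we'll show that with some matrix properties, this formula can be derived without using $\frac{\partial f}{\partial x}$ as an intermediate step.

2.2 Deal with Inner Product

Theorem 1. If there's a multivariate scalar function $f(x) = a^T x$, we have $\frac{\partial f}{\partial x} = a$.

Proof. See the introductory example.

Since $a^T x$ is a scalar, we can write it equivalently as the trace of itself. Thus,

Proposition 2. If there's a multivariate scalar function $f(x) = \mathrm{Tr}[a^T x]$, we have $\frac{\partial f}{\partial x} = a$.

$\mathrm{Tr}[\cdot]$ is the operator that sums up the diagonal elements of a matrix.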
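As a numerical sanity check of theorem(1), the sketch below estimates the partial derivatives of $f(x) = a^T x$ by central differences and organizes them in the same shape as $x$, following definition(1). The helper names (`inner`, `numerical_gradient`), the step size `h`, and the sample vectors are illustrative assumptions, not part of the original notes.

```python
def inner(a, x):
    """f(x) = a^T x for two equal-length lists of floats."""
    return sum(ai * xi for ai, xi in zip(a, x))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of df/dx, laid out with the same size as x."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical sample data, chosen only for illustration.
a = [2.0, -1.0, 0.5]
x = [0.3, 1.2, -0.7]

grad = numerical_gradient(lambda v: inner(a, v), x)

# The estimate should agree with a up to floating-point error.
max_err = max(abs(g - ai) for g, ai in zip(grad, a))
print(grad, max_err)
```

Because $f$ is linear, the central difference has no truncation error here; only rounding error remains, so the estimate matches $a$ closely at any evaluation point $x$.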

In the next section, we'll explore more properties of trace. As long as we can transform our target function into the form of theorem(1) or proposition(2), the result can be written out directly. Notice that in proposition(2), $a$ and $x$ are both vectors. We'll show later that, as long as their sizes agree, it holds for matrices as well.

2.3 Properties of Trace

Definition 2. The trace of a square matrix is defined as:

$$\mathrm{Tr}[A] = \sum_i A_{ii}$$

Example 2. Using definition(1,2), it is very easy to show:

$$\frac{\partial \mathrm{Tr}[A]}{\partial A} = I \tag{8}$$

since only diagonal elements are kept by the trace operator.

Theorem 3. Matrix trace has the following properties:

(1) $\mathrm{Tr}[A + B] = \mathrm{Tr}[A] + \mathrm{Tr}[B]$
(2) $\mathrm{Tr}[cA] = c\,\mathrm{Tr}[A]$
(3) $\mathrm{Tr}[AB] = \mathrm{Tr}[BA]$
(4) $\mathrm{Tr}[A_1 A_2 \cdots A_n] = \mathrm{Tr}[A_n A_1 \cdots A_{n-1}]$
(5) $\mathrm{Tr}[A^T B] = \sum_i \sum_j A_{ij} B_{ij}$
(6) $\mathrm{Tr}[A] = \mathrm{Tr}[A^T]$

where $A$, $B$ are matrices with proper sizes, and $c$ is a scalar value.

Proof. See wikipedia [5] for the proof.

Here we explain the intuitions behind each property to make it easier to remember.

Property(1) and property(2) show the linearity of trace. Property(3) means that the multiplication of two matrices inside the trace operator is commutative. Note that matrix multiplication without trace is not commutative, and the commutative property inside the trace does not hold for more than 2 matrices: the factors cannot be reordered arbitrarily. Property(4) is a consequence of property(3), obtained by considering $A_1 A_2 \cdots A_{n-1}$ as a whole. It is known as the cyclic property, so that you can rotate the matrices inside a trace operator. Property(5) shows a way to express the sum of element-by-element products using matrix product and trace. Note that the inner product of two vectors is also the sum of element-by-element products.
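The trace properties discussed above can be checked on small integer matrices. The following is a minimal sketch using plain Python lists; the helpers `matmul`, `transpose`, and `trace` and all sample matrices are assumptions made for illustration. It checks property(3) on rectangular factors, the cyclic rotation of property(4), the failure of arbitrary reordering for three matrices, and property(5).

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


# Property (3): Tr[PQ] = Tr[QP], even when P (2x3) and Q (3x2) are not square.
P = [[1, 2, 3], [4, 5, 6]]
Q = [[1, 0], [0, 1], [2, 2]]
tr_PQ = trace(matmul(P, Q))
tr_QP = trace(matmul(Q, P))

# Property (4): rotating the factors keeps the trace unchanged.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]
tr_ABC = trace(matmul(matmul(A, B), C))
tr_CAB = trace(matmul(matmul(C, A), B))
tr_BCA = trace(matmul(matmul(B, C), A))

# Swapping two factors is not a rotation, and the trace can change.
tr_ACB = trace(matmul(matmul(A, C), B))

# Property (5): Tr[A^T B] equals the sum of element-by-element products.
tr_AtB = trace(matmul(transpose(A), B))
elementwise = sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

print(tr_PQ, tr_QP, tr_ABC, tr_CAB, tr_BCA, tr_ACB, tr_AtB, elementwise)
```

Integer entries are used so that the equalities hold exactly rather than up to rounding.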

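Property(5) resembles the vector inner product in form ($A^T B$). The author regards property(5) as the extension of the inner product to matrices (Generalized Inner Product).

2.4 Deal with Generalized Inner Product

Theorem 4. If there's a multivariate scalar function $f(x) = \mathrm{Tr}[A^T x]$, we have $\frac{\partial f}{\partial x} = A$. ($A$, $x$ can be matrices.)

Proof. Using property(5) of trace, we can write $f$ as:

$$f(x) = \mathrm{Tr}[A^T x] = \sum_{ij} A_{ij} x_{ij} \tag{9}$$

It's easy to show:

$$\frac{\partial f}{\partial x_{ij}} = \frac{\partial \left(\sum_{ij} A_{ij} x_{ij}\right)}{\partial x_{ij}} = A_{ij} \tag{10}$$

Organizing the elements using definition(1) proves the theorem.

With this theorem and the properties of trace we revisit example(1).

Example 3. For vectors $a$, $x$ and function $f(x) = a^T x$:

\begin{align}
& \frac{\partial f}{\partial x^T} \tag{11} \\
={}& \frac{\partial (a^T x)}{\partial x^T} \tag{12} \\
={}& \frac{\partial\, \mathrm{Tr}[a^T x]}{\partial x^T} && \text{($f$ is scalar)} \tag{13} \\
={}& \frac{\partial\, \mathrm{Tr}[x a^T]}{\partial x^T} && \text{(property(3))} \tag{14} \\
={}& \frac{\partial\, \mathrm{Tr}[a x^T]}{\partial x^T} && \text{(property(6))} \tag{15} \\
={}& \frac{\partial\, \mathrm{Tr}[(a^T)^T x^T]}{\partial x^T} && \text{(property of transpose)} \tag{16} \\
={}& a^T && \text{(theorem(4))} \tag{17}
\end{align}

The result is the same as in example(1), where we used the basic definition.

The above example actually demonstrates the usual way of handling matrix derivatives.

2.5 Define Matrix Differential

Although we want the matrix derivative most of the time, it turns out that the matrix differential is easier to operate on, due to the form invariance property of the differential. Matrix differentials inherit this property as a natural consequence of the following definition.

Definition 3. Define the matrix differential:

$$dA =
\begin{bmatrix}
dA_{11} & dA_{12} & \cdots & dA_{1n} \\
dA_{21} & dA_{22} & \cdots & dA_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
dA_{m1} & dA_{m2} & \cdots & dA_{mn}
\end{bmatrix} \tag{18}$$

Theorem 5. The differential operator is distributive through the trace operator: $d\,\mathrm{Tr}[A] = \mathrm{Tr}[dA]$.

Proof.

\begin{align}
\text{LHS} &= d\left(\sum_i A_{ii}\right) = \sum_i dA_{ii} \tag{19} \\
\text{RHS} &= \mathrm{Tr}
\begin{bmatrix}
dA_{11} & dA_{12} & \cdots & dA_{1n} \\
dA_{21} & dA_{22} & \cdots & dA_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
dA_{n1} & dA_{n2} & \cdots & dA_{nn}
\end{bmatrix} \tag{20} \\
&= \sum_i dA_{ii} = \text{LHS} \tag{21}
\end{align}

Now that the matrix differential is well defined, we want to relate it back to the matrix derivative.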
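Theorem(4) can be checked numerically in the same spirit as the scalar-by-vector case. The sketch below computes $f(x) = \mathrm{Tr}[A^T x]$ with explicit matrix products, estimates each $\frac{\partial f}{\partial x_{ij}}$ by central differences, and arranges the results in the shape of $x$ as in definition(1). The function names, the step size `h`, and the sample matrices are hypothetical choices made for this illustration.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of df/dX with the same size as X."""
    rows, cols = len(X), len(X[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return grad


# Hypothetical 2-by-3 sample matrices, chosen only for illustration.
A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.5]]
X = [[0.2, 0.4, -0.1], [1.0, -0.3, 0.7]]


def f(M):
    """f(M) = Tr[A^T M]."""
    return trace(matmul(transpose(A), M))


grad = numerical_matrix_gradient(f, X)

# Theorem 4 predicts that the gradient equals A entry by entry.
max_err = max(abs(grad[i][j] - A[i][j])
              for i in range(len(A)) for j in range(len(A[0])))
print(grad, max_err)
```

Taking $A$ and $x$ to be column vectors reduces this check to theorem(1), which is the sense in which the trace form generalizes the inner product.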
