
On Neural Differential Equations





Transcription of On Neural Differential Equations

On Neural Differential Equations
Patrick Kidger
Mathematical Institute
University of Oxford

A thesis submitted for the degree of Doctor of Philosophy
Trinity 2021

Abstract

The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides.

This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions).

Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation).

We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
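To make the abstract's "two sides of the same coin" claim concrete: a residual network update y_{n+1} = y_n + f_θ(y_n) is exactly an explicit Euler step for the neural ODE dy/dt = f_θ(y(t)). The sketch below is illustrative only, not code from the thesis; the weight-tied network is a simplification to keep the correspondence exact.

```python
import torch

# A small vector field f_theta. Tying the weights across layers keeps the
# residual-network/Euler correspondence exact; a real residual network
# would have separate weights per layer.
f = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2)
)

def resnet_forward(y, num_layers):
    # Residual network: y_{n+1} = y_n + f(y_n).
    for _ in range(num_layers):
        y = y + f(y)
    return y

def euler_neural_ode(y, t1, num_steps):
    # Explicit Euler discretisation of dy/dt = f(y) over [0, t1].
    dt = t1 / num_steps
    for _ in range(num_steps):
        y = y + dt * f(y)
    return y

y0 = torch.randn(1, 2)
# With step size dt = 1, a depth-n residual network is an n-step Euler
# solve of the ODE over [0, n].
print(torch.allclose(resnet_forward(y0, 5), euler_neural_ode(y0, 5.0, 5)))  # True
```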

Contents

Originality
Acknowledgements

1 Motivation .. Getting started .. What is a neural differential equation anyway? .. A familiar example .. … neural networks .. An important distinction .. The case for neural differential equations .. A note on history

2 Neural Ordinary Differential Equations .. Introduction .. Existence and uniqueness .. … and training .. Applications .. … classification .. … modelling with inductive biases .. Continuous normalising flows .. Latent ODEs .. … networks .. Choice of parameterisation .. … architectures .. Approximation properties .. Unaugmented neural ODEs are not universal approximators .. Augmented neural ODEs are universal approximators, even if their vector fields are not universal approximators .. Comments

3 Neural Controlled Differential Equations .. Introduction .. Controlled differential equations .. … vector fields .. … CDEs .. … to regular time series .. Applications .. … time series .. … are discretised neural CDEs .. Long time series and rough differential equations .. Neural SDEs .. Theoretical properties .. Universal approximation .. Comparison to alternative ODE models .. Choice of parameterisation .. … architectures and gating procedures .. … field interactions .. … neural CDEs .. Interpolation schemes .. … conditions .. … of interpolation points .. … interpolation schemes .. Comments

4 Neural Stochastic Differential Equations .. Introduction .. Stochastic differential equations .. … and recurrent structure .. Construction .. Training criteria .. … SDEs .. … and combinations .. Choice of parameterisation .. Choice of optimiser .. Choice of architecture .. … regularisation .. Examples .. Comments

5 Numerical Solutions of Neural Differential Equations .. Backpropagation through ODEs .. … ODE solvers .. … sensitivity .. Backpropagation through CDEs and SDEs .. … for CDEs .. … for SDEs .. Reversible differential equation solvers .. Numerical solvers .. … numerical solvers .. … solvers .. … vector fields with jumps .. Tips and tricks .. The structure of adaptive step size controllers .. Numerical simulation of Brownian motion .. Brownian Path .. Brownian Tree .. Brownian Interval .. Software .. Comments

6 … .. Symbolic regression .. … to symbolic regression .. … regression for dynamical systems .. Limitations of neural differential equations .. … requirements .. … discretised architectures .. Beyond neural differential equations: deep implicit layers .. Differential equations as implicit layers .. Deep equilibrium models .. Multiple shooting: DEQs meet NODEs .. Differentiable optimisation .. Comments

7 Future directions .. Thank you

A Review of Deep Learning .. Autodifferentiation .. Normalising flows .. Universal approximation .. Irregular time series .. Miscellanea

B Neural Rough Differential Equations .. Background .. Signatures and logsignatures .. The log-ODE method .. Neural vector fields .. Applying the log-ODE method .. Discussion .. Efficacy on long time series .. Limitations .. Examples .. Comments

C Proofs and … .. Augmented neural ODEs are universal approximators even when their vector fields are not universal approximators .. Comments .. Theoretical properties of neural CDEs .. Neural CDEs are universal approximators .. Neural CDEs compared to alternative ODE models .. Reparameterisation invariance of CDEs .. Comments .. Backpropagation via optimise-then-discretise .. Optimise-then-discretise for ODEs .. Optimise-then-discretise for CDEs .. Optimise-then-discretise for SDEs .. Comments .. Convergence and stability of the reversible Heun method .. Convergence .. Stability .. Brownian Interval .. Algorithmic definitions .. Discussion

D Experimental details .. Continuous normalising flows on images .. Latent ODEs on decaying oscillators .. Neural CDEs on spirals .. Neural SDEs on time series .. Brownian motion .. Time-dependent Ornstein-Uhlenbeck process .. Damped harmonic oscillator .. Lorenz attractor .. Symbolic regression on a nonlinear oscillator .. Neural RDEs on BIDMC

Bibliography .. Notation .. Abbreviations .. Index

Originality statement

The writing of this thesis is my original work. The material in this thesis is either (a) my original work, with or without collaborators, or (b) where relevant, prior or concurrent work included for reference, so as to provide a survey of the field. This thesis contains material from the following papers on neural differential equations (organised chronologically):

Neural Controlled Differential Equations for Irregular Time Series
Patrick Kidger, James Morrill, James Foster, Terry Lyons
Neural Information Processing Systems, 2020

"Hey, that's not an ODE": Faster ODE Adjoints via Seminorms
Patrick Kidger, Ricky T. Q. Chen, Terry Lyons
International Conference on Machine Learning, 2021

Neural Rough Differential Equations for Long Time Series
James Morrill, Cristopher Salvi, Patrick Kidger, James Foster, Terry Lyons
International Conference on Machine Learning, 2021

Neural SDEs as Infinite-Dimensional GANs
Patrick Kidger, James Foster, Xuechen Li, Harald Oberhauser, Terry Lyons
International Conference on Machine Learning, 2021

Efficient and Accurate Gradients for Neural SDEs
Patrick Kidger, James Foster, Xuechen Li, Terry Lyons
Neural Information Processing Systems, 2021

Neural Controlled Differential Equations for Online Prediction Tasks
James Morrill, Patrick Kidger, Lingyi Yang, Terry Lyons, 2021

Open source software

A substantial component of my PhD has been the democratisation of neural differential equations via open-source software development. In particular I have authored or otherwise had a substantial hand in developing:

Diffrax: ordinary, controlled, and stochastic differential equation solvers for JAX.
torchcde: controlled differential equation solvers for PyTorch.
torchsde: stochastic differential equation solvers for PyTorch.
torchdiffeq: ordinary differential equation solvers for PyTorch.
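As a taste of what these libraries provide, here is a minimal sketch of solving a toy ODE with Diffrax. This example is mine rather than the thesis's, and the names used (diffeqsolve, ODETerm, Dopri5, SaveAt) reflect my understanding of recent Diffrax versions; check them against the Diffrax documentation.

```python
import jax.numpy as jnp
import diffrax

# Vector field for the linear ODE dy/dt = -y.
def vector_field(t, y, args):
    return -y

# Solve over [0, 1] with the Dormand-Prince 5(4) adaptive solver,
# saving the solution at 100 evenly spaced times.
solution = diffrax.diffeqsolve(
    diffrax.ODETerm(vector_field),
    diffrax.Dopri5(),
    t0=0.0,
    t1=1.0,
    dt0=0.01,
    y0=jnp.array(1.0),
    saveat=diffrax.SaveAt(ts=jnp.linspace(0.0, 1.0, 100)),
)
print(solution.ys)  # approximately exp(-t)
```

A neural differential equation is then the same call with the vector field given by a neural network, through which gradients can be backpropagated.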

Summary of contributions

My personal contributions to each paper break down as follows.

For the Neural Controlled Differential Equations for Irregular Time Series paper: I did the entirety of this paper. James Morrill and James Foster had concurrently worked on similar ideas and were included as authors on the paper as a result.

For the "Hey, that's not an ODE": Faster ODE Adjoints via Seminorms paper: I had the idea, developed the theory, wrote the library implementation (its usage is sketched below), and handled the neural CDE and Hamiltonian experiments. Ricky T. Q. Chen performed the experiments for the continuous normalising flows. The written text was joint work between both of us. (And whilst of course it does not appear in the final paper, Ricky T. Q. Chen handled most of the rebuttal.)

For the Neural Rough Differential Equations for Long Time Series paper: Cristopher Salvi had the idea of using the log-ODE method to reduce a neural CDE to an ODE. … spotted the practical application to long time series. James Morrill implemented it. James Foster helped with the theory. The written text was joint work between me and James Morrill.
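On the library side: the seminorm trick for faster adjoint backpropagation is available in the torchdiffeq library. A minimal usage sketch follows; I believe the option is spelled as below, but treat the exact name as an assumption to verify against torchdiffeq's documentation.

```python
import torch
from torchdiffeq import odeint_adjoint

class VectorField(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)

    def forward(self, t, y):
        return self.linear(y)

func = VectorField()
y0 = torch.randn(8, 2)
t = torch.linspace(0.0, 1.0, 10)

# The adaptive step size controller of the backward (adjoint) solve uses a
# seminorm that ignores error in the parameter-gradient channels, so the
# backward pass can take larger steps at the same tolerance.
ys = odeint_adjoint(func, y0, t, adjoint_options=dict(norm="seminorm"))
ys[-1].sum().backward()
```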

For the Neural SDEs as Infinite-Dimensional GANs paper: I had the basic idea and basic theory, and wrote all of the experimental code. James Foster provided the necessary knowledge of SDE numerics. Xuechen Li had already started writing (and released an early version of) the torchsde software library we used. Xuechen Li and I jointly performed subsequent development of the torchsde library to extend it for this paper. The more complete idea for the paper was fleshed out jointly in conversations between all three of us. The written text was joint work between all three of us. (Finally, I owe James Foster a debt of thanks: during the development of this paper, he kindly fielded endless questions from me on the topic of SDE numerics.)

For the Efficient and Accurate Gradients for Neural SDEs paper: I had the idea and the theory for the Brownian Interval. I had the idea and the theory for gradient-penalty-free training of SDE-GANs.
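For context: the Brownian Interval is a data structure for sampling and reconstructing Brownian motion, queryable over arbitrary intervals, and it ships in torchsde. The sketch below is mine, not the thesis's; the names used (BrownianInterval, sdeint, the noise_type/sde_type attributes) reflect my understanding of the torchsde API and should be checked against its documentation.

```python
import torch
import torchsde

class ToySDE(torch.nn.Module):
    # Ito SDE with drift -y and constant diagonal diffusion.
    noise_type = "diagonal"
    sde_type = "ito"

    def f(self, t, y):  # drift
        return -y

    def g(self, t, y):  # diffusion
        return 0.4 * torch.ones_like(y)

batch, dim = 8, 3
y0 = torch.zeros(batch, dim)
ts = torch.linspace(0.0, 1.0, 50)

# Brownian increments W(b) - W(a), generated on demand for any interval.
bm = torchsde.BrownianInterval(t0=0.0, t1=1.0, size=(batch, dim))
print(bm(0.25, 0.5).shape)  # torch.Size([8, 3])

# The same object can drive an SDE solve.
ys = torchsde.sdeint(ToySDE(), y0, ts, bm=bm, method="euler")
print(ys.shape)  # torch.Size([50, 8, 3])
```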

