Deep Learning with Coherent Nanophotonic Circuits - arXiv

Yichen Shen1*, Nicholas C. Harris1*, Scott Skirlo1, Mihika Prabhu1, Tom Baehr-Jones2, Michael Hochberg2, Xin Sun3, Shijie Zhao4, Hugo Larochelle5, Dirk Englund1, and Marin Soljačić1

1 Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 Coriant Advanced Technology, 171 Madison Avenue, Suite 1100, New York, NY 10016, USA
3 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4 Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
5 Twitter Inc., 141 Portland St, Cambridge, MA 02139, USA
* These authors contributed equally to this work.

Artificial neural networks are computational network models inspired by signal processing in the brain. These models have dramatically improved the performance of many learning tasks, including speech and object recognition.

However, today's computing hardware is inefficient at implementing neural networks, in large part because much of it was designed for von Neumann computing schemes. Significant effort has been made to develop electronic architectures tuned to implement artificial neural networks that improve upon both computational speed and energy efficiency. Here, we propose a new architecture for a fully optical neural network that, using unique advantages of optics, promises a computational speed enhancement of at least two orders of magnitude over the state of the art and three orders of magnitude in power efficiency for conventional learning tasks. We experimentally demonstrate essential parts of our architecture using a programmable nanophotonic processor.

Computers based on the von Neumann architecture are far more power-hungry and less effective than their biological counterparts (central nervous systems) for a wide range of tasks including perception, communication, learning, and decision making.

With the increasing data volume associated with processing big data, developing computers that learn, combine, and analyze vast amounts of information quickly and efficiently is becoming increasingly important. For example, speech recognition software (e.g., Apple's Siri) is typically executed in the cloud since these computations are too taxing for mobile hardware; real-time image processing is an even more demanding task [1]. To address the shortcomings of von Neumann computing architectures for neural networks, much recent work has focused on increasing artificial neural network computing speed and power efficiency by developing electronic architectures (such as ASIC and FPGA chips) specifically tailored to a task [2-5]. Recent demonstrations of electronic neuromorphic hardware architectures have reported improved computational performance [6]. Hybrid optical-electronic systems that implement spike processing [7-9] and reservoir computing [10, 11] have also been investigated recently.

However, the computational speed and power efficiency achieved with these hardware architectures are still limited by electronic clock rates and ohmic losses.

Fully optical neural networks offer a promising alternative approach to microelectronic and hybrid optical-electronic implementations. Linear transformations (and certain nonlinear transformations) can be performed at the speed of light and detected at rates exceeding 100 GHz [12] in photonic networks, and in some cases with minimal power consumption [13]. For example, it is well known that a common lens performs a Fourier transform without any power consumption, and that certain matrix operations can also be performed optically without consuming power. However, implementing such transformations with bulk optical components (such as fibers and lenses) has been a major barrier because of the need for phase stability and large neuron counts. Integrated photonics solves this problem by providing a scalable solution to large, phase-stable optical transformations [14].

Here, we experimentally demonstrate on-chip, coherent, optical neuromorphic computing on a vowel recognition dataset. We achieve a level of accuracy comparable to a conventional digital computer using a fully connected neural network algorithm. We show that, under certain conditions, the optical neural network architecture can be at least two orders of magnitude faster for forward propagation while providing linear scaling of neuron number versus power consumption. This feature is enabled largely by the fact that photonics can perform matrix multiplications, a major part of neural network algorithms, with extreme energy efficiency. While implementing scalable von Neumann optical computers has proven challenging, artificial neural networks implemented in optics can leverage inherent properties, such as their weak requirements on nonlinearities, to enable a practical, all-optical computing application.

An optical neural network architecture can be substantially more energy efficient than conventional artificial neural networks implemented on current electronic computers.

OPTICAL NEURAL NETWORK DEVICE ARCHITECTURE

An artificial neural network (ANN) [15] consists of a set of input artificial neurons (represented as circles in Fig. 1(a)) connected to at least one hidden layer and an output layer. In each layer (depicted in Fig. 1(b)), information propagates by linear combination (e.g., matrix multiplication) followed by the application of a nonlinear activation function. ANNs can be trained by feeding training data into the input layer and then computing the output by forward propagation; weighting parameters in each matrix are subsequently optimized using backpropagation [16].

[arXiv preprint, 7 Oct 2016]

FIG. 1. Architecture of Optical Neural Network. a. General artificial neural network architecture composed of an input layer, a number of hidden layers, and an output layer. b. Decomposition of the general neural network into individual layers. c. Optical interference and nonlinearity units that compose each layer of the artificial neural network. The panels label the layer relations $Z^{(1)} = W_0 X$, $h^{(i)} = f(Z^{(i)})$, and $Y = W_n h^{(n)}$.

The Optical Neural Network (ONN) architecture is depicted in Fig. 1(b,c). As shown in Fig. 1(c), signals are encoded in the amplitude of optical pulses propagating in integrated photonic waveguides, where they pass through an optical interference unit (OIU) and finally an optical nonlinearity unit (ONU). Optical matrix multiplication is implemented with an OIU and nonlinear activation is realized with an ONU.

To realize an OIU that can implement any real-valued matrix, we use the singular value decomposition (SVD) [17], since a general, real-valued matrix $M$ may be decomposed as $M = U \Sigma V^\dagger$, where $U$ is an $m \times m$ unitary matrix, $\Sigma$ is an $m \times n$ diagonal matrix with non-negative real numbers on the diagonal, and $V^\dagger$ is the conjugate transpose of the $n \times n$ unitary matrix $V$.
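As a minimal numerical sketch of this decomposition (not the authors' code), the NumPy snippet below factors a real matrix into $U$, $\Sigma$, and $V^\dagger$ and applies the three factors to an input vector one after another, mirroring the order in which a signal would pass through the corresponding stages. The function names `svd_stages` and `apply_in_stages` and the random 4 x 4 example matrix are illustrative assumptions.

```python
import numpy as np


def svd_stages(M):
    """Factor a real matrix M (m x n) into U (m x m), Sigma (m x n), Vh (n x n)."""
    U, s, Vh = np.linalg.svd(M, full_matrices=True)
    # Embed the singular values in an m x n rectangular diagonal matrix.
    Sigma = np.zeros(M.shape)
    k = min(M.shape)
    Sigma[:k, :k] = np.diag(s)
    return U, Sigma, Vh


def apply_in_stages(M, x):
    """Apply M to x as three successive stages: Vh first, then Sigma, then U."""
    U, Sigma, Vh = svd_stages(M)
    after_first_unitary = Vh @ x
    after_diagonal = Sigma @ after_first_unitary
    return U @ after_diagonal


# Illustrative example: a random real 4 x 4 weight matrix (an assumption).
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
x = rng.normal(size=4)
U, Sigma, Vh = svd_stages(M)
y = apply_in_stages(M, x)
```

Because `U` and `Vh` are unitary, they preserve the norm of the vector they act on; only `Sigma` rescales it.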

It was theoretically shown that any unitary transformations $U$, $V^\dagger$ can be implemented with optical beamsplitters and phase shifters [18, 19]. Matrix multiplication implemented in this manner consumes, in principle, no power. The fact that a major part of ANN calculations involves matrix products enables the extreme energy efficiency of the ONN architecture presented here. Finally, $\Sigma$ can be implemented using optical attenuators; optical amplification materials such as semiconductors or dyes could also be used [20].

The ONU can be implemented using optical nonlinearities such as saturable absorption [21-23] and bistability [24-28] that have been demonstrated separately in photonic circuits. For an input intensity $I_{in}$, the optical output intensity is thus given by a nonlinear function $I_{out} = f(I_{in})$ [29].

EXPERIMENT

For an experimental demonstration of our ONN architecture, we implement a two-layer, fully connected neural network with the OIU shown in Fig. 2 and use it to perform vowel recognition.

FIG. 2. Optical Interference Unit. a. Optical micrograph of an experimentally fabricated 22-mode on-chip optical interference unit; the physical region where the optical neural network program exists is highlighted in grey. The system acts as an optical field-programmable gate array, a test bed for optical experiments. b. Schematic illustration of the optical neural network program demonstrated here, which realizes both matrix multiplication and amplification fully optically. c. Schematic illustration of a single phase shifter in the Mach-Zehnder Interferometer (MZI) and the transmission curve for tuning the internal phase shifter.

To prepare the training and testing dataset, we use 360 datapoints that each consist of four log area ratio coefficients [30] of one phoneme. The log area ratio coefficients, or feature vectors, represent the power contained in different logarithmically spaced frequency bands and are derived by computing the Fourier transform of the signal multiplied by a Hamming window function.
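The windowing and band-power steps described above can be sketched as follows. This is only an illustration of the stated recipe (Hamming window, Fourier transform, power in logarithmically spaced bands), not the feature pipeline used for the dataset in [30, 31]; the function name `band_powers`, the 8 kHz sample rate, the 100 Hz lower band edge, and the synthetic 440 Hz test tone are all assumptions.

```python
import numpy as np


def band_powers(signal, sample_rate, n_bands=4, f_min=100.0):
    """Power of a Hamming-windowed signal in log-spaced frequency bands."""
    windowed = signal * np.hamming(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Band edges spaced logarithmically between f_min and the Nyquist frequency.
    edges = np.geomspace(f_min, sample_rate / 2, n_bands + 1)
    return np.array([
        spectrum[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])


# Synthetic stand-in for a recorded phoneme (an assumption, not real data).
sample_rate = 8000.0
t = np.arange(1024) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
powers = band_powers(tone, sample_rate)
```

With these assumed settings the band edges fall at roughly 100, 251, 632, 1590, and 4000 Hz, so the 440 Hz tone contributes mainly to the second band.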

The 360 datapoints were generated by 90 different people speaking 4 different vowel phonemes [31]. We use half of these datapoints for training and the remaining half to test the performance of the trained ONN. We train the matrix parameters used in the ONN with the standard backpropagation algorithm using the stochastic gradient descent method [1], on a conventional computer. Further details on the dataset and backpropagation procedure are included in the Supplemental Information.

The coherent ONN is realized with a programmable nanophotonic processor [14] composed of an array of 56 Mach-Zehnder interferometers (MZIs) and 213 phase-shifting elements, as shown in Fig. 2. Each interferometer is composed of two evanescent-mode waveguide couplers sandwiching an internal thermo-optic phase shifter [32] to control the splitting ratio of the output modes, followed by a second modulator to control the relative phase of the output modes.
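The interferometer structure described above can be illustrated with a small transfer-matrix sketch. It assumes ideal, lossless 50:50 couplers with one common sign convention, and it places both phase shifters on the upper waveguide; these conventions, and the names `COUPLER`, `phase_shifter`, `mzi`, and `output_powers`, are assumptions for illustration rather than a description of the fabricated device.

```python
import numpy as np

# Ideal lossless 50:50 directional coupler (assumed convention).
COUPLER = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)


def phase_shifter(phase):
    """Phase shift applied to the upper waveguide only."""
    return np.diag([np.exp(1j * phase), 1.0])


def mzi(theta, phi):
    """Coupler, internal phase theta, coupler, then external phase phi."""
    return phase_shifter(phi) @ COUPLER @ phase_shifter(theta) @ COUPLER


def output_powers(theta, phi=0.0):
    """Output powers when unit power enters the upper input port."""
    field_out = mzi(theta, phi) @ np.array([1.0, 0.0])
    return np.abs(field_out) ** 2


# Sweeping theta moves power between the two outputs; phi leaves it unchanged.
T = mzi(0.7, 1.3)
split = output_powers(np.pi / 2)
```

Under these assumptions the upper output carries a fraction sin^2(theta/2) of the input power and the lower output cos^2(theta/2), so the internal phase sets the splitting ratio while the external phase only changes the relative phase of the outputs.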
