
Densely Connected Convolutional Networks

Gao Huang (Cornell University), Zhuang Liu (Tsinghua University), Laurens van der Maaten (Facebook AI Research), Kilian Q. Weinberger (Cornell University)

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections (one between each layer and its subsequent layer), our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.


We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available.

1. Introduction

Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [18], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [19] consisted of 5 layers, VGG featured 19 [29], and only last year Highway Networks [34] and Residual Networks (ResNets) [11] have surpassed the 100-layer barrier.

Figure 1: A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input.

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and wash out by the time it reaches the end (or beginning) of the network.

Many recent publications address this or related problems. ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets [17] repeatedly combine several parallel layer sequences with different numbers of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.

Figure 1 illustrates this layout schematically. Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them. Hence, the ℓ-th layer has ℓ inputs, consisting of the feature-maps of all preceding convolutional blocks. Its own feature-maps are passed on to all L - ℓ subsequent layers. This introduces L(L+1)/2 connections in an L-layer network, instead of just L, as in traditional architectures. Because of its dense connectivity pattern, we refer to our approach as Dense Convolutional Network (DenseNet).

A possibly counter-intuitive effect of this dense connectivity pattern is that it requires fewer parameters than traditional convolutional networks, as there is no need to re-learn redundant feature-maps. Traditional feed-forward architectures can be viewed as algorithms with a state, which is passed on from layer to layer. Each layer reads the state from its preceding layer and writes to the subsequent layer. It changes the state but also passes on information that needs to be preserved.
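To make the connectivity concrete, here is a minimal PyTorch sketch of a dense block (our own simplified illustration, not the authors' reference implementation; the class name DenseBlock and the reduction of each layer to a single 3x3 convolution are our assumptions):

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Each layer receives the concatenation of the block input and all earlier
    # layers' outputs, and contributes growth_rate new feature-maps of its own.
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                      kernel_size=3, padding=1, bias=False)
            for i in range(num_layers)
        ])

    def forward(self, x0):
        features = [x0]  # feature-maps of all preceding layers
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))  # concatenate, never sum
            features.append(new_features)  # own feature-maps feed every later layer
        return torch.cat(features, dim=1)

Because layer ℓ consumes ℓ earlier feature-map groups, an L-layer block wired this way contains 1 + 2 + ... + L = L(L+1)/2 direct connections, matching the count given above.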

ResNets [11] make this information preservation explicit through additive identity transformations. Recent variations of ResNets [13] show that many layers contribute very little and can in fact be randomly dropped during training. This makes the state of ResNets similar to (unrolled) recurrent neural networks [21], but the number of parameters of ResNets is substantially larger because each layer has its own weights. Our proposed DenseNet architecture explicitly differentiates between information that is added to the network and information that is preserved. DenseNet layers are very narrow (e.g., 12 filters per layer), adding only a small set of feature-maps to the collective knowledge of the network and keeping the remaining feature-maps unchanged; the final classifier makes a decision based on all feature-maps in the network.

Besides better parameter efficiency, one big advantage of DenseNets is their improved flow of information and gradients throughout the network, which makes them easy to train. Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision [20].
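As a quick worked example of how narrow these layers are (our own illustration; only the growth rate of 12 comes from the text above, and the starting width of 24 channels is a hypothetical value): a dense block that receives c0 channels has c0 + ℓ·k channels after ℓ layers, since each layer only adds k feature-maps to the shared state.

def channels_after(num_layers, c0, k=12):
    # Width of the concatenated state after num_layers dense layers with growth rate k.
    return c0 + num_layers * k

print(channels_after(12, 24))  # 168 channels after a hypothetical 12-layer block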

This helps training of deeper network architectures. Further, we also observe that dense connections have a regularizing effect, which reduces overfitting on tasks with smaller training set sizes.

We evaluate DenseNets on four highly competitive benchmark datasets (CIFAR-10, CIFAR-100, SVHN, and ImageNet). Our models tend to require far fewer parameters than existing algorithms with comparable accuracy. Further, we significantly outperform the current state-of-the-art results on most of the benchmark tasks.

2. Related Work

The exploration of network architectures has been a part of neural network research since their initial discovery. The recent resurgence in popularity of neural networks has also revived this research domain. The increasing number of layers in modern networks amplifies the differences between architectures and motivates the exploration of different connectivity patterns and the revisiting of old research ideas. A cascade structure similar to our proposed dense network layout has already been studied in the neural networks literature in the 1980s [3].

Their pioneering work focuses on fully connected multi-layer perceptrons trained in a layer-by-layer fashion. More recently, fully connected cascade networks to be trained with batch gradient descent were proposed [40]. Although effective on small datasets, this approach only scales to networks with a few hundred parameters. In [9, 23, 31, 41], utilizing multi-level features in CNNs through skip-connections has been found to be effective for various vision tasks. Parallel to our work, [1] derived a purely theoretical framework for networks with cross-layer connections similar to ours.

Highway Networks [34] were amongst the first architectures that provided a means to effectively train end-to-end networks with more than 100 layers. Using bypassing paths along with gating units, Highway Networks with hundreds of layers can be optimized without difficulty. The bypassing paths are presumed to be the key factor that eases the training of these very deep networks. This point is further supported by ResNets [11], in which pure identity mappings are used as bypassing paths.

ResNets have achieved impressive, record-breaking performance on many challenging image recognition, localization, and detection tasks, such as ImageNet and COCO object detection [11]. Recently, stochastic depth was proposed as a way to successfully train a 1202-layer ResNet [13]. Stochastic depth improves the training of deep residual networks by dropping layers randomly during training. This shows that not all layers may be needed and highlights that there is a great amount of redundancy in deep (residual) networks. Our paper was partly inspired by that observation. ResNets with pre-activation also facilitate the training of state-of-the-art networks with >1000 layers [12].

An orthogonal approach to making networks deeper (e.g., with the help of skip connections) is to increase the network width. The GoogLeNet [36, 37] uses an Inception module which concatenates feature-maps produced by filters of different sizes. In [38], a variant of ResNets with wide generalized residual blocks was proposed.

In fact, simply increasing the number of filters in each layer of ResNets can improve its performance provided the depth is sufficient [42]. FractalNets also achieve competitive results on several datasets using a wide network structure [17].

Figure 2: A deep DenseNet with three dense blocks. The layers between two adjacent blocks are referred to as transition layers and change feature-map sizes via convolution and pooling.

Instead of drawing representational power from extremely deep or wide architectures, DenseNets exploit the potential of the network through feature reuse, yielding condensed models that are easy to train and highly parameter-efficient. Concatenating feature-maps learned by different layers increases variation in the input of subsequent layers and improves efficiency. This constitutes a major difference between DenseNets and ResNets. Compared to Inception networks [36, 37], which also concatenate features from different layers, DenseNets are simpler and more efficient.

There are other notable network architecture innovations which have yielded competitive results.
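The transition layers mentioned in the figure caption can be sketched as follows (a minimal example of ours, assuming the common choice of batch normalization, a 1x1 convolution to reduce the channel count, and 2x2 average pooling to halve the spatial resolution; the exact composition is not specified in this excerpt):

import torch.nn as nn

class TransitionLayer(nn.Module):
    # Between two dense blocks: compress channels with a 1x1 convolution,
    # then downsample spatially with 2x2 average pooling.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.norm(x)))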

The Network in Network (NIN) [22] structure includes micro multi-layer perceptrons into the filters of convolutional layers to extract more complicated features. In Deeply Supervised Network (DSN) [20], internal layers are directly supervised by auxiliary classifiers, which can strengthen the gradients received by earlier layers. Ladder Networks [27, 25] introduce lateral connections into autoencoders, producing impressive accuracies on semi-supervised learning tasks. In [39], Deeply-Fused Nets (DFNs) were proposed to improve information flow by combining intermediate layers of different base networks. The augmentation of networks with pathways that minimize reconstruction losses was also shown to improve image classification models [43].

3. DenseNets

Consider a single image x0 that is passed through a convolutional network. The network comprises L layers, each of which implements a non-linear transformation Hℓ(·), where ℓ indexes the layer. Hℓ(·) can be a composite function of operations such as Batch Normalization (BN) [14], rectified linear units (ReLU) [6], Pooling [19], or Convolution (Conv).
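For illustration, one possible composite function Hℓ(·) built from these operations is batch normalization followed by ReLU and a 3x3 convolution (a sketch under that assumption; the helper name make_H is ours):

import torch.nn as nn

def make_H(in_channels, growth_rate):
    # One candidate composite function H_l: BN -> ReLU -> 3x3 convolution
    # producing growth_rate output feature-maps.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )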

