Densely Connected Convolutional Networks - arXiv

Gao Huang* (Cornell University), Zhuang Liu* (Tsinghua University), Laurens van der Maaten (Facebook AI Research), Kilian Q. Weinberger (Cornell University)
* Authors contributed equally.
arXiv, 28 Jan 2018

Abstract

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections, one between each layer and its subsequent layer, our network has $L(L+1)/2$ direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available.

Introduction

Convolutional neural networks (CNNs) have become the dominant machine learning approach for visual object recognition. Although they were originally introduced over 20 years ago [18], improvements in computer hardware and network structure have enabled the training of truly deep CNNs only recently. The original LeNet5 [19] consisted of 5 layers, VGG featured 19 [29], and only last year Highway Networks [34] and Residual Networks (ResNets) [11] have surpassed the 100-layer barrier.

Figure 1: A 5-layer dense block with a growth rate of k (feature-maps x0 to x4, layer functions H1 to H4). Each layer takes all preceding feature-maps as input.

As CNNs become increasingly deep, a new research problem emerges: as information about the input or gradient passes through many layers, it can vanish and wash out by the time it reaches the end (or beginning) of the network. Many recent publications address this or related problems. ResNets [11] and Highway Networks [34] bypass signal from one layer to the next via identity connections. Stochastic depth [13] shortens ResNets by randomly dropping layers during training to allow better information and gradient flow. FractalNets [17] repeatedly combine several parallel layer sequences with different numbers of convolutional blocks to obtain a large nominal depth, while maintaining many short paths in the network. Although these different approaches vary in network topology and training procedure, they all share a key characteristic: they create short paths from early layers to later layers.

In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other.

To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Figure 1 illustrates this layout schematically. Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them. Hence, the $\ell$-th layer has $\ell$ inputs, consisting of the feature-maps of all preceding convolutional blocks. Its own feature-maps are passed on to all $L-\ell$ subsequent layers. This introduces $L(L+1)/2$ connections in an $L$-layer network, instead of just $L$, as in traditional architectures.
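The connection count can be checked by adding up the inputs of every layer. As a short worked derivation, using the convention stated above that the $\ell$-th layer receives $\ell$ inputs (the value $L = 5$ is an arbitrary example, not a figure from the text):

```latex
\sum_{\ell=1}^{L} \ell \;=\; 1 + 2 + \cdots + L \;=\; \frac{L(L+1)}{2},
\qquad \text{e.g. } L = 5:\quad 1 + 2 + 3 + 4 + 5 = 15 = \frac{5 \cdot 6}{2}.
```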

Because of its dense connectivity pattern, we refer to our approach as Dense Convolutional Network (DenseNet).

A possibly counter-intuitive effect of this dense connectivity pattern is that it requires fewer parameters than traditional convolutional networks, as there is no need to relearn redundant feature-maps. Traditional feed-forward architectures can be viewed as algorithms with a state, which is passed on from layer to layer. Each layer reads the state from its preceding layer and writes to the subsequent layer. It changes the state but also passes on information that needs to be preserved.

ResNets [11] make this information preservation explicit through additive identity transformations. Recent variations of ResNets [13] show that many layers contribute very little and can in fact be randomly dropped during training. This makes the state of ResNets similar to (unrolled) recurrent neural networks [21], but the number of parameters of ResNets is substantially larger because each layer has its own weights. Our proposed DenseNet architecture explicitly differentiates between information that is added to the network and information that is preserved. DenseNet layers are very narrow (e.g., 12 filters per layer), adding only a small set of feature-maps to the collective knowledge of the network and keeping the remaining feature-maps unchanged, and the final classifier makes a decision based on all feature-maps in the network.

Besides better parameter efficiency, one big advantage of DenseNets is their improved flow of information and gradients throughout the network, which makes them easy to train.
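The contrast drawn above between additive state updates and feature concatenation can be sketched with toy feature-maps. This is a minimal illustration, not the authors' implementation: `toy_layer`, `residual_stack`, `dense_stack`, and the initial width of 16 are hypothetical names and values chosen for the example, each feature-map is reduced to a single number, and the only figure taken from the text is the layer width of 12.

```python
from typing import List

FeatureMaps = List[float]  # toy stand-in: one number per feature-map


def toy_layer(inputs: FeatureMaps, width: int, index: int) -> FeatureMaps:
    """Hypothetical stand-in for a convolutional layer.

    Consumes any number of input feature-maps and emits `width` new ones.
    """
    mean = sum(inputs) / len(inputs)
    return [mean + index + j for j in range(width)]


def residual_stack(x: FeatureMaps, num_layers: int) -> FeatureMaps:
    """ResNet-style state: each layer's output is summed into the state."""
    state = list(x)
    for i in range(1, num_layers + 1):
        update = toy_layer(state, width=len(state), index=i)
        # Additive identity connection: the width stays fixed and the
        # stored values are overwritten by the sum.
        state = [s + u for s, u in zip(state, update)]
    return state


def dense_stack(x: FeatureMaps, num_layers: int, k: int = 12) -> FeatureMaps:
    """DenseNet-style state: each layer appends k new feature-maps."""
    state = list(x)
    for i in range(1, num_layers + 1):
        new_maps = toy_layer(state, width=k, index=i)  # reads all earlier maps
        state = state + new_maps  # concatenation: earlier maps stay unchanged
    return state


x0 = [1.0] * 16  # assumed initial width, not taken from the text
res_out = residual_stack(x0, num_layers=4)
dense_out = dense_stack(x0, num_layers=4, k=12)
```

With these toy inputs the residual state keeps its 16 entries while their values change, whereas the dense state grows to 16 + 4 × 12 = 64 entries and its first 16 entries are still the original input.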

Each layer has direct access to the gradients from the loss function and the original input signal, leading to an implicit deep supervision [20]. This helps training of deeper network architectures. Further, we also observe that dense connections have a regularizing effect, which reduces overfitting on tasks with smaller training set sizes.

We evaluate DenseNets on four highly competitive benchmark datasets (CIFAR-10, CIFAR-100, SVHN, and ImageNet). Our models tend to require much fewer parameters than existing algorithms with comparable accuracy. Further, we significantly outperform the current state-of-the-art results on most of the benchmark tasks.

Related Work

The exploration of network architectures has been a part of neural network research since their initial discovery.

The recent resurgence in popularity of neural networks has also revived this research domain. The increasing number of layers in modern networks amplifies the differences between architectures and motivates the exploration of different connectivity patterns and the revisiting of old research ideas.

A cascade structure similar to our proposed dense network layout has already been studied in the neural networks literature in the 1980s [3]. Their pioneering work focuses on fully connected multi-layer perceptrons trained in a layer-by-layer fashion. More recently, fully connected cascade networks to be trained with batch gradient descent were proposed [40].

Although effective on small datasets, this approach only scales to networks with a few hundred parameters. In [9, 23, 31, 41], utilizing multi-level features in CNNs through skip-connections has been found to be effective for various vision tasks. Parallel to our work, [1] derived a purely theoretical framework for networks with cross-layer connections similar to ours.

Highway Networks [34] were amongst the first architectures that provided a means to effectively train end-to-end networks with more than 100 layers. Using bypassing paths along with gating units, Highway Networks with hundreds of layers can be optimized without difficulty.
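As a rough illustration of a bypassing path combined with a gating unit, the sketch below uses the gated form y = T(x) · H(x) + (1 − T(x)) · x from the Highway Networks paper [34]. The text above does not spell this formula out, so treat it, along with the hypothetical names `highway_layer`, `transform`, `gate_logits`, and `double`, as an assumption of this example rather than part of the present work.

```python
import math
from typing import Callable, List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def highway_layer(
    x: Vector,
    transform: Callable[[Vector], Vector],
    gate_logits: Callable[[Vector], Vector],
) -> Vector:
    """Blend a transformed signal with the bypassed input, element-wise."""
    h = transform(x)                          # H(x): the layer's transformation
    t = [sigmoid(g) for g in gate_logits(x)]  # T(x): gate values in (0, 1)
    # A gate near 1 passes the transformation; a gate near 0 bypasses the input.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def double(v: Vector) -> Vector:
    """Toy transformation used only for this example."""
    return [2.0 * a for a in v]


x = [1.0, 2.0]
# Gate logits of 0 give T = 0.5: an even mix of input and transformation.
half_open = highway_layer(x, double, lambda v: [0.0] * len(v))
# Strongly negative gate logits give T close to 0: the input is bypassed.
mostly_closed = highway_layer(x, double, lambda v: [-50.0] * len(v))
```

When the gate is nearly closed the layer behaves almost like an identity mapping, which is the bypassing behaviour the paragraph above refers to.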
