SA-Net: Shuffle Attention for Deep Convolutional Neural Networks
Qing-Long Zhang, Yu-Bin Yang
State Key Laboratory for Novel Software Technology at Nanjing University


ABSTRACT

Attention mechanisms, which enable a neural network to focus accurately on all the relevant elements of the input, have become an essential component for improving the performance of deep neural networks. There are mainly two attention mechanisms widely used in computer vision studies, spatial attention and channel attention, which aim to capture the pixel-level pairwise relationship and the channel dependency, respectively. Although fusing them together may achieve better performance than their individual implementations, doing so inevitably increases the computational overhead.

In this paper, we propose an efficient Shuffle Attention (SA) module to address this issue, which adopts Shuffle Units to combine the two types of attention mechanisms effectively. Specifically, SA first groups the channel dimensions into multiple sub-features before processing them in parallel. Then, for each sub-feature, SA utilizes a Shuffle Unit to depict feature dependencies in both the spatial and channel dimensions. After that, all sub-features are aggregated, and a channel shuffle operator is adopted to enable information communication between different sub-features. The proposed SA module is efficient yet effective: against a ResNet50 backbone, SA adds only about 300 parameters and a negligible number of GFLOPs, while yielding a clear boost in Top-1 accuracy. Extensive experimental results on commonly used benchmarks, including ImageNet-1k for classification and MS COCO for object detection and instance segmentation, demonstrate that the proposed SA significantly outperforms the current SOTA methods, achieving higher accuracy with lower model complexity. The code and models are publicly available.

Index Terms: spatial attention, channel attention, channel shuffle, grouped features

1. INTRODUCTION

Attention mechanisms have been attracting increasing attention in research communities, since they help to improve the representation of interests, i.e., focusing on essential features while suppressing unnecessary ones [1, 2, 3, 4].

Recent studies show that correctly incorporating attention mechanisms into convolution blocks can significantly improve the performance of a broad range of computer vision tasks, e.g., image classification, object detection, and instance segmentation. There are mainly two types of attention mechanisms most commonly used in computer vision: channel attention and spatial attention, both of which strengthen the original features by aggregating the same feature from all the positions with different aggregation strategies, transformations, and strengthening functions [5, 6, 7, 8, 9].

This work was funded by the Natural Science Foundation of China (No. 61673204).

Fig. 1. Comparison of recent SOTA attention models on ImageNet-1k, including SENet, CBAM, ECA-Net, SGE-Net, and SA-Net, using ResNets as backbones, in terms of accuracy, network parameters, and GFLOPs. The size of the circles indicates the GFLOPs. Clearly, the proposed SA-Net achieves higher accuracy while having lower model complexity.

Based on these observations, some studies, including GCNet [1] and CBAM [10], integrated both spatial attention and channel attention into one module and achieved significant improvements [10, 4]. However, they generally suffer from either convergence difficulty or heavy computation burdens. Other research has managed to simplify the structure of channel or spatial attention [1, 11]. For example, ECA-Net [11] simplifies the process of computing channel weights in the SE block by using a 1-D convolution. SGE [12] groups the channel dimensions into multiple sub-features to represent different semantics, and applies a spatial attention mechanism to each feature group by scaling the feature vectors over all locations with an attention mask.
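To make the ECA-style simplification concrete, the sketch below computes channel weights with a single 1-D convolution over the globally average-pooled channel descriptor. It is a minimal PyTorch illustration only: the module name, kernel size k, and layer choices are assumptions for exposition, not the reference implementation of [11].

    import torch
    import torch.nn as nn

    class ECAStyleChannelAttention(nn.Module):
        """Channel attention via a 1-D convolution over the pooled channel
        descriptor (a sketch in the spirit of ECA-Net)."""
        def __init__(self, k: int = 3):
            super().__init__()
            self.avg_pool = nn.AdaptiveAvgPool2d(1)   # squeeze: (N, C, H, W) -> (N, C, 1, 1)
            self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x):
            y = self.avg_pool(x)                      # (N, C, 1, 1)
            y = y.squeeze(-1).transpose(1, 2)         # (N, 1, C): channels become the 1-D length
            y = self.conv(y)                          # local cross-channel interaction, no dimension reduction
            y = y.transpose(1, 2).unsqueeze(-1)       # back to (N, C, 1, 1)
            return x * self.sigmoid(y)                # reweight the input channels

    # example usage: out = ECAStyleChannelAttention(k=3)(torch.randn(1, 64, 32, 32))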

However, these methods did not take full advantage of the correlation between spatial and channel attention, which makes them less efficient. Can one fuse different attention modules in a lighter but more efficient way? To answer this question, we first revisit the unit of ShuffleNet v2 [13], which can efficiently construct a multi-branch structure and process the branches in parallel. Specifically, at the beginning of each unit, the input of c feature channels is split into two branches with c - c' and c' channels. Afterwards, several convolution layers are adopted to capture a higher-level representation of the input. After these convolutions, the two branches are concatenated so that the number of output channels equals the number of input channels. At last, the channel shuffle operator (defined in [14]) is adopted to enable information communication between the two branches.
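As an illustration of this split-transform-concatenate-shuffle pattern, the following simplified PyTorch sketch wires up such a unit with an even channel split and a stride of 1; the particular 1x1 / depthwise 3x3 / 1x1 branch is a stand-in for exposition, not the exact ShuffleNet v2 block.

    import torch
    import torch.nn as nn

    def channel_shuffle(x, groups: int):
        """Channel shuffle as a reshape-transpose-reshape, as defined for ShuffleNets."""
        n, c, h, w = x.size()
        x = x.view(n, groups, c // groups, h, w)
        x = x.transpose(1, 2).contiguous()
        return x.view(n, c, h, w)

    class ShuffleV2StyleUnit(nn.Module):
        def __init__(self, c: int):
            super().__init__()
            half = c // 2                                 # assume an even split of the c input channels
            self.branch = nn.Sequential(                  # simplified transform branch
                nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
                nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False), nn.BatchNorm2d(half),
                nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            )

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=1)                    # split into two branches
            x2 = self.branch(x2)                          # transform only one branch
            out = torch.cat([x1, x2], dim=1)              # concatenation restores the input channel count
            return channel_shuffle(out, groups=2)         # let information flow between the branches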

In addition, to increase the calculation speed, SGE [12] introduces a grouping strategy that divides the input feature map into groups along the channel dimension, so that all sub-features can be enhanced in parallel. Based on the above observations, this paper proposes a lighter but more efficient Shuffle Attention (SA) module for deep convolutional neural networks (CNNs), which groups the channel dimensions into sub-features. For each sub-feature, SA adopts the Shuffle Unit to construct channel attention and spatial attention simultaneously. For each attention module, this paper designs an attention mask over all the positions to suppress possible noise and to highlight the correct semantic feature regions as well.
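The pipeline described above can be sketched in PyTorch roughly as follows. The concrete mask transforms chosen here (a pooled affine plus sigmoid for the channel branch and a GroupNorm-based affine plus sigmoid for the spatial branch), the group count, and the parameter shapes are illustrative assumptions rather than the paper's exact Shuffle Unit; the final reshape-transpose-reshape is the channel shuffle that lets information flow across the aggregated sub-features.

    import torch
    import torch.nn as nn

    class ShuffleAttentionSketch(nn.Module):
        """Group channels into sub-features, build a channel mask and a spatial
        mask per sub-feature, aggregate, then channel-shuffle (a sketch of the
        described pipeline, not the reference implementation)."""
        def __init__(self, channels: int, groups: int = 8):
            super().__init__()
            self.groups = groups
            bc = channels // (2 * groups)                          # channels per attention branch
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
            self.cweight = nn.Parameter(torch.zeros(1, bc, 1, 1))  # channel-mask affine parameters
            self.cbias = nn.Parameter(torch.ones(1, bc, 1, 1))
            self.sweight = nn.Parameter(torch.zeros(1, bc, 1, 1))  # spatial-mask affine parameters
            self.sbias = nn.Parameter(torch.ones(1, bc, 1, 1))
            self.gn = nn.GroupNorm(bc, bc)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x):
            n, c, h, w = x.size()
            x = x.reshape(n * self.groups, c // self.groups, h, w)  # split into sub-features
            x_c, x_s = x.chunk(2, dim=1)                            # one branch per attention type

            # channel attention: mask built from the globally pooled descriptor
            x_c = x_c * self.sigmoid(self.avg_pool(x_c) * self.cweight + self.cbias)

            # spatial attention: mask over all positions from a normalized feature map
            x_s = x_s * self.sigmoid(self.gn(x_s) * self.sweight + self.sbias)

            out = torch.cat([x_c, x_s], dim=1).reshape(n, c, h, w)  # aggregate all sub-features

            # channel shuffle across the two branch types (reshape-transpose-reshape)
            out = out.reshape(n, 2, c // 2, h, w).transpose(1, 2).contiguous()
            return out.reshape(n, c, h, w)

    # example usage: out = ShuffleAttentionSketch(channels=64, groups=8)(torch.randn(1, 64, 32, 32))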

Experimental results on ImageNet-1k (shown in Figure 1) demonstrate that the proposed simple but effective module, which contains fewer parameters, achieves higher accuracy than the current state-of-the-art models. The key contributions of this paper are summarized as follows: 1) we introduce a lightweight yet effective attention module, SA, for deep CNNs, which groups channel dimensions into multiple sub-features and then utilizes a Shuffle Unit to integrate the complementary channel and spatial attention modules for each sub-feature; 2) extensive experimental results on ImageNet-1k and MS COCO demonstrate that the proposed SA has lower model complexity than the state-of-the-art attention approaches while achieving outstanding performance.

2. RELATED WORK

Multi-branch architectures of CNNs have evolved for years and are becoming more accurate and faster.

The principle behind multi-branch architectures is "split-transform-merge", which eases the difficulty of training networks with hundreds of layers. The InceptionNet series [15, 16] are successful multi-branch architectures in which each branch is carefully configured with customized kernel filters in order to aggregate more informative and multifarious features. ResNets [17] can also be viewed as two-branch networks, in which one branch is the identity mapping. SKNets [2] and the ShuffleNet families [13] both followed the idea of InceptionNets, using various filters for multiple branches, while differing in at least two important aspects. SKNets utilized an adaptive selection mechanism to realize an adaptive receptive field size for neurons.

ShuffleNets further merged the channel split and channel shuffle operators into a single element-wise operation, trading off between speed and accuracy. Dividing features into groups dates back to AlexNet [18], whose motivation was to distribute the model over more GPU resources. Deep Roots examined AlexNet and pointed out that convolution groups can learn better feature representations. The MobileNets [19, 20] and ShuffleNets [13] treated each channel as a group and modeled the spatial relationships within these groups. CapsuleNets [21, 22] modeled each grouped neuron as a capsule, in which the neuron activity in the active capsule represented various attributes of a particular entity in the image.

