PAConv: Position Adaptive Convolution With Dynamic Kernel ...

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

Mutian Xu1* Runyu Ding1* Hengshuang Zhao2 Xiaojuan Qi1
1 The University of Hong Kong 2 University of ...

We introduce Position Adaptive Convolution (PAConv), a generic convolution operation for 3D point cloud processing. The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in a Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet. In this way, the kernel is built in a data-driven manner, endowing PAConv with more flexibility than 2D convolutions to better handle irregular and unordered point cloud data. Besides, the complexity of the learning process is reduced by combining weight matrices instead of brutally predicting kernels from point positions. Furthermore, different from existing point convolution operators whose network architectures are often heavily engineered, we integrate PAConv into classical MLP-based point cloud pipelines without changing network configurations.

Even built on simple networks, our method still approaches or even surpasses the state-of-the-art models, and significantly improves baseline performance on both classification and segmentation tasks, yet with decent efficiency. Thorough ablation studies and visualizations are provided to understand PAConv. Code is released.

Introduction

In recent years, the rise of 3D scanning technologies has promoted numerous applications that rely on 3D point cloud data, e.g., autonomous driving, robotic manipulation and virtual reality [35,40]. Thus, approaches that process 3D point clouds effectively and efficiently are critically needed. While remarkable advances have been made in 3D point cloud processing with deep learning [36,37,47,25], it remains a challenging task given the sparse, irregular and unordered structure of point clouds.

*M. Xu and R. Ding contribute equally. Corresponding author.

Figure 1. Overview of the convolutional structures of (a) PointNet [36], (b) PointConv [52], (c) KPConv [47] and (d) our PAConv, illustrating the differences between these point-based convolutions. SOP denotes symmetric operations.

To tackle these difficulties, previous research can be coarsely cast into two categories. The first line attempts to voxelize the 3D point clouds to form regular grids such that 3D grid convolutions can be adopted [33,43,39]. However, important geometric information might be lost due to quantization, and voxels typically bring extra memory and computational costs [10,7].

Another stream is to directly process point cloud data. The pioneering work [36] proposes to learn the spatial encodings of points by combining a Multi-Layer Perceptron (MLP) [13] and global aggregation, as illustrated in Figure 1(a).
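To make the combination of a shared MLP and global aggregation concrete, the following is a minimal NumPy sketch, not the implementation of [36]. The two-layer MLP, the layer widths, the random weights and the helper names `shared_mlp` and `global_feature` are all assumptions chosen for illustration. It shows why a symmetric operation such as max pooling makes the result independent of point order.

```python
import numpy as np

def shared_mlp(points, w1, b1, w2, b2):
    """Apply the same two-layer MLP to every point independently."""
    hidden = np.maximum(points @ w1 + b1, 0.0)   # ReLU, shape (N, H)
    return np.maximum(hidden @ w2 + b2, 0.0)     # shape (N, C)

def global_feature(points, params):
    """Encode each point with the shared MLP, then max-pool over all points."""
    per_point = shared_mlp(points, *params)
    return per_point.max(axis=0)                 # symmetric operation, shape (C,)

rng = np.random.default_rng(0)
# Hypothetical layer sizes: 3 -> 16 -> 32.
params = (rng.normal(size=(3, 16)), np.zeros(16),
          rng.normal(size=(16, 32)), np.zeros(32))
cloud = rng.normal(size=(128, 3))
shuffled = cloud[rng.permutation(len(cloud))]

feat = global_feature(cloud, params)
feat_shuffled = global_feature(shuffled, params)
print(np.allclose(feat, feat_shuffled))          # order of points does not matter
```

Because every point passes through the same weights, the per-point transform cannot depend on where a point sits relative to its neighbors, which is the limitation the text goes on to discuss.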

Follow-up works [37,38,48,20,51] exploit local aggregation schemes to improve the network. Nonetheless, all the points are processed by the same MLP, which limits the capabilities in representing spatial-variant relationships.

Beyond MLP, most recent works design convolution-like operations on point clouds to exploit spatial correlations. To handle the irregularity of 3D point clouds, some works [58,50,29] propose to directly predict the kernel weights based on relative location information, which are further used to transform features just like 2D convolutions. One representative architecture [52] in this line of research is shown in Figure 1(b). Albeit conceptually effective, these methods suffer severely from heavy computation and memory costs caused by spatial-variant kernel prediction in practice. The efficient implementation also trades off design flexibility, leading to inferior performance.

Another group of works relates kernel weights to fixed kernel points [2,47,32] and uses a correlation (or interpolation) function to adjust the weights of kernels when they are applied to process point clouds.

Figure 1(c) illustrates one representative architecture [47]. However, the hand-crafted combination of kernels may not be optimal and sufficient to model the complicated 3D location information.

In this paper, we present Position Adaptive Convolution, namely PAConv, which is a plug-and-play convolutional operation for deep representation learning on 3D point clouds. PAConv (shown in Figure 1(d)) constructs its convolutional kernels by dynamically assembling basic weight matrices in the Weight Bank. The assembling coefficients are self-adaptively learned from relative point positions by MLPs (i.e., ScoreNet). Our PAConv is flexible enough to model the complicated spatial variations and geometric structures of 3D point clouds while being efficient. Specifically, instead of inferring kernels from point positions [52] in a brute-force way, PAConv bypasses the huge memory and computational burden via a dynamic kernel assembling strategy with ScoreNet. Besides, unlike kernel point methods [47], our PAConv gains flexibility to model spatial variations in a data-driven manner and is much simpler, without requiring sophisticated designs for kernel points.

We conduct extensive experiments on three challenging benchmarks on top of three generic network backbones. Specifically, we adopt the simple MLP-based point networks PointNet [36], PointNet++ [37] and DGCNN [51] as the backbones, and replace their MLPs with PAConv without changing other network configurations.
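The dynamic kernel assembly described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the released PAConv code: the two-layer `score_net`, the softmax normalization of the coefficients, the use of plain relative positions as ScoreNet input, the brute-force neighbor search and all sizes are hypothetical choices made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def score_net(rel_pos, w1, w2):
    """Hypothetical two-layer ScoreNet: relative positions -> M coefficients."""
    hidden = np.maximum(rel_pos @ w1, 0.0)        # (N, K, H)
    return softmax(hidden @ w2, axis=-1)          # (N, K, M), each row sums to 1

def paconv_sketch(points, feats, neighbors, bank, w1, w2):
    """points (N,3), feats (N,C_in), neighbors (N,K) indices, bank (M,C_in,C_out)."""
    rel_pos = points[neighbors] - points[:, None, :]      # (N, K, 3)
    scores = score_net(rel_pos, w1, w2)                   # (N, K, M)
    # Assemble one kernel per (center, neighbor) pair from the shared bank.
    kernels = np.einsum("nkm,mio->nkio", scores, bank)    # (N, K, C_in, C_out)
    transformed = np.einsum("nki,nkio->nko", feats[neighbors], kernels)
    return transformed.max(axis=1)                        # MAX aggregation, (N, C_out)

rng = np.random.default_rng(0)
N, K, M, C_in, C_out, H = 64, 8, 4, 6, 10, 16             # toy sizes (assumed)
points = rng.normal(size=(N, 3))
feats = rng.normal(size=(N, C_in))
# Brute-force k-nearest neighbors; each point is its own nearest neighbor.
dists = np.linalg.norm(points[:, None] - points[None], axis=-1)
neighbors = np.argsort(dists, axis=1)[:, :K]
bank = rng.normal(size=(M, C_in, C_out))
w1, w2 = rng.normal(size=(3, H)), rng.normal(size=(H, M))

out = paconv_sketch(points, feats, neighbors, bank, w1, w2)

# Rough value counts for these toy sizes (not measurements from the paper):
predicted_per_pair = N * K * C_in * C_out   # if every kernel were predicted directly
assembled = M * C_in * C_out + N * K * M    # bank entries plus assembling coefficients
print(out.shape, predicted_per_pair, assembled)
```

The sketch materializes every assembled kernel for readability. Because the assembly is linear, the same output can be obtained by first applying each bank matrix to the neighbor features and then weighting the results by the scores, which avoids storing the per-pair kernels.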

With these simple backbones, our method still achieves state-of-the-art performance on ModelNet40 [53] and considerably improves the baselines on ShapeNet Part [61] and S3DIS [1] with decent model efficiency. It is also worth noting that recent point convolution methods often use complicated architectures and data augmentations tailored to their operators [47,25,30] for evaluation, making it difficult to measure the progress made by the convolutional operator. Here, we adopt simple baselines and aim to minimize the influence of network architectures to better assess the performance gain from the operator.

Related Work

Mapping point clouds into regular 2D or 3D grids (voxels). Since point cloud data has an irregular structure in 3D space, early works [44,21,6] project point clouds to multi-view images and then utilize conventional convolutions for feature learning. Yet, this 3D-to-2D projection is not robust to occluded surfaces or density variations. Tatarchenko et al. [45] propose to map local surface points onto a tangent plane and further use 2D convolutional operators, and FPConv [25] flattens local patches onto regular 2D grids with soft weights.

However, they heavily rely on the estimation of tangent planes, and the projection process will inevitably sacrifice 3D geometry information. Another technique is to quantize the 3D space and map points into regular voxels [39,33,3,34], where 3D convolutions can be applied. However, the quantization will inevitably lose fine-grained geometric details, and the voxel representation is limited by heavy computation and memory costs. Recently, to address the above issues, sparse representations [43,10,7] are employed to obtain smaller grids with better performance. Nevertheless, they still suffer from the trade-off between the quantization rate and the computational efficiency.

Point representation learning with MLPs. Many methods [36,37,18,28,14] process unstructured point clouds directly with point-wise MLPs. PointNet [36] is the pioneering work, which encodes each point individually with shared MLPs and aggregates all point features with global pooling. However, it lacks the ability to capture local 3D structures. Several follow-up works address this issue by adopting hierarchical multi-scale or weighted feature aggregation schemes to incorporate local features [37,19,23,16,18,28,55,17,14,60,54,56].

Other approaches use graphs to represent point clouds [38,42,51,49,57], and the point features are aggregated through local graph operations, aiming to capture local point relationships. Nonetheless, they all adopt shared MLPs to transform point features, which limits the model's capability in capturing spatial-variant information.

Point representation learning with point convolutions. More recently, many attempts [24,58,50,52,29,47,32,30] focus on designing point convolutional kernels. PointCNN [24] learns an X-transformation to relate points with kernels. However, this operation cannot satisfy permutation invariance, which is crucial for modeling unordered point cloud data. In addition, [41,11,58,50,52,29] propose to directly learn the kernel of local points based on point positions. Nevertheless, these methods directly predict kernels, which has much higher complexity (memory and computation) in the learning process.

Another type of point convolution associates weight matrices with pre-defined kernel points in 3D space [2,5,47,32,26,22].

However, the positions of kernels have a crucial influence on the final performance [47] and need to be specifically optimized for different datasets or backbone architectures. Besides, the above approaches [47,32,22] generate kernels by combining pre-defined kernels using hand-crafted rules, which limits the model flexibility, leading to inferior performance [22]. Different from them, our method adaptively combines weight matrices in a learnable manner, which improves the capability of the operator to fit irregular point cloud data.

Dynamic and conditioned convolutions. Our work is also related to dynamic and conditional convolutions [8,9,59]. Brabandere et al. [8] propose to dynamically generate position-specific filters on pixel inputs. In [9], through learning the offsets on kernel coordinates, the original kernel space is deformed to adapt to different scales of objects. In addition, CondConv [59] generates the convolution kernel by combining several filters through a routing function that outputs the coefficients for filter combination, which is similar to our dynamic kernel assembly.

Yet, the predicted kernels in CondConv [59] are not position-adaptive, while unstructured point clouds require weights that adapt to different point positions.

Method

In this section, we first revisit the general formulation of point convolutions. Then we introduce PAConv with dynamic kernel assembly. Finally, we compare PAConv with prior relevant works.

Overview

Given $N$ points in a point cloud $P = \{p_i \mid i = 1, \dots, N\} \in \mathbb{R}^{N \times 3}$, the input and output feature maps of $P$ in a convolutional layer can be denoted as $F = \{f_i \mid i = 1, \dots, N\} \in \mathbb{R}^{N \times C_{in}}$ and $G = \{g_i \mid i = 1, \dots, N\} \in \mathbb{R}^{N \times C_{out}}$ respectively, where $C_{in}$ and $C_{out}$ are the channel numbers of the input and output. For each point $p_i$, the generalized point convolution can be formulated as:

$$g_i = \Lambda\big(\{K(p_i, p_j)\, f_j \mid p_j \in \mathcal{N}_i\}\big), \tag{1}$$

where $K(p_i, p_j)$ is a function which outputs convolutional weights according to the position relation between the center point $p_i$ and its neighboring point $p_j$. $\mathcal{N}_i$ denotes all the neighborhood points, and $\Lambda$ refers to the aggregation function in terms of MAX, SUM or AVG. Under this definition, 2D convolution can be regarded as a special case of the point convolution.
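The generalized formulation in Eq. (1) and the remark about 2D convolution can be checked with a small NumPy sketch. Everything here is an illustrative assumption rather than the paper's code: `point_conv` and `grid_kernel` are hypothetical helpers, the weight function is modeled as returning a (C_in, C_out) matrix that is applied as a row-vector product, and the toy image sizes are arbitrary. With pixels treated as points on a grid, a weight function that only looks at the grid offset, and SUM aggregation, the formulation gives a zero-padded 3x3 convolution in the deep-learning (cross-correlation) sense.

```python
import numpy as np

AGGREGATORS = {"max": np.max, "sum": np.sum, "avg": np.mean}

def point_conv(points, feats, neighbors, kernel_fn, agg="max"):
    """g_i = AGG({K(p_i, p_j) f_j | p_j in N_i}) for every point i."""
    reduce = AGGREGATORS[agg]
    out = []
    for i, nbrs in enumerate(neighbors):
        # K(p_i, p_j) returns a (C_in, C_out) weight matrix for each neighbor.
        transformed = [feats[j] @ kernel_fn(points[i], points[j]) for j in nbrs]
        out.append(reduce(np.stack(transformed), axis=0))
    return np.stack(out)

rng = np.random.default_rng(0)
H, W, C_in, C_out = 5, 5, 2, 3                    # toy sizes (assumed)
image = rng.normal(size=(H, W, C_in))
weights = rng.normal(size=(3, 3, C_in, C_out))    # one matrix per grid offset

# Pixels become points; neighbors are the in-bounds pixels of the 3x3 window.
coords = np.array([(r, c) for r in range(H) for c in range(W)])
feats = image.reshape(-1, C_in)
neighbors = [[(r + dr) * W + (c + dc)
              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if 0 <= r + dr < H and 0 <= c + dc < W]
             for r in range(H) for c in range(W)]

def grid_kernel(p_i, p_j):
    """Weights depend only on the integer offset between the two pixels."""
    dr, dc = p_j - p_i
    return weights[dr + 1, dc + 1]

out = point_conv(coords, feats, neighbors, grid_kernel, agg="sum")
print(out.shape)   # one C_out-dimensional feature per pixel
```

On a point cloud the offsets are continuous and irregular, so a lookup table like `weights` is no longer available, which is why the weight function has to be predicted, interpolated or assembled from positions.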
