
MnasNet: Platform-Aware Neural Architecture Search for Mobile




Mingxing Tan (1), Bo Chen (2), Ruoming Pang (1), Vijay Vasudevan (1), Mark Sandler (2), Andrew Howard (2), Quoc V. Le (1)
(1) Google Brain, (2) Google Inc.
{tanmingxing, bochen, rpang, vrv, sandler, howarda, qvl}@google.com

Abstract

Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to designing and improving mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network.

Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 [29] with 0.5% higher accuracy and 2.3× faster than NASNet [36] with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is publicly available.

1. Introduction

Convolutional neural networks (CNN) have made significant progress in image classification, object detection, and many other applications. As modern CNN models become increasingly deeper and larger [31, 13, 36, 26], they also become slower and require more computation. Such increases in computational demands make it difficult to deploy state-of-the-art CNN models on resource-constrained platforms such as mobile or embedded devices.

[Figure 1: An overview of Platform-Aware Neural Architecture Search for Mobile. A controller samples models from the search space; a trainer measures their accuracy, real mobile phones measure their latency, and both signals are combined into a multi-objective reward that is fed back to the controller.]

[Figure 2: Accuracy vs. latency comparison — our MnasNet models significantly outperform other mobile models [29, 36, 26] on ImageNet. The plot shows ImageNet top-1 accuracy (%) against model latency (ms) for MobileNetV1, MobileNetV2, MobileNetV2 (scaled), NASNet-A, AmoebaNet-A, and MnasNet. Details can be found in Table 1.]

Given the restricted computational resources available on mobile devices, much recent research has focused on designing and improving mobile CNN models by reducing the depth of the network and utilizing less expensive operations, such as depthwise convolution [11] and group convolution [33]. However, designing a resource-constrained mobile model is challenging: one has to carefully balance accuracy and resource-efficiency, resulting in a significantly large design space.

In this paper, we propose an automated neural architecture search approach for designing mobile CNN models. Figure 1 shows an overview of our approach, where the main differences from previous approaches are the latency-aware multi-objective reward and the novel search space. Our approach is based on two main ideas.

First, we formulate the design problem as a multi-objective optimization problem that considers both accuracy and inference latency of CNN models. Unlike previous work [36, 26, 21] that uses FLOPS to approximate inference latency, we directly measure the real-world latency by executing the model on real mobile devices. Our idea is inspired by the observation that FLOPS is often an inaccurate proxy: for example, MobileNet [11] and NASNet [36] have similar FLOPS (575M vs. 564M), but their latencies are significantly different (113ms vs. 183ms; details in Table 1). Secondly, we observe that previous automated approaches mainly search for a few types of cells and then repeatedly stack the same cells through the network. This simplifies the search process, but also precludes layer diversity that is important for computational efficiency. To address this issue, we propose a novel factorized hierarchical search space, which allows layers to be architecturally different yet still strikes the right balance between flexibility and search space size.

We apply our proposed approach to ImageNet classification [28] and COCO object detection [18].

Figure 2 summarizes a comparison between our MnasNet models and other state-of-the-art mobile models. Compared to MobileNetV2 [29], our model improves the ImageNet accuracy at similar latency on the Google Pixel phone. On the other hand, if we constrain the target accuracy, then our MnasNet models are 1.8× faster than MobileNetV2 and 2.3× faster than NASNet [36] with better accuracy. Compared to the widely used ResNet-50 [9], our MnasNet model achieves slightly higher (76.7%) accuracy with 4.8× fewer parameters and 10× fewer multiply-add operations. By plugging our model as a feature extractor into the SSD object detection framework, our model improves both the inference latency and the mAP quality on the COCO dataset over MobileNetV1 and MobileNetV2, and achieves comparable mAP quality (23.0 vs. 23.2) to SSD300 [22] with 42× fewer multiply-adds.

To summarize, our main contributions are as follows:

1. We introduce a multi-objective neural architecture search approach that optimizes both accuracy and real-world latency on mobile devices.
2. We propose a novel factorized hierarchical search space to enable layer diversity yet still strike the right balance between flexibility and search space size.
3. We demonstrate new state-of-the-art accuracy on both ImageNet classification and COCO object detection under typical mobile latency constraints.

2. Related Work

Improving the resource efficiency of CNN models has been an active research topic during the last several years. Some commonly-used approaches include 1) quantizing the weights and/or activations of a baseline CNN model into lower-bit representations [8, 16], or 2) pruning less important filters according to FLOPs [6, 10], or to platform-aware metrics such as latency, introduced in [32].

However, these methods are tied to a baseline model and do not focus on learning novel compositions of CNN operations.

Another common approach is to directly hand-craft more efficient mobile architectures: SqueezeNet [15] reduces the number of parameters and computation by using lower-cost 1x1 convolutions and reducing filter sizes; MobileNet [11] extensively employs depthwise separable convolution to minimize computation density; ShuffleNets [33, 24] utilize low-cost group convolution and channel shuffle; CondenseNet [14] learns to connect group convolutions across layers; recently, MobileNetV2 [29] achieved state-of-the-art results among mobile-size models by using resource-efficient inverted residuals and linear bottlenecks. Unfortunately, given the potentially huge design space, these hand-crafted models usually take significant human effort.

Recently, there has been growing interest in automating the model design process using neural architecture search. These approaches are mainly based on reinforcement learning [35, 36, 1, 19, 25], evolutionary search [26], differentiable search [21], or other learning algorithms [19, 17, 23].

Although these methods can generate mobile-size models by repeatedly stacking a few searched cells, they do not incorporate mobile platform constraints into the search process or search space. Closely related to our work are MONAS [12], DPP-Net [3], RNAS [34], and Pareto-NASH [4], which attempt to optimize multiple objectives, such as model size and accuracy, while searching for CNNs, but their search process optimizes on small tasks like CIFAR. In contrast, this paper targets real-world mobile latency constraints and focuses on larger tasks like ImageNet classification and COCO object detection.

3. Problem Formulation

We formulate the design problem as a multi-objective search, aiming at finding CNN models with both high accuracy and low inference latency. Unlike previous architecture search approaches that often optimize for indirect metrics, such as FLOPS, we consider direct real-world inference latency, by running CNN models on real mobile devices and then incorporating the real-world inference latency into our objective.

Doing so directly measures what is achievable in practice: our early experiments show it is challenging to approximate real-world latency due to the variety of mobile hardware/software idiosyncrasies.

[Figure 3: Objective function defined by Equation 2, for a fixed accuracy ACC(m) and target latency T = 80ms. The top panel (α = 0, β = −1) shows the objective value with latency treated as a hard constraint; the bottom panel (α = β = −0.07) shows it with latency treated as a soft constraint. Both panels plot the objective against model latency (ms).]

Given a model m, let ACC(m) denote its accuracy on the target task, LAT(m) denote the inference latency on the target mobile platform, and T the target latency. A common method is to treat T as a hard constraint and maximize accuracy under this constraint:

$$\underset{m}{\text{maximize}}\quad ACC(m) \qquad \text{subject to}\quad LAT(m) \le T \tag{1}$$

However, this approach only maximizes a single metric and does not provide multiple Pareto-optimal solutions. Informally, a model is called Pareto optimal [2] if either it has the highest accuracy without increasing latency or it has the lowest latency without decreasing accuracy.

Given the computational cost of performing architecture search, we are more interested in finding multiple Pareto-optimal solutions in a single architecture search. While there are many methods in the literature [2], we use a customized weighted product method¹ to approximate Pareto-optimal solutions, with the optimization goal defined as:

$$\underset{m}{\text{maximize}}\quad ACC(m) \times \left[\frac{LAT(m)}{T}\right]^{w} \tag{2}$$

where w is the weight factor defined as:

$$w = \begin{cases} \alpha, & \text{if } LAT(m) \le T \\ \beta, & \text{otherwise} \end{cases} \tag{3}$$

where α and β are application-specific constants. An empirical rule for picking α and β is to ensure Pareto-optimal solutions have similar reward under different accuracy-latency trade-offs. For instance, we empirically observed that doubling the latency usually brings about a 5% relative accuracy gain. Given two models, where M1 has latency l and accuracy a, and M2 has latency 2l and 5% higher accuracy a·(1+5%), they should have similar reward: Reward(M2) = a·(1+5%)·(2l/T)^β ≈ Reward(M1) = a·(l/T)^β.

¹ We pick the weighted product method because it is easy to customize, but we expect methods like weighted sum should also be fine.
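Making the solving step explicit (this algebra is ours, but it follows directly from the reward equality and the 5% empirical rule above), canceling a and (l/T)^β on both sides gives:

$$a\,(1+5\%)\left(\frac{2l}{T}\right)^{\beta} = a\left(\frac{l}{T}\right)^{\beta} \;\Longrightarrow\; 1.05 \cdot 2^{\beta} = 1 \;\Longrightarrow\; \beta = -\log_{2} 1.05 \approx -0.07$$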

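As a numeric sanity check, below is a minimal Python sketch of the reward in Equations 2 and 3; the function name, argument names, and defaults are ours for illustration, not from the paper's released code.

```python
# Minimal sketch of the reward in Equations (2)-(3). Accuracy is a fraction
# in [0, 1]; latencies are in milliseconds.

def mnas_reward(acc: float, latency: float, target: float = 80.0,
                alpha: float = -0.07, beta: float = -0.07) -> float:
    """Weighted-product objective: ACC(m) * [LAT(m) / T]^w  (Equation 2)."""
    w = alpha if latency <= target else beta  # Equation 3
    return acc * (latency / target) ** w

# Sanity check of the M1/M2 example above: doubling latency while gaining
# 5% relative accuracy leaves the reward roughly unchanged when beta = -0.07.
m1 = mnas_reward(acc=0.70, latency=80.0)          # ~0.700
m2 = mnas_reward(acc=0.70 * 1.05, latency=160.0)  # ~0.700
print(round(m1, 4), round(m2, 4))
```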
Solving this gives β ≈ −0.07; therefore, we use α = β = −0.07 in our experiments unless explicitly stated otherwise.

Figure 3 shows the objective function with two typical value pairs of (α, β). In the top figure with (α = 0, β = −1), we simply use accuracy as the objective value if the measured latency is less than the target latency T; otherwise, we sharply penalize the objective value to discourage models from violating latency constraints. The bottom figure (α = β = −0.07) treats the target latency T as a soft constraint, and smoothly adjusts the objective value based on the measured latency.

4. Mobile Neural Architecture Search

In this section, we first discuss our proposed novel factorized hierarchical search space, and then summarize our reinforcement-learning based search algorithm.

4.1. Factorized Hierarchical Search Space

As shown in recent studies [36, 20], a well-defined search space is extremely important for neural architecture search. However, most previous approaches [35, 19, 26] only search for a few complex cells and then repeatedly stack the same cells.
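The contrast with cell-based search can be made concrete with a small sketch. Below, each block of the network samples its own configuration rather than reusing one searched cell everywhere; the option lists and field names are hypothetical placeholders for illustration, not the paper's exact per-block choices.

```python
import random

# Illustrative factorized hierarchical search space: the network is divided
# into a handful of blocks, and each block independently samples its own
# sub-architecture. The option lists below are hypothetical placeholders.
BLOCK_OPTIONS = {
    "conv_op":     ["conv", "depthwise_conv", "inverted_bottleneck"],
    "kernel_size": [3, 5],
    "num_layers":  [1, 2, 3, 4],
    "out_filters": [16, 24, 32, 64, 96, 160],
    "skip_op":     ["none", "identity_residual"],
}

def sample_block() -> dict:
    """Sample one block's configuration independently of all other blocks."""
    return {name: random.choice(opts) for name, opts in BLOCK_OPTIONS.items()}

def sample_network(num_blocks: int = 7) -> list:
    """Factorized search: per-block choices, so layers can differ across depth.
    A cell-based search would instead sample ONE cell and stack it everywhere."""
    return [sample_block() for _ in range(num_blocks)]

print(sample_network())
```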

