
Learning Spatio-Temporal Transformer for Visual Tracking

Bin Yan1, Houwen Peng2, Jianlong Fu2, Dong Wang1, Huchuan Lu1
1 Dalian University of Technology  2 Microsoft Research Asia

Abstract

In this paper, we present a new tracking architecture with an encoder-decoder Transformer as the key component. The encoder models the global spatio-temporal feature dependencies between target objects and search regions, while the decoder learns a query embedding to predict the spatial positions of the target objects. Our method casts object tracking as a direct bounding box prediction problem, without using any proposals or predefined anchors. With the encoder-decoder Transformer, the prediction of objects simply uses a fully-convolutional network, which estimates the corners of objects directly. The whole method is end-to-end and does not need any postprocessing steps such as cosine windowing or bounding box smoothing, thus largely simplifying existing tracking pipelines.
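The corner-estimation step mentioned above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; it only assumes the head produces two score maps, one for the top-left and one for the bottom-right corner, and reads each corner off as the expected coordinate under the map's softmax (a soft-argmax). All function names here are illustrative.

```python
import numpy as np

def softmax(scores):
    """Normalize a 2-D score map into a probability map."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def soft_argmax(score_map):
    """Expected (x, y) coordinate under the softmax of a score map."""
    h, w = score_map.shape
    probs = softmax(score_map)
    ys, xs = np.mgrid[0:h, 0:w]          # pixel coordinate grids
    return float((probs * xs).sum()), float((probs * ys).sum())

def corners_to_box(tl_map, br_map):
    """Reduce two corner score maps to a single (x1, y1, x2, y2) box."""
    x1, y1 = soft_argmax(tl_map)
    x2, y2 = soft_argmax(br_map)
    return x1, y1, x2, y2
```

Because the box is an expectation over probability maps rather than a hard argmax, the readout is differentiable, which is consistent with the paper's claim of end-to-end training without proposals or anchors.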

(30 vs. 5 fps) on a Tesla V100 GPU, as shown in Fig. 1. Considering recent trends of over-fitting on small-scale benchmarks, we collect a new large-scale tracking benchmark called NOTU, integrating all sequences from NFS [24], OTB100 [58], TC128 [33], and UAV123 [42]. In summary, this work has four contributions.
