Search results with tag "A video vision transformer"

ViViT: A Video Vision Transformer - arXiv

arxiv.org

we present several methods of factorising our model along spatial and temporal dimensions to increase efficiency and scalability. Furthermore, to train our model effectively on smaller datasets, we show how to regularise our model during training and leverage pretrained image models. We also note that convolutional models have been de…

Video, Vision, Transformers, Factorising, Vivit, A video vision transformer
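The snippet says the model is factorised along spatial and temporal dimensions to increase efficiency. The sketch below is a rough illustration, not the paper's code. It counts attention-score entries for full spatio-temporal self-attention over every token, and compares that with an assumed factorisation that first attends over space within each frame and then over time at each spatial position. The function names and the token-grid sizes (16 frames, 14 x 14 patches per frame) are hypothetical.

```python
def joint_attention_scores(n_t: int, n_h: int, n_w: int) -> int:
    """Entries in one full attention matrix over all n_t * n_h * n_w tokens."""
    n = n_t * n_h * n_w
    return n * n


def factorised_attention_scores(n_t: int, n_h: int, n_w: int) -> int:
    """Entries when attention is split into a spatial step, then a temporal step (assumed scheme)."""
    n_s = n_h * n_w               # spatial tokens per frame
    spatial = n_t * n_s * n_s     # one n_s x n_s matrix per frame
    temporal = n_s * n_t * n_t    # one n_t x n_t matrix per spatial position
    return spatial + temporal


if __name__ == "__main__":
    # Hypothetical token grid: 16 frames, 14 x 14 patches per frame.
    n_t, n_h, n_w = 16, 14, 14
    joint = joint_attention_scores(n_t, n_h, n_w)
    factorised = factorised_attention_scores(n_t, n_h, n_w)
    print(joint, factorised, round(joint / factorised, 1))
```

With these assumed sizes the script prints `9834496 664832 14.8`, roughly a 15x reduction in score entries. Actual savings depend on the factorisation variant and the token counts used.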
