Generative Pretraining From Pixels
Taming Transformers for High-Resolution Image Synthesis
openaccess.thecvf.com
...suitability of generative pretraining to learn image representations for downstream tasks. Since input resolutions of 32×32 pixels are still quite computationally expensive [8], a VQVAE is used to encode images up to a resolution of 192×192. In an effort to keep the learned discrete representation as spatially invariant as possible with ...
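The snippet above describes tokenizing images with a VQVAE so that a generative model can work on a short grid of discrete codes instead of raw 192×192 pixels. Below is a minimal PyTorch sketch of the vector-quantization step; the codebook size, feature dimension, and class name are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Minimal sketch of a VQ bottleneck: maps continuous encoder features
    to their nearest codebook entries (names and sizes are assumptions)."""
    def __init__(self, num_codes=1024, code_dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z):
        # z: (B, C, H, W) continuous features from a convolutional encoder
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)            # (B*H*W, C)
        # squared L2 distance from every feature vector to every codebook entry
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)                                   # discrete tokens
        z_q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # straight-through estimator so gradients still reach the encoder
        z_q = z + (z_q - z).detach()
        return z_q, idx.view(b, h, w)

# e.g. a 192x192 image downsampled 16x by the encoder yields a 12x12 token grid
# that a transformer can model autoregressively.
```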
Digging Into Self-Supervised Monocular Depth Estimation
arxiv.org
...auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
1. Introduction: We seek to automatically infer a dense depth image from a single color input image. Estimating absolute, or even ...
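The auto-masking idea mentioned in this snippet can be summarized in a few lines: a pixel only contributes to the photometric loss if warping a source frame into the target view (via the predicted depth and pose) explains it better than the unwarped source frame does, which suppresses pixels from static cameras or objects moving with the camera. A hedged PyTorch sketch, with assumed tensor shapes and names:

```python
import torch

def auto_masked_loss(reproj_loss_warped, reproj_loss_identity):
    """Simplified sketch of auto-masking for self-supervised depth training.
    reproj_loss_warped:   (B, S, H, W) photometric error of S source frames
                          warped into the target view.
    reproj_loss_identity: (B, S, H, W) photometric error of the unwarped
                          source frames against the target frame."""
    min_warped, _ = reproj_loss_warped.min(dim=1)       # (B, H, W)
    min_identity, _ = reproj_loss_identity.min(dim=1)   # (B, H, W)
    mask = (min_warped < min_identity).float()          # 1 = keep this pixel
    loss = (mask * min_warped).sum() / mask.sum().clamp(min=1.0)
    return loss, mask
```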
A Simple Framework for Contrastive Learning of Visual ...
arxiv.org
...pretraining (learning encoder network f without labels) is done using the ImageNet ILSVRC-2012 dataset (Russakovsky et al., 2015). Some additional pretraining experiments on CIFAR-10 (Krizhevsky & Hinton, 2009) can be found in Appendix B.9. We also test the pretrained results on a wide range of datasets for transfer learning. To evalu-
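The label-free pretraining this snippet refers to is a contrastive objective: two augmented views of each image are pulled together and pushed away from every other view in the batch. A rough PyTorch sketch of such an NT-Xent loss (the function name and temperature are illustrative, not the paper's exact settings):

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over two batches of projections.
    z1, z2: (N, D) projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / temperature                        # pairwise similarities
    n = z1.size(0)
    # positives: view i is paired with view i + N (and vice versa)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    sim.fill_diagonal_(float('-inf'))                    # exclude self-similarity
    return F.cross_entropy(sim, targets)
```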
Three Ways To Improve Semantic Segmentation With Self ...
openaccess.thecvf.com
...to replace ImageNet pretraining for semantic segmentation. In contrast, we additionally study multi-task learning of SDE and semantic segmentation and show that combining SDE with ImageNet features can even further boost performance. Novosel et al. [42] and Klingner et al. [29] improve the semantic segmentation performance by jointly learning SDE.
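The multi-task setup described here, joint self-supervised depth estimation (SDE) and semantic segmentation over a shared encoder, can be sketched as follows; the module structure and loss weighting are assumptions for illustration, not the papers' exact architectures.

```python
import torch
import torch.nn as nn

class DepthSegMultiTask(nn.Module):
    """Illustrative sketch: one shared (e.g. ImageNet-pretrained) encoder
    feeding separate depth and segmentation heads."""
    def __init__(self, encoder, depth_head, seg_head):
        super().__init__()
        self.encoder = encoder          # shared backbone
        self.depth_head = depth_head    # predicts per-pixel disparity
        self.seg_head = seg_head        # predicts per-pixel class logits

    def forward(self, image):
        feats = self.encoder(image)
        return self.depth_head(feats), self.seg_head(feats)

def joint_loss(seg_loss, depth_loss, lambda_depth=0.1):
    # simple weighted sum; the weight is a hyperparameter, not a fixed value
    return seg_loss + lambda_depth * depth_loss
```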