Taskonomy: Disentangling Task Transfer Learning

Amir R. Zamir (1, 2), Alexander Sax (1), William Shen (1), Leonidas Guibas (1), Jitendra Malik (2), Silvio Savarese (1)
(1) Stanford University, (2) University of California, Berkeley

Abstract

Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable value; it is the concept underlying transfer learning and provides a principled way of identifying redundancies across tasks, in order to, for instance, seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity. We propose a fully computational approach for modeling the structure of the space of visual tasks.

This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty-six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure, including a solver that users can employ to devise efficient supervision policies for their use cases.

Introduction

Object recognition, depth estimation, edge detection, pose estimation, etc. are examples of common vision tasks deemed useful and tackled by the research community. Some of them have rather clear relationships: we understand that surface normals and depth are related (one is a derivative of the other), or that vanishing points in a room are useful for orientation and layout estimation.

Other relationships are less clear: how edge detection and the shading in a room can, together, perform pose estimation.

[Figure 1: A sample task structure discovered by the computational task taxonomy (taskonomy). It found that, for instance, by combining the learned features of a surface normal estimator and occlusion edge detector, good networks for reshading and point matching can be rapidly trained with little labeled data. Node labels: Class. (1000 class), Z-Depth, Distance, 2D Keypoints, 2D Pose, Cam. Pose (fixated), Cam. Pose (non-fixated), Vanishing Pts, Room Layout, 3D Edges, Normals, Point Matching, Reshading.]

The field of computer vision has indeed gone far without explicitly using these relationships. We have made remarkable progress by developing advanced learning machinery (e.g. ConvNets) capable of finding complex mappings from X to Y when many pairs (x, y), with x ∈ X and y ∈ Y, are given as training data.

This is usually referred to as fully supervised learning and often leads to problems being solved in isolation. Siloing tasks makes training a new task or a comprehensive perception system a Sisyphean challenge, whereby each task needs to be learned individually from scratch. Doing so ignores their quantifiably useful relationships, leading to a massive labeled data requirement.

Alternatively, a model aware of the relationships among tasks demands less supervision, uses less computation, and behaves in more predictable ways. Incorporating such a structure is the first stepping stone towards developing provably efficient comprehensive/universal perception models [34, 4], i.e. ones that can solve a large set of tasks before becoming intractable in supervision or computation demands. However, this task space structure and its effects are still largely unknown.

The relationships are non-trivial, and finding them is complicated by the fact that we have imperfect learning models and optimizers. In this paper, we attempt to shed light on this underlying structure and present a framework for mapping the space of visual tasks. Here, what we mean by "structure" is a collection of computationally found relations specifying which tasks supply useful information to another, and by how much (see Fig. 1).

We employ a fully computational approach for this purpose, with neural networks as the adopted computational function class. In a feedforward network, each layer successively forms more abstract representations of the input containing the information needed for mapping the input to the output. These representations, however, can transmit statistics useful for solving other outputs (tasks), presumably if the tasks are related in some form [83, 19, 58, 46].
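The idea that a representation built for one purpose can carry statistics useful for another output can be illustrated with a toy example. The sketch below is not the paper's setup: the two "encoders" are hand-written stand-ins for frozen representations, the target task is a made-up quadratic, and every function name is hypothetical. Only a linear readout is trained on top of each frozen representation, and the remaining error (measured on the same points used for fitting) indicates how easily the target can be read out.

```python
# Toy sketch, assuming hand-written features in place of trained networks.
# All names (rich_encoder, poor_encoder, target_task, ...) are hypothetical.

def rich_encoder(x):
    """Stand-in for a frozen representation that happens to encode x**2."""
    return [x, x * x, 1.0]

def poor_encoder(x):
    """Stand-in for a frozen representation that lacks the quadratic feature."""
    return [x, 1.0]

def target_task(x):
    """Made-up target output that the readout has to predict."""
    return 2.0 * x * x - 0.5

def fit_linear_readout(features, targets, lr=0.3, epochs=1000):
    """Train only a linear readout by gradient descent on mean squared error.

    The encoder is never updated, which mimics reading a task out of a
    frozen representation.
    """
    weights = [0.0] * len(features[0])
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for f, t in zip(features, targets):
            err = sum(w * v for w, v in zip(weights, f)) - t
            for j, v in enumerate(f):
                grad[j] += 2.0 * err * v / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def readout_loss(encoder, xs):
    """Mean squared error of the best linear readout found for this encoder."""
    feats = [encoder(x) for x in xs]
    ys = [target_task(x) for x in xs]
    w = fit_linear_readout(feats, ys)
    preds = [sum(wi * v for wi, v in zip(w, f)) for f in feats]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

xs = [i / 20.0 - 1.0 for i in range(41)]  # evenly spaced inputs in [-1, 1]
loss_rich = readout_loss(rich_encoder, xs)
loss_poor = readout_loss(poor_encoder, xs)
print(f"readout loss from rich features: {loss_rich:.6f}")
print(f"readout loss from poor features: {loss_poor:.6f}")
```

A lower readout loss suggests that the frozen representation already contains most of what the target needs; in this toy case the features that include the quadratic term make the target almost exactly readable, while the other features do not.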

This is the basis of our approach: we compute an affinity matrix among tasks based on whether the solution for one task can be sufficiently easily read out of the representation trained for another task. Such transfers are exhaustively sampled, and a Binary Integer Programming formulation extracts a globally efficient transfer policy from them. We show this model leads to solving tasks with far less data than learning them independently, and the resulting structure holds on common datasets (ImageNet [78] and Places [104]).

Being fully computational and representation-based, the proposed approach avoids imposing prior (possibly incorrect) assumptions on the task space. This is crucial because the priors about task relations are often derived from either human intuition or analytical knowledge, while neural networks need not operate on the same principles [63, 33, 40, 45, 102, 88].
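The step from an affinity matrix to a transfer policy can be sketched on a very small scale. The following is a simplified illustration under stated assumptions, not the paper's formulation: the affinity values are invented, only first-order (single-source) transfers are considered, a supervised source is assumed to solve its own task perfectly, and the binary selection problem is solved by brute-force enumeration instead of an integer programming solver. The names `AFFINITY` and `best_policy` are hypothetical.

```python
from itertools import combinations

# Invented affinities: AFFINITY[source][target] in [0, 1], higher is better.
# These numbers are illustrative only and do not come from the paper.
TASKS = ["normals", "depth", "edges", "reshading"]
AFFINITY = {
    "normals":   {"normals": 1.0, "depth": 0.8, "edges": 0.4, "reshading": 0.9},
    "depth":     {"normals": 0.6, "depth": 1.0, "edges": 0.3, "reshading": 0.7},
    "edges":     {"normals": 0.2, "depth": 0.2, "edges": 1.0, "reshading": 0.3},
    "reshading": {"normals": 0.7, "depth": 0.6, "edges": 0.3, "reshading": 1.0},
}

def best_policy(tasks, affinity, budget):
    """Pick at most `budget` fully supervised source tasks.

    Each target is served by its best selected source, and the chosen set
    maximizes the summed affinity. Enumerating every subset is only viable
    for a handful of tasks; it stands in for an integer programming solver.
    """
    best_score, best_assignment = float("-inf"), None
    for k in range(1, budget + 1):
        for sources in combinations(tasks, k):
            assignment = {
                t: max(sources, key=lambda s: affinity[s][t]) for t in tasks
            }
            score = sum(affinity[assignment[t]][t] for t in tasks)
            if score > best_score:
                best_score, best_assignment = score, assignment
    return best_score, best_assignment

score, assignment = best_policy(TASKS, AFFINITY, budget=2)
for target, source in assignment.items():
    print(f"{target} <- {source}")
print(f"total affinity: {score:.2f}")
```

The budget plays the role of a supervision limit: raising it allows more tasks to be trained with full labels, while lowering it forces more targets to rely on transfer from a few sources.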

For instance, although we might expect depth to transfer to surface normals better (derivatives are easy), the opposite is found to be the better direction in a computational framework (i.e. it suited neural networks better).

An interactive taxonomy solver which uses our model to suggest data-efficient curricula, a live demo, dataset, and code are available online.

Related Work

Assertions of existence of a structure among tasks date back to the early years of modern computer science, e.g. Turing arguing for using learning elements [95, 98] rather than the final outcome, or Jean Piaget's works on developmental stages using previously learned stages as sources [74, 39, 38], and have extended to recent works [76, 73, 50, 18, 97, 61, 11, 66]. Here we make an attempt to actually find this structure. We acknowledge that this is related to a breadth of topics, e.g. compositional modeling [35, 10, 13, 23, 55, 92, 90], homomorphic cryptography [42], lifelong learning [93, 15, 85, 84], functional maps [71], certain aspects of Bayesian inference and Dirichlet processes [54, 91, 90, 89, 37, 39], few-shot learning [81, 25, 24, 70, 86], transfer learning [75, 84, 29, 64, 67, 59], and un/semi/self-supervised learning [22, 8, 17, 103, 19, 83], which are studied across various fields [73, 94, 12].

We review the topics most pertinent to vision within the constraints of space:

Self-supervised learning methods leverage the inherent relationships between tasks to learn a desired expensive one (e.g. object detection) via a cheap surrogate (e.g. colorization) [68, 72, 17, 103, 100, 69]. Specifically, they use a manually-entered local part of the structure in the task space (as the surrogate task is manually defined). In contrast, our approach models this large space of tasks in a computational manner and can discover obscure relationships.

Unsupervised learning is concerned with the redundancies in the input domain and leveraging them for forming compact representations, which are usually agnostic to the downstream task [8, 49, 20, 9, 32, 77]. Our approach is not unsupervised by definition as it is not agnostic to the tasks. Rather, it models the space tasks belong to and in a way utilizes the functional redundancies among tasks.

Meta-learning generally seeks performing the learning at a level higher than where conventional learning occurs, e.g. as employed in reinforcement learning [21, 31, 28], optimization [2, 82, 48], or certain architectural mechanisms [27, 30, 87, 65].

The motivation behind meta-learning has similarities to ours, and our outcome can be seen as a computational meta-structure of the space of tasks.

Multi-task learning targets developing systems that can provide multiple outputs for an input in one run [50, 18]. Multi-task learning has experienced recent progress, and the reported advantages are another support for existence of a useful structure among tasks [93, 100, 50, 76, 73, 50, 18, 97, 61, 11, 66]. Unlike multi-task learning, we explicitly model the relations among tasks and extract a meta-structure. The large number of tasks we consider also makes developing one multi-task network for all infeasible.

Domain adaptation seeks to render a function that is developed on a certain domain applicable to another [44, 99, 5, 80, 52, 26, 36]. It often addresses a shift in the input domain, e.g. webcam images to D-SLR [47], while the task is kept the same.

In contrast, our framework is concerned with output (task) space, hence can be viewed as task/output adaptation. We also perform the adaptation in a larger space among many elements, rather than two or a few.

In the context of our approach to modeling transfer learning across tasks:

Learning Theoretic approaches may overlap with any of the above topics and usually focus on providing generalization guarantees. They vary in their approach: e.g. by modeling transferability with the transfer family required [...]

[Figure: method overview in four panels: (I) Task-specific Modeling, (II) Transfer Modeling, (III) Task Affinity Normalization, (IV) Compute Taxonomy. Other labels: Input space, Task Space (representation), Output space; Frozen; 1st, 2nd, and 3rd Order; Task-specific; example tasks such as 2D Segm., Normals, Reshading, Layout, Object Class. (100), Object Class. (1000), Curvature, Semantic Keypoints, Denoising, Autoencoding, 2D Edges, 2D Pose (fix), Occlusion Edges.]
