ACTIVE LEARNING: THEORY AND APPLICATIONS

A dissertation submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Simon Tong
August 2001

© Copyright by Simon Tong 2001. All Rights Reserved.

I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Daphne Koller, Computer Science Department, Stanford University (Principal Advisor)

I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

David Heckerman, Microsoft Research

I certify that I have read this dissertation and that in my opinion it is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Christopher Manning, Computer Science Department, Stanford University

Approved for the University Committee on Graduate Studies.

To my parents and sister.

Abstract

In many machine learning and statistical tasks, gathering data is time-consuming and costly; thus, finding ways to minimize the number of data instances is beneficial. In many cases, active learning can be employed. Here, we are permitted to actively choose future training data based upon the data that we have previously seen. When we are given this extra flexibility, we demonstrate that we can often reduce the need for large quantities of data.
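As a minimal sketch of this pool-based setting (the uncertainty heuristic and every name below are illustrative assumptions, not the algorithms developed in this thesis), an active learner can be written as a loop that fits a model, scores the unlabeled pool, and queries the least certain instance:

    # Illustrative pool-based active learning loop (editor's sketch, not
    # the thesis's method). Assumes a binary task and that the random
    # seed set happens to contain both classes.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def active_learning_loop(X_pool, y_oracle, n_queries=20, n_seed=4):
        """Repeatedly query the label of the pool point the model is least sure of."""
        rng = np.random.default_rng(0)
        labeled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
        model = LogisticRegression()
        for _ in range(n_queries):
            model.fit(X_pool[labeled], y_oracle[labeled])
            probs = model.predict_proba(X_pool)[:, 1]
            uncertainty = -np.abs(probs - 0.5)   # nearest 0.5 = least certain
            uncertainty[labeled] = -np.inf       # never re-query a labeled point
            labeled.append(int(np.argmax(uncertainty)))
        return model, labeled

The point of the sketch is only the control flow: the learner, not a fixed random sample, decides which labels to pay for next.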

We explore active learning for three central areas of machine learning: classification, parameter estimation and causal discovery.

Support vector machine classifiers have met with significant success in numerous real-world classification tasks. However, they are typically used with a randomly selected training set. We present theoretical motivation and an algorithm for performing active learning with support vector machines. We apply our algorithm to text categorization and image retrieval and show that our method can significantly reduce the need for training data.
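To give a concrete flavor of such querying (a sketch only; the thesis derives its querying algorithms from a version-space analysis, and the scikit-learn usage here is an editor's assumption), the simplest margin-based heuristic asks for the label of the pool instance closest to the current hyperplane:

    # Editor's sketch of margin-based SVM querying: pick the unlabeled
    # point nearest the decision boundary.
    import numpy as np
    from sklearn.svm import SVC

    def query_closest_to_boundary(model, X_pool, labeled):
        """Return the index of the unlabeled pool instance nearest the hyperplane."""
        dist = np.abs(model.decision_function(X_pool))  # |w·x + b|, up to scaling
        dist[list(labeled)] = np.inf                    # exclude labeled points
        return int(np.argmin(dist))

    # Typical use: fit SVC(kernel="linear") on the current labeled set,
    # call query_closest_to_boundary, ask the oracle for that label, refit.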

In the field of artificial intelligence, Bayesian networks have become the framework of choice for modeling uncertainty. Their parameters are often learned from data, which can be expensive to collect. The standard approach is to use data that is randomly sampled from the underlying distribution. We show that the alternative approach of actively targeting data instances to collect is, in many cases, considerably better.

Our final direction is the fundamental scientific task of causal structure discovery from empirical data. Experimental data is crucial for accomplishing this task. Such data is often expensive and must be chosen with great care. We use active learning to determine the experiments to perform. We formalize the causal learning task as that of learning the structure of a causal Bayesian network and show that active learning can substantially reduce the number of experiments required to determine the underlying causal structure of a domain.
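A two-variable toy example, with made-up numbers rather than anything from the thesis, shows why interventions are needed: the structures X -> Y and Y -> X can induce the same observational joint distribution, but forcing X by experiment separates them:

    # Editor's toy illustration (hypothetical numbers, not the thesis's
    # algorithm): observational data cannot distinguish X -> Y from a
    # suitably parameterized Y -> X, but the intervention do(X=1) can.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Ground truth: X -> Y, with P(X=1) = 0.5 and P(Y=1|X) = 0.9 if X=1 else 0.1.
    x = rng.random(n) < 0.5
    y = rng.random(n) < np.where(x, 0.9, 0.1)
    print("observational P(Y=1):", y.mean())    # ~0.5; a Y -> X model can match this

    # Intervention do(X=1): the experimenter sets X, so under X -> Y the
    # response of Y still follows P(Y | X=1).
    y_do = rng.random(n) < 0.9
    print("P(Y=1 | do(X=1)):", y_do.mean())     # ~0.9: forcing X moved Y
    # Had the truth been Y -> X, forcing X would leave P(Y=1) at ~0.5,
    # so the experiment distinguishes the two observationally equivalent structures.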

Acknowledgments

My time at Stanford has been influenced and guided by a number of people to whom I am deeply indebted. Without their help, friendship and support, this thesis would likely never have seen the light of day.

I would first like to thank the members of my thesis committee, Daphne Koller, David Heckerman and Chris Manning, for their insights and guidance. I feel most fortunate to have had the opportunity to receive their support.

My advisor, Daphne Koller, has had the greatest impact on my academic development during my time at graduate school. She has been a tremendous mentor, collaborator and friend, providing me with invaluable insights about research, teaching and academic skills in general. I feel exceedingly privileged to have had her guidance and I owe her a great many heartfelt thanks.

I would also like to thank the past and present members of Daphne's research group that I have had the great fortune of knowing: Eric Bauer, Xavier Boyen, Urszula Chajewska, Lise Getoor, Raya Fratkina, Nir Friedman, Carlos Guestrin, Uri Lerner, Brian Milch, Uri Nodelman, Dirk Ormoneit, Ron Parr, Avi Pfeffer, Andres Rodriguez, Mehran Sahami, Eran Segal, Ken Takusagawa and Ben Taskar. They have been great to knock around ideas with, to learn from, as well as being good friends.

My appreciation also goes to Edward Chang. It was a privilege to have had the opportunity to work with Edward. He was instrumental in enabling the image retrieval system to be realized. I truly look forward to the chance of working with him again in the future.

I also owe a great deal of thanks to friends in Europe who helped keep me sane and happy during the past four years: Shamim Akhtar, Jaime Brandwood, Kaya Busch, Sami Busch, Kris Cudmore, James Devenish, Andrew Dodd, Fabienne Kwan, Andrew Murray, and too many others; you know who you are!

My deepest gratitude and appreciation is reserved for my parents and sister. Without their constant love, support and encouragement, and without their stories and down-to-earth banter to keep my feet firmly on the ground, I would never have been able to produce this thesis. I dedicate this thesis to them.

Contents

Abstract
Acknowledgments

I  Preliminaries
1  Introduction
   What is Active Learning?
   Active Learners
   Selective Setting
   Interventional Setting
   General Approach to Active Learning
   Thesis Overview
2  Related Work

II  Support Vector Machines
3  Classification
   Introduction
   Classification Task
   Induction
   Transduction
   Active Learning for Classification
   Support Vector Machines
   SVMs for Induction
   SVMs for Transduction
   Version Space
   Active Learning with SVMs
   Introduction
   Model and Loss
   Querying Algorithms
   Comment on Multiclass Classification
4  SVM Experiments
   Text Classification Experiments
   Text Classification
   Reuters Data Collection Experiments
   Newsgroups Data Collection Experiments
   Comparison with Other Active Learning Systems
   Image Retrieval Experiments
   Introduction
   The SVMActive Relevance Feedback Algorithm for Image Retrieval
   Image Characterization
   Experiments
   Multiclass SVM Experiments

III  Bayesian Networks
5  Bayesian Networks
   Introduction
   Notation
   Definition of Bayesian Networks
   D-Separation and Markov Equivalence
   Types of CPDs
   Bayesian Networks as Models of Causality
   Inference in Bayesian Networks
   Variable Elimination Method
   The Join Tree Algorithm
6  Parameter Estimation
   Introduction
   Maximum Likelihood Parameter Estimation
   Bayesian Parameter Estimation
   Motivation
   Approach
   Bayesian One-Step Prediction
   Bayesian Point Estimation
7  Active Learning for Parameter Estimation
   Introduction
   Active Learning for Parameter Estimation
   Updating Using an Actively Sampled Instance
   Applying the General Framework for Active Learning
   Active Learning Algorithm
   The Risk Function for KL-Divergence
   Analysis for Single CPDs
   Analysis for General BNs
   Algorithm Summary and Properties
   Active Parameter Experiments
8  Structure Learning
   Introduction
   Structure Learning in Bayesian Networks
   Bayesian Approach to Structure Learning
   Updating Using Observational Data
   Updating Using Experimental Data
   Computational Issues
9  Active Learning for Structure Learning
   Introduction
   General Framework
   Loss Function
   Candidate Parents
   Analysis for a Fixed Ordering
   Analysis for Unrestricted Orderings
   Algorithm Summary and Properties
   Comment on Consistency
   Structure Experiments

