Bayesian Networks - TAU

Ben-Gal I., Bayesian Networks, in Ruggeri F., Faltin F. & Kenett R., Encyclopedia of Statistics in Quality & Reliability, Wiley & Sons (2007).

... investigate the structure of the JPD modeled by a BN is called d-separation [3, 9]. It captures both the conditional independence and dependence relations that ...

Transcription of Bayesian Networks - TAU

Bayesian Networks

Introduction

Bayesian networks (BNs), also known as belief networks (or Bayes nets for short), belong to the family of probabilistic graphical models (GMs). These graphical structures are used to represent knowledge about an uncertain domain. In particular, each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables. These conditional dependencies in the graph are often estimated by using known statistical and computational methods. Hence, BNs combine principles from graph theory, probability theory, computer science, and statistics.

GMs with undirected edges are generally called Markov random fields or Markov networks. These networks provide a simple definition of independence between any two distinct nodes based on the concept of a Markov blanket. Markov networks are popular in fields such as statistical physics and computer vision [1, 2].

BNs correspond to another GM structure known as a directed acyclic graph (DAG) that is popular in the statistics, machine learning, and artificial intelligence communities. BNs are both mathematically rigorous and intuitively understandable. They enable an effective representation and computation of the joint probability distribution (JPD) over a set of random variables [3].

The structure of a DAG is defined by two sets: the set of nodes (vertices) and the set of directed edges. The nodes represent random variables and are drawn as circles labeled by the variable names. The edges represent direct dependence among the variables and are drawn as arrows between nodes. In particular, an edge from node $X_i$ to node $X_j$ represents a statistical dependence between the corresponding variables. Thus, the arrow indicates that a value taken by variable $X_j$ depends on the value taken by variable $X_i$, or roughly speaking that variable $X_i$ "influences" $X_j$. Node $X_i$ is then referred to as a parent of $X_j$ and, similarly, $X_j$ is referred to as the child of $X_i$. An extension of these genealogical terms is often used to define the sets of "descendants" (the set of nodes that can be reached on a direct path from the node) or "ancestor" nodes (the set of nodes from which the node can be reached on a direct path) [4]. The structure of the acyclic graph guarantees that there is no node that can be its own ancestor or its own descendant. Such a condition is of vital importance to the factorization of the joint probability of a collection of nodes, as seen below. Note that although the arrows represent direct causal connections between the variables, the reasoning process can operate on BNs by propagating information in any direction [5].

A BN reflects a simple conditional independence statement, namely that each variable is independent of its nondescendants in the graph given the state of its parents. This property is used to reduce, sometimes significantly, the number of parameters that are required to characterize the JPD of the variables. This reduction provides an efficient way to compute the posterior probabilities given the evidence [3, 6, 7].

In addition to the DAG structure, which is often considered as the "qualitative" part of the model, one needs to specify the "quantitative" parameters of the model. The parameters are described in a manner which is consistent with a Markovian property, where the conditional probability distribution (CPD) at each node depends only on its parents. For discrete random variables, this conditional probability is often represented by a table, listing the local probability that a child node takes on each of the feasible values for each combination of values of its parents. The joint distribution of a collection of variables can be determined uniquely by these local conditional probability tables (CPTs).

Following the above discussion, a more formal definition of a BN can be given [7]. A Bayesian network $B$ is an annotated acyclic graph that represents a JPD over a set of random variables $V$. The network is defined by a pair $B = \langle G, \Theta \rangle$, where $G$ is the DAG whose nodes $X_1, X_2, \ldots, X_n$ represent random variables, and whose edges represent the direct dependencies between these variables. The graph $G$ encodes independence assumptions, by which each variable $X_i$ is independent of its nondescendants given its parents in $G$. The second component $\Theta$ denotes the set of parameters of the network. This set contains the parameter $\theta_{x_i \mid \pi_i} = P_B(x_i \mid \pi_i)$ for each realization $x_i$ of $X_i$ conditioned on $\pi_i$, the set of parents of $X_i$ in $G$. Accordingly, $B$ defines a unique JPD over $V$, namely:

$$P_B(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P_B(X_i \mid \pi_i) = \prod_{i=1}^{n} \theta_{X_i \mid \pi_i} \qquad (1)$$

For simplicity of representation we omit the subscript $B$ henceforth. If $X_i$ has no parents, its local probability distribution is said to be unconditional, otherwise it is conditional. If the variable represented by a node is observed, then the node is said to be an evidence node, otherwise the node is said to be hidden or latent.

Consider the following example that illustrates some of the characteristics of BNs. The example shown in Figure 1 has a similar structure to the classical "earthquake" example in Pearl [3]. It considers a person who might suffer from a back injury, an event represented by the variable Back (denoted by B). Such an injury can cause a backache, an event represented by the variable Ache (denoted by A). The back injury might result from a wrong sport activity, represented by the variable Sport (denoted by S), or from new uncomfortable chairs installed at the person's office, represented by the variable Chair (denoted by C). In the latter case, it is reasonable to assume that a coworker will suffer and report a similar backache syndrome, an event represented by the variable Worker (denoted by W). All variables are binary; thus, they are either true (denoted by T) or false (denoted by F). The CPT of each node is listed beside the node.

[Figure 1: DAG with nodes Chair (C), Sport (S), Worker (W), Back (B), and Ache (A), and edges C → W, C → B, S → B, B → A. CPT columns listed beside the nodes: P(C = T), P(C = F); P(S = T), P(S = F); P(W = T | C), P(W = F | C); P(B = T | C, S), P(B = F | C, S); P(A = T | B), P(A = F | B). Numeric entries not shown.]

In this example the parents of the variable Back are the nodes Chair and Sport. The child of Back is Ache, and the parent of Worker is Chair. Following the BN independence assumption, several independence statements can be observed in this case. For example, the variables Chair and Sport are marginally independent, but when Back is given they are conditionally dependent. This relation is often called explaining away. When Chair is given, Worker and Back are conditionally independent. When Back is given, Ache is conditionally independent of its ancestors Chair and Sport.

The conditional independence statement of the BN provides a compact factorization of the JPDs. Instead of factorizing the joint distribution of all the variables by the chain rule, that is, $P(C,S,W,B,A) = P(C)\,P(S \mid C)\,P(W \mid S,C)\,P(B \mid W,S,C)\,P(A \mid B,W,S,C)$, the BN defines a unique JPD in a factored form, that is, $P(C,S,W,B,A) = P(C)\,P(S)\,P(W \mid C)\,P(B \mid S,C)\,P(A \mid B)$. Note that the BN form reduces the number of model parameters, which belong to a multinomial distribution in this case, from $2^5 - 1 = 31$ to 10 parameters. Such a reduction provides great benefits from the inference, learning (parameter estimation), and computational perspectives. The resulting model is more robust with respect to bias-variance effects [8]. A practical graphical criterion that helps to investigate the structure of the JPD modeled by a BN is called d-separation [3, 9].
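The backache example can be made concrete with a small Python sketch. It encodes the parent sets implied by the factored form P(C)P(S)P(W|C)P(B|S,C)P(A|B), counts the free parameters, and evaluates the joint probability as the product of local conditionals in Eq. (1). The numeric CPT entries below are hypothetical placeholders (the values from Figure 1 are not available here), the names `PARENTS`, `CPT`, `n_free_parameters`, `joint`, and `prob` are illustrative, and brute-force enumeration is used only because the network is tiny; it is not presented as the article's inference method.

```python
from itertools import product

# Parent sets read off the factorization P(C)P(S)P(W|C)P(B|S,C)P(A|B).
PARENTS = {"C": (), "S": (), "W": ("C",), "B": ("C", "S"), "A": ("B",)}
ORDER = ("C", "S", "W", "B", "A")

# HYPOTHETICAL values of P(node = True | parent values); not from the source.
CPT = {
    "C": {(): 0.8},
    "S": {(): 0.02},
    "W": {(True,): 0.9, (False,): 0.01},
    "B": {(True, True): 0.9, (True, False): 0.2,
          (False, True): 0.9, (False, False): 0.01},
    "A": {(True,): 0.7, (False,): 0.1},
}


def n_free_parameters(parents):
    """One free parameter per parent configuration of each binary node."""
    return sum(2 ** len(p) for p in parents.values())


def joint(assign):
    """P(C,S,W,B,A) as the product of local conditionals, as in Eq. (1)."""
    result = 1.0
    for node in ORDER:
        key = tuple(assign[p] for p in PARENTS[node])
        p_true = CPT[node][key]
        result *= p_true if assign[node] else 1.0 - p_true
    return result


def prob(query, evidence=None):
    """P(query | evidence) by enumerating all 2^5 joint states."""
    evidence = evidence or {}
    num = den = 0.0
    for values in product((True, False), repeat=len(ORDER)):
        a = dict(zip(ORDER, values))
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(a)
            den += p
            if all(a[k] == v for k, v in query.items()):
                num += p
    return num / den


print(n_free_parameters(PARENTS), 2 ** len(ORDER) - 1)  # 10 31

# Chair and Sport are marginally independent: observing S leaves P(C) unchanged.
print(prob({"C": True}), prob({"C": True}, {"S": True}))

# Explaining away: once Back is observed, also observing Sport changes P(C).
print(prob({"C": True}, {"B": True}),
      prob({"C": True}, {"B": True, "S": True}))
```

With these placeholder numbers, the last line shows the probability of Chair dropping once Sport is also known to be true, which is the explaining-away pattern described in the text. Different CPT values change the magnitudes but not the independence structure, which depends only on the graph.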

