# Belief Networks “Acyclicity”. Thought of the Day 69.0

Belief networks are used to model uncertainty in a domain. The term “belief networks” encompasses a whole range of different but related techniques which deal with reasoning under uncertainty. Both quantitative (mainly using Bayesian probabilistic methods) and qualitative techniques are used. Influence diagrams are an extension to belief networks; they are used when working with decision making. Belief networks are used to develop knowledge based applications in domains which are characterised by inherent uncertainty. Increasingly, belief network techniques are being employed to deliver advanced knowledge based systems to solve real world problems. Belief networks are particularly useful for diagnostic applications and have been used in many deployed systems. The free-text help facility in the Microsoft Office product employs Bayesian belief network technology. Within a belief network the belief of each node (the node’s conditional probability) is calculated based on observed evidence. Various methods have been developed for evaluating node beliefs and for performing probabilistic inference. Influence diagrams, which are an extension of belief networks, provide facilities for structuring the goals of the diagnosis and for ascertaining the value (the influence) that given information will have when determining a diagnosis. In influence diagrams, there are three types of node: chance nodes, which correspond to the nodes in Bayesian belief networks; utility nodes, which represent the utilities of decisions; and decision nodes, which represent decisions which can be taken to influence the state of the world. Influence diagrams are useful in real world applications where there is often a cost, both in terms of time and money, in obtaining information.

The basic idea in belief networks is that the problem domain is modelled as a set of nodes interconnected with arcs to form a directed acyclic graph. Each node represents a random variable, or uncertain quantity, which can take two or more possible values. The arcs signify the existence of direct influences between the linked variables, and the strength of each influence is quantified by a forward conditional probability.

The Belief Network, which is also called the Bayesian Network, is a directed acyclic graph for probabilistic reasoning. It defines the conditional dependencies of the model by associating each node X with a conditional probability P(X|Pa(X)), where Pa(X) denotes the parents of X. Here are two of its conditional independence properties:

1. Each node is conditionally independent of its non-descendants given its parents.

2. Each node is conditionally independent of all other nodes given its Markov blanket, which consists of its parents, children, and children’s parents.

The inference of Belief Network is to compute the posterior probability distribution

P(H|V) = P(H,V)/ ∑HP(H,V)

where H is the set of the query variables, and V is the set of the evidence variables. Approximate inference involves sampling to compute posteriors. The Sigmoid Belief Network is a type of the Belief Network such that

P(Xi = 1|Pa(Xi)) = σ( ∑Xj ∈ Pa(Xi) WjiXj + bi)

where Wji is the weight assigned to the edge from Xj to Xi, and σ is the sigmoid function.

# Bayesian Networks and Machine Learning

A Bayesian network (BN) is a probabilistic directed acyclic graph representing a set of random variables and their dependence on one another. BNs play an important role in machine learning as they can be used to calculate the probability of a new piece of data being sorted into an existing class by comparison with training data.

Each variable requires a finite set of mutually exclusive (independent) states. A node with a dependent is called a parent node and each connected pair has a set of conditional probabilities defined by their mutual dependence. Each node depends only on its parents and has conditional independence from any node it is not descended from. Using this definition, and taking n to be the number of nodes in the set of training data, the joint probability of the set of all nodes, {X1, X2, · · · Xn}, is defined for any graph as

P(Xi) = ∏ni=1 P(Xii)

where πi refers to the set of parents of Xi. Any conditional probability between two nodes can then be calculated.

An argument for the use of BNs over other methods is that they are able to “smooth” data models, making all pieces of data usable for training. However, for a BN with m nodes, the number of possible graphs is exponential in n; a problem which has been addressed with varying levels of success. The bulk of the literature on learning with BNs utilises model selection. This is concerned with using a criterion to measure the fit of the network structure to the original data, before applying a heuristic search algorithm to find an equivalence class that does well under these conditions. This is repeated over the space of BN structures. A special case of BNs is the dynamic (time-dependent) hidden Markov model (HMM), in which only outputs are visible and states are hidden. Such models are often used for speech and handwriting recognition, as they can successfully evaluate which sequences of words are the most common.

Quantum Bayesian networks (QBNs) and hidden quantum Markov models (HQMMs) have been demonstrated theoretically, but there is currently no experimental research. The format of a HMM lends itself to a smooth transition into the language of open quantum systems. Clark et al. claim that open quantum systems with instantaneous feedback are examples of HQMMs, with the open quantum system providing the internal states and the surrounding bath acting as the ancilla, or external state. This allows feedback to guide the internal dynamics of the system, thus conforming to the description of an HQMM.