10_inference_in_bayesian_networks_1 #13 (Open)

Parsa2820 wants to merge 17 commits into sut-ai:master from Parsa2820:lecturenote/10_bayesNetInference
Commits (17, all by Parsa2820):

- 6c8491b Initial commit
- 9cf6d2a Add intro
- a5e25a8 Add enumeration details
- 736bd0b Add VE
- 87c1c48 Add ordering for VE
- 0dde737 Update TOC
- 05bf0c2 Add conditioning example
- db15153 Change images size
- a7a4863 Add comparison image
- 4289178 Fix some language mistakes
- eff9764 Add ordering image
- ea46ec2 Add class slide to references
- 6e9ddcb Fix images size
- 9f7b2c4 Lecturenote/10 bayes net inference (#2)
- 3042536 Replace notebook entry with md
- b3337b8 Merge branch 'master' into lecturenote/10_bayesNetInference
- c823435 Fix title heading
@@ -0,0 +1 @@
*.ini

Empty file.
Binary file added (BIN, +196 KB): notebooks/10_inference_in_bayesian_networks_1/assets/conditioning-example.png
# Inference in Bayes Nets 1

## Table of Contents

- [Introduction](#introduction)
- [Inference by Enumeration](#inference-by-enumeration)
  - [Algorithm Explanation](#algorithm-explanation)
  - [Algorithm Steps](#algorithm-steps)
  - [Algorithm Pseudocode](#algorithm-pseudocode)
  - [Algorithm Time Complexity](#algorithm-time-complexity)
  - [Algorithm Example](#algorithm-example)
- [Inference by Enumeration is Slow](#inference-by-enumeration-is-slow)
- [Inference by Variable Elimination (Marginalizing Early)](#inference-by-variable-elimination-marginalizing-early)
  - [Algorithm Explanation](#algorithm-explanation-1)
  - [Algorithm Steps](#algorithm-steps-1)
  - [Algorithm Pseudocode](#algorithm-pseudocode-1)
  - [Algorithm Time Complexity](#algorithm-time-complexity-1)
  - [Ordering Polytree Variables for VE](#ordering-polytree-variables-for-ve)
  - [Cut-set Conditioning](#cut-set-conditioning)
  - [Algorithm Example](#algorithm-example-1)
- [Conclusions](#conclusions)
- [References](#references)

<div style="margin: auto; width: 50%;"><img src="./assets/intro.png" /></div>

## Introduction

The basic task of a Bayesian network is to compute the posterior probability distribution of a set of query variables, given observed values for a set of evidence variables. This process is known as inference, but is also called Bayesian updating, belief updating, or reasoning. There are two approaches, exact and approximate, and both are NP-hard in the worst case. An exact method gives an exact result, while an approximate method tries to get as close to the correct answer as possible. In this lecture, we discuss exact inference methods; approximate (sampling) methods will be discussed in the next lecture.
## Inference by Enumeration

The enumeration algorithm is a simple, brute-force algorithm for computing the distribution of a variable in a Bayes net. In this algorithm, we partition the Bayes net variables into three groups:

1. evidence variables
2. hidden variables
3. query variables

The algorithm takes the query variables and the evidence variables as input, and outputs the distribution of the query variables. The evidence $e$ is whatever values you already know about the variables in the Bayes net. Evidence simplifies your work: instead of having to consider those variables' whole distributions, you can assign them their observed values, so they are no longer variables but constants. In the most general case, there is no evidence.
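For instance, here is a tiny sketch of how observing a value turns a CPT over two variables into a table of constants over one (the variable names and numbers are made up for illustration):

```python
# CPT for P(WetGrass | Rain) as a dict, with made-up numbers:
#   key = (rain, wet_grass), value = P(WetGrass = wet_grass | Rain = rain)
cpt = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}

# Observing Rain = 1 keeps only the consistent entries, and Rain stops
# being a variable of the table -- it has become a constant.
observed_rain = 1
restricted = {wg: p for (r, wg), p in cpt.items() if r == observed_rain}
print(restricted)  # {0: 0.1, 1: 0.9}
```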
### Algorithm Explanation

The algorithm has to compute a distribution over $X$, which, because $X$ is a discrete variable, means computing the probability that $X$ takes on each of the values in its domain. The algorithm does this by looping through all the possible values and computing the probability of each one. If there is no evidence, it is literally just computing $P(X=x_i)$ for each $x_i$ in $X$'s domain. If there is evidence, it computes $P(e, X=x_i)$ for each $x_i$, that is, the probability that $X$ has the value $x_i$ and the evidence is true. In that case we use the definition of conditional probability,

$$P(X=x_i | e) = \frac{P(e, X=x_i)}{P(e)}$$

Once we have computed $P(e, X=x_i)$ for all $x_i$, we can simply normalize those values to get the distribution $P(X | e)$.
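A minimal sketch of this final normalization step, with made-up numbers for the unnormalized values $P(e, X=x_i)$:

```python
# Hypothetical unnormalized values P(e, X=x_i) for a binary query variable X.
unnormalized = {"true": 0.012, "false": 0.048}

# P(e) is the sum over all values of X, so dividing by it normalizes.
p_e = sum(unnormalized.values())
posterior = {x: p / p_e for x, p in unnormalized.items()}
print(posterior)  # {'true': 0.2, 'false': 0.8}, i.e. P(X | e)
```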
### Algorithm Steps

We can summarize the algorithm in the following steps (a toy run of all three steps is sketched after the list):

1. Select the entries consistent with the evidence.
2. Sum out the hidden variables to get the joint distribution of the query and evidence variables.
3. Normalize the distribution to get the distribution of the query variables.
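Here is a toy, hand-rolled sketch of the three steps on a made-up joint distribution over a binary query variable $Q$, evidence variable $E$, and hidden variable $H$ (all numbers invented for illustration):

```python
# Made-up joint distribution P(Q, E, H) over three binary variables.
joint = {
    # (q, e, h): probability
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.10,
    (1, 0, 0): 0.20, (1, 0, 1): 0.05,
    (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}
observed_e = 1

# Step 1: select the entries consistent with the evidence E = 1.
consistent = {(q, h): p for (q, e, h), p in joint.items() if e == observed_e}

# Step 2: sum out the hidden variable H to get P(Q, e).
p_q_e = {}
for (q, h), p in consistent.items():
    p_q_e[q] = p_q_e.get(q, 0.0) + p

# Step 3: normalize to get P(Q | e).
total = sum(p_q_e.values())
posterior = {q: p / total for q, p in p_q_e.items()}
print(posterior)  # {0: 0.3, 1: 0.7}
```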
### Algorithm Pseudocode

```python
def enumeration_ask(X, e, bn):
    """
    Input:
        X: the query variable
        e: a dict of observed values for the evidence variables
        bn: the given Bayes net, with bn.vars in topological order
    Output:
        the distribution P(X | e)
    """
    q = ProbDist(X)  # a distribution over X; before normalizing, q[xi] = P(e, X=xi)
    for xi in X.domain:
        q[xi] = enumerate_all({**e, X: xi}, bn.vars)
    return q.normalize()


def enumerate_all(e, vars):
    """
    Input:
        e: a dict of assigned values (the evidence plus assignments made so far)
        vars: the remaining variables, in topological order
    Output:
        the joint probability of the assignments in e,
        with the variables not in e summed out
    """
    if not vars:
        return 1.0
    Y, rest = vars[0], vars[1:]
    if Y in e:
        # Y is already assigned: look up P(Y = e[Y] | parents(Y)) in Y's CPT
        # using the parent values recorded in e, then recurse.
        return probability_condition_parents(Y, e) * enumerate_all(e, rest)
    else:
        # Y is hidden: sum over all of its possible values.
        return sum(probability_condition_parents(Y, {**e, Y: yi}) * enumerate_all({**e, Y: yi}, rest)
                   for yi in Y.domain)
```
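As a usage sketch (the `BayesNet` and `ProbDist` objects and the variable names here are hypothetical, not a concrete library), a call on the classic burglary network would look like:

```python
# Hypothetical call on the burglary network B, E -> A -> J, M:
# bn = BayesNet(vars=[B, E, A, J, M])             # topologically ordered
# posterior = enumeration_ask(B, {J: True, M: True}, bn)
# print(posterior)                                # roughly {True: 0.284, False: 0.716}
```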
### Algorithm Time Complexity

In the worst case there is no evidence, so we have to loop over all possible values of all variables. Hence, the algorithm has time complexity $O(d^n)$, where $d$ is the size of the variables' domains and $n$ is the number of variables.
### Algorithm Example

[Here](https://youtu.be/BrK7X_XlGB8) is a video of the algorithm running on a simple example.
## Inference by Enumeration is Slow

In enumeration, we first build the whole joint distribution and only then marginalize out the hidden variables. It is slow precisely because of the size of that joint distribution.

If we instead marginalize out the hidden variables early, while the joint distribution is still partial, we get a much faster algorithm. This method is called inference by variable elimination.

<div style="margin: auto; width: 100%;"><img src="./assets/comparison.png" /></div>
## Inference by Variable Elimination (Marginalizing Early)

The point of the variable-elimination algorithm is that it is more bottom-up than top-down. Instead of figuring out the probabilities we need and then computing all the other probabilities each one depends on, we compute probabilities first, then compute the terms that depend on them, and repeatedly simplify the expression until it is in terms of only the variable we are looking for.

The variable-elimination algorithm works with objects called factors. A factor is basically a CPT, except that the entries are not necessarily probabilities (they would be if you normalized them). You can think of a factor as a matrix with one dimension per variable, where $Factor[VAL1][VAL2][\dots]$ is (proportional to) a probability such as

$$P(VAR1=VAL1, VAR2=VAL2, \dots)$$

or as a table with one row for each possible combination of value assignments to the variables.

We define two operations on factors:

1. Join
2. Eliminate

Join combines two factors: given factors $F_1$ and $F_2$, we build a factor over the union of their variables by multiplying together the entries of $F_1$ and $F_2$ that agree on the values of their shared variables.

Eliminate removes a variable from a factor: to eliminate $X$ from a factor $F$, we group the rows of $F$ by the values of the non-$X$ variables and sum the entries within each group. This is exactly marginalization.

> - Join works exactly like an SQL join.
> - Eliminate is like a GROUP BY with a SUM aggregation in SQL.
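A minimal sketch of the two operations, storing a factor as a table from value tuples to numbers; this `Factor` class, its assumption that every variable is binary, and all names are our own illustration, not a fixed API:

```python
from itertools import product


class Factor:
    """A factor: a table mapping assignments of self.vars to nonnegative numbers."""

    def __init__(self, vars, table):
        self.vars = list(vars)    # variable names, e.g. ["A", "B"]
        self.table = dict(table)  # maps value tuples (aligned with self.vars) to numbers

    def join(self, other):
        """Pointwise product, matching rows on shared variables (like an SQL join)."""
        joined_vars = self.vars + [v for v in other.vars if v not in self.vars]
        table = {}
        for values in product([0, 1], repeat=len(joined_vars)):  # all binary assignments
            assignment = dict(zip(joined_vars, values))
            row1 = tuple(assignment[v] for v in self.vars)
            row2 = tuple(assignment[v] for v in other.vars)
            table[values] = self.table[row1] * other.table[row2]
        return Factor(joined_vars, table)

    def eliminate(self, var):
        """Sum out var: group rows by the remaining variables and add (GROUP BY + SUM)."""
        i = self.vars.index(var)
        remaining = self.vars[:i] + self.vars[i + 1:]
        table = {}
        for values, p in self.table.items():
            key = values[:i] + values[i + 1:]
            table[key] = table.get(key, 0.0) + p
        return Factor(remaining, table)
```

For example, joining $P(A)$ with $P(B | A)$ and then eliminating $A$ yields $P(B)$:

```python
f_a = Factor(["A"], {(0,): 0.6, (1,): 0.4})            # P(A)
f_b_given_a = Factor(["A", "B"],
                     {(0, 0): 0.9, (0, 1): 0.1,
                      (1, 0): 0.2, (1, 1): 0.8})       # P(B | A)
p_b = f_a.join(f_b_given_a).eliminate("A")             # P(B)
print(p_b.table)  # {(0,): 0.62, (1,): 0.38}
```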
### Algorithm Explanation

This algorithm, like the previous one, takes a variable $X$ and returns a distribution over $X$ given some evidence $e$. First, it initializes the list of factors; before any simplification, these are just the conditional probability tables of the variables, instantiated with the evidence $e$. Then, for each hidden variable in turn, it joins all the factors that mention that variable and sums the variable out, replacing them with a single new factor that no longer depends on it. By the end of the loop, every variable except the query variable $X$ has been summed out, so we can just multiply the remaining factors together and normalize to get the distribution.
### Algorithm Steps

1. Initialize the list of factors, which are the local CPTs instantiated by the evidence.
2. While there are any hidden variables:
   - Join all the factors containing the hidden variable.
   - Eliminate the hidden variable.
3. Join all remaining factors.
4. Normalize the resulting factor.
### Algorithm Pseudocode

```python
def elimination_ask(X, e, bn):
    """
    Input:
        X: the query variable
        e: a dict of observed values for the evidence variables
        bn: the given Bayes net
    Output:
        the distribution P(X | e)
    """
    # Start from the local CPTs (one per variable), instantiated with the evidence.
    factors = [cpt.restrict(e) for cpt in bn.cpts]
    for var in bn.vars:
        if var not in e and var != X:  # var is a hidden variable
            # Join every factor mentioning var into one, then sum var out.
            # join(fs) denotes joining all factors in the list fs pairwise.
            relevant_factors = [f for f in factors if var in f.vars]
            for f in relevant_factors:
                factors.remove(f)
            factors.append(join(relevant_factors).eliminate(var))
    return join(factors).normalize()
```
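Using the toy `Factor` class sketched earlier, one iteration of the loop (eliminating the hidden variable `A`) could be spelled out like this; the names are again illustrative:

```python
from functools import reduce

factors = [f_a, f_b_given_a]   # the toy factors from the earlier sketch
var = "A"                      # the hidden variable to eliminate
relevant = [f for f in factors if var in f.vars]
factors = [f for f in factors if var not in f.vars]
factors.append(reduce(Factor.join, relevant).eliminate(var))  # join all, then sum out
print(factors[0].table)        # {(0,): 0.62, (1,): 0.38}, i.e. P(B)
```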
### Algorithm Time Complexity

The computational and space complexity of variable elimination is determined by the largest factor it builds. In the worst case, the algorithm has exponential complexity, just like the enumeration algorithm. But the elimination ordering can greatly affect the size of the largest factor. For example, in the following Bayes net, assuming the query is $P(X_n | Y_1, \dots, Y_n)$, the largest factors for the following orderings differ:

- $Z, X_1, \dots, X_n \rightarrow 2^{n+1}$
- $X_1, \dots, X_n, Z \rightarrow 2^{2}$

<div style="margin: auto; width: 40%;"><img src="./assets/ve-ordering.png" /></div>

In general, there is no ordering that guarantees only small factors.
### Ordering Polytree Variables for VE

A polytree is a directed graph with no undirected cycles. For this special kind of graph, we can give an algorithm that orders the nodes so that the factors stay small and the joint distribution is computed efficiently (a code sketch follows the list):

1. Drop the edge directions.
2. Pick an arbitrary node as the root.
3. Do a depth-first search from the root.
4. Sort the resulting nodes in topological order.
5. Reverse the order.

If we eliminate variables in this order, we never get a factor larger than the original factors, which makes variable elimination run in linear time.
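A minimal sketch of this recipe, assuming the polytree is given as a list of directed edges; the function name and representation are our own:

```python
def polytree_elimination_order(edges, root):
    """
    edges: directed edges of the polytree as (parent, child) pairs.
    root:  an arbitrary node chosen as the DFS root.
    Returns an elimination order with children before parents in the
    DFS tree, i.e. leaves first and the root last.
    """
    # Step 1: drop edge directions.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    # Steps 2-3: depth-first search from the chosen root.
    visited, preorder = set(), []

    def dfs(node):
        visited.add(node)
        preorder.append(node)
        for nxt in neighbors.get(node, []):
            if nxt not in visited:
                dfs(nxt)

    dfs(root)
    # Steps 4-5: the DFS preorder already lists parents before children
    # (a topological order of the DFS tree); reversing it puts leaves first.
    return list(reversed(preorder))


# Example on the chain X1 -> X2 -> X3, rooted at X3:
print(polytree_elimination_order([("X1", "X2"), ("X2", "X3")], "X3"))
# ['X1', 'X2', 'X3']  (eliminate the outer nodes first, the root last)
```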
### Cut-set Conditioning

We can cut the Bayes net at an instantiated variable, which can transform a multiply connected graph into a polytree, for which we can find a good elimination order. If these cut-set variables are not actually observed, we can set them to each of their possible values in turn and solve the resulting polytree problems. You can see an example of this below, followed by a rough code sketch.

<div style="margin: auto; width: 70%;"><img src="./assets/conditioning-example.png" /></div>
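A rough sketch of the conditioning loop; `cut` (instantiate the cut-set variables, yielding a polytree) and `polytree_ve_joint` (run VE on a polytree and return the unnormalized values $P(X=x_i, e)$) are hypothetical helpers, not defined here:

```python
from itertools import product


def cutset_conditioning(X, e, bn, cutset):
    """Sketch: answer P(X | e) on a multiply connected net by conditioning
    on every joint assignment of the cut-set variables."""
    posterior = {xi: 0.0 for xi in X.domain}
    for values in product(*(c.domain for c in cutset)):
        assignment = dict(zip(cutset, values))
        polytree = cut(bn, assignment)  # instantiating the cut-set yields a polytree
        # polytree_ve_joint is assumed to return the *unnormalized* values
        # P(X = xi, e, assignment), so plain summation marginalizes the
        # cut-set out correctly.
        for xi, p in polytree_ve_joint(X, {**e, **assignment}, polytree).items():
            posterior[xi] += p
    total = sum(posterior.values())
    return {xi: p / total for xi, p in posterior.items()}
```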
### Algorithm Example

[Here](https://youtu.be/w4sJ8SazmFo) is a video of the elimination of a variable from a set of factors.
## Conclusions

We reviewed two major exact inference algorithms: the enumeration algorithm and the variable elimination algorithm. The enumeration algorithm is simple to understand and implement, but it is not very efficient. The variable elimination algorithm is more efficient, but it is also more complex and harder to implement. For both algorithms, the worst-case time complexity is exponential, so in practice sampling is usually a better choice, which you will learn more about in the next lecture.
## References

- [Class Presentation](http://ce.sharif.edu/courses/99-00/1/ce417-2/resources/root/Slides/PDF/Session%2013_14.pdf)
- [Visualizing Inference in Bayesian Networks](http://www.kbs.twi.tudelft.nl/Publications/MSc/2006-JRKoiter-Msc.html)
- [Exact Inference in Bayes Nets](http://courses.csail.mit.edu/6.034s/handouts/spring12/bayesnets-pseudocode.pdf)
- [Variable Elimination](https://ermongroup.github.io/cs228-notes/inference/ve/)
- [Bayesian Networks - Inference (Part II)](https://ccc.inaoep.mx/~esucar/Clases-mgp/Notes/c7-bninf-p2.pdf)
notebooks/10_inference_in_bayesian_networks_1/matadata.yml (29 additions)
title: Inference in Bayes Nets 1

header:
  title: Inference in Bayes Nets 1 # title of your notebook
  description: An Introduction to Inference in Bayesian Networks Part 1 (Up to Sampling)

authors:
  label:
    position: top
  content:
    - name: Parsa Mohammadian
      role: Author
      contact:
        - link: https://github.com/parsa2820
          icon: fab fa-github
    - name: Sara Azarnoush
      role: Author
      contact:
        - link: https://github.com/saaz742
          icon: fab fa-github
    - name: Kasra Amani
      role: Author
      contact:
        - link: https://github.com/iTskAsra
          icon: fab fa-github

comments:
  label: false
  kind: comments
Well done with your Lecture Note! You've done a great job!
These few comments will help your LN get the best possible score:
Thanks for your review. We appreciate your suggestions.

About the first suggestion, we have replaced the `div` HTML tag with `p`, which is now correctly shown.

About the second suggestion, GitHub does not support math equations (there is an open issue for that), but Mr. Zehtab (@vahidzee) has mentioned in Telegram that the equations will show up in the Webifier after deployment. We have already tested the equations in our editor (VSCode), and they seem sound. If there is still a problem with them, we will be happy to solve them.