Conditional probability
Conditional probabilities are useful when the occurrence of one event affects the probability of another. If we have two events, A and B, where B has occurred and we want to find the probability of A occurring, we write this as follows:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Here, $P(B) > 0$.
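To make the definition concrete, here is a minimal Python sketch that computes $P(A \mid B)$ by direct enumeration; the two-dice sample space and the particular events are illustrative assumptions, not taken from the text:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
omega = set(product(range(1, 7), repeat=2))

# Illustrative events (assumed for this sketch):
# A = the two dice sum to 8; B = the first die shows 3.
A = {w for w in omega if w[0] + w[1] == 8}
B = {w for w in omega if w[0] == 3}

def p(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

# P(A | B) = P(A ∩ B) / P(B)
p_a_given_b = p(A & B) / p(B)
print(p_a_given_b)  # 1/6: given a first roll of 3, the second die must show 5
print(p(A))         # 5/36: the unconditional probability of summing to 8
```

Conditioning on B shrinks the sample space to the six outcomes where the first die shows 3, which is why $P(A \mid B)$ differs from $P(A)$.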
However, if the two events, A and B, are independent, then we have the following:
$$P(A \mid B) = P(A)$$
Additionally, if $P(A \mid B) > P(A)$, then it is said that B attracts A. However, if A attracts $B^c$ (the complement of B), then it repels B.
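The following sketch checks both behaviors numerically on the same assumed dice space; the events chosen to demonstrate independence and attraction are illustrative assumptions:

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))

def p(event):
    return Fraction(len(event), len(omega))

def cond(a, b):
    """P(a | b) by the definition above."""
    return p(a & b) / p(b)

# Independence (assumed events): A = first die is even, B = second die is 5.
A = {w for w in omega if w[0] % 2 == 0}
B = {w for w in omega if w[1] == 5}
print(cond(A, B) == p(A))  # True: knowing B tells us nothing about A

# Attraction (assumed events): C = dice sum to 10 or more, D = first die is 6.
C = {w for w in omega if w[0] + w[1] >= 10}
D = {w for w in omega if w[0] == 6}
print(cond(C, D) > p(C))   # True: P(C | D) = 1/2 > P(C) = 1/6, so D attracts C
```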
The following are some of the axioms of conditional probability:
- $0 \leq P(A \mid B) \leq 1$.
- $P(B \mid B) = 1$.
- $P(A \mid B) = P(A \cap B \mid B)$.
- $P(\cdot \mid B)$ is a probability function that works only for subsets of B.
- $P(A \mid \Omega) = P(A)$.
- If $A \subseteq B$, then $P(A \mid B) = \frac{P(A)}{P(B)}$.
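A few of these properties can be verified by brute-force enumeration. The sketch below reuses the assumed two-dice space from earlier; the events A and B are again illustrative:

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))

def p(event):
    return Fraction(len(event), len(omega))

def cond(a, b):
    return p(a & b) / p(b)

A = {w for w in omega if w[0] + w[1] == 8}  # assumed event
B = {w for w in omega if w[0] == 3}         # assumed event

print(cond(B, B) == 1)               # P(B | B) = 1
print(cond(A, B) == cond(A & B, B))  # P(A | B) = P(A ∩ B | B)
print(cond(A, omega) == p(A))        # P(A | Ω) = P(A)
```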
The following equation is known as Bayes' rule:
$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$$
This can also be written as follows:
$$P(A \mid B) \propto P(B \mid A)P(A)$$
Here, we have the following:
- $P(A)$ is called the prior.
- $P(A \mid B)$ is the posterior.
- $P(B \mid A)$ is the likelihood.
- $P(B)$ acts as a normalizing constant.
The normalizing constant can be expanded in terms of A and its complement, as follows:

$$P(B) = P(B \mid A)P(A) + P(B \mid A^c)P(A^c)$$
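As a worked illustration of Bayes' rule, consider a hypothetical diagnostic test; all the numbers below (a 2% prior, a 99% true positive rate, and a 5% false positive rate) are assumptions chosen purely for the example:

```python
# All numbers below are assumptions chosen for this illustration.
p_a = 0.02              # P(A): prior probability of having the condition
p_b_given_a = 0.99      # P(B | A): probability of a positive test given A
p_b_given_not_a = 0.05  # P(B | A^c): false positive rate

# Normalizing constant: P(B) = P(B | A)P(A) + P(B | A^c)P(A^c)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A | B) = P(B | A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # 0.2878: a positive test is far from conclusive
```

Even with an accurate test, the low prior keeps the posterior under 30%; this is exactly the effect the normalizing constant captures.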
Often, we end up having to deal with complex events, and to effectively navigate them, we need to decompose them into simpler events.
This leads us to the concept of partitions. A partition is defined as a collection of events $B_1, B_2, \ldots, B_n$ that together make up the sample space, such that the events are pairwise disjoint ($B_i \cap B_j = \emptyset$ whenever $i \neq j$) and $B_1 \cup B_2 \cup \cdots \cup B_n = \Omega$.
In the coin flipping example, the sample space is partitioned into two possible events—heads and tails.
If A is an event and $\{B_i\}$ is a partition of $\Omega$, then we have the following:
$$P(A) = \sum_{i} P(A \mid B_i)P(B_i)$$
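As a quick illustration of this formula, suppose (hypothetically) that all items are produced by three factories that partition production; the production shares and defect rates below are assumed values:

```python
# Assumed illustrative partition: three factories account for all production.
p_b = [0.5, 0.3, 0.2]             # P(B_i): each factory's share of production
p_a_given_b = [0.01, 0.02, 0.05]  # P(A | B_i): each factory's defect rate

# P(A) = sum over i of P(A | B_i) P(B_i)
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))
print(round(p_a, 3))  # 0.021: probability that a randomly chosen item is defective
```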
We can also rewrite Bayes' formula with partitions so that we have the following:
$$P(B_i \mid A) = \frac{P(A \mid B_i)P(B_i)}{\sum_{j} P(A \mid B_j)P(B_j)}$$
Here, $P(A) = \sum_{j} P(A \mid B_j)P(B_j)$.
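Continuing the same hypothetical factory example, Bayes' rule with partitions inverts the question: given that an item is defective, we can compute how likely each factory is to be its source:

```python
# Continuing the assumed factory example: given a defective item (A),
# how likely is each factory (B_i) to be its source?
p_b = [0.5, 0.3, 0.2]             # P(B_i): production shares (assumed)
p_a_given_b = [0.01, 0.02, 0.05]  # P(A | B_i): defect rates (assumed)

# Denominator of Bayes' rule: P(A) = sum over j of P(A | B_j) P(B_j)
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))

# P(B_i | A) = P(A | B_i) P(B_i) / P(A)
posterior = [pa * pb / p_a for pa, pb in zip(p_a_given_b, p_b)]
print([round(x, 3) for x in posterior])  # [0.238, 0.286, 0.476]
```

Note that even though the third factory produces only 20% of the items, its high defect rate makes it the most likely source of a defective item.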