Deep learning

Introduction to Bayes’ Theorem for beginners

Bayes’ theorem can be used for classification and for predicting values from a given data set. A classifier built on it is known as the Naive Bayes’ classifier, which is mainly used in machine learning.

Before learning the formula for Bayes’ theorem, you should have a little conceptual knowledge of what the theorem says:

In probability theory, Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. For example, if a disease is related to age, then, using Bayes’ theorem, a person’s age can be used to assess the probability that they have the disease more accurately than an assessment made without knowledge of the person’s age.

One of the many applications of Bayes’ theorem is updating the probability of a hypothesis as more evidence or information becomes available. This is called Bayesian inference. Bayesian inference is an important method in mathematical statistics.

Bayesian inference can be used in almost all fields, such as science, engineering, sports and medicine.

Bayesian inference is closely related to subjective probability, often called Bayesian probability.

Subjective probability: a probability derived from a person’s own judgment about whether a particular outcome is likely to occur. It involves no formal calculation and reflects only the person’s opinions and past experience, so it may differ from person to person. For example, if you ask someone what the probability of getting tails is when a coin is flipped 5 times, they might answer 30%. That estimate may go up or down with experience: if the coin then lands tails 4 times, they might revise it to 80% or more.

Bayes’ theorem is stated mathematically as the following equation:

P(A|B) = P(B|A)*P(A)/P(B)


  • A and B are events and P(B)≠0.
  • P(A|B) is a conditional probability: the likelihood of event A occurring given that B is true.
  • P(B|A) is also a conditional probability: the likelihood of event B occurring given that A is true.
  • P(A) and P(B) are the probabilities of observing A and B independently of each other; these are known as marginal probabilities.
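
To see the equation at work, here is a minimal Python sketch. The function name `bayes_posterior` and the three input probabilities are my own assumptions, chosen only for illustration; they do not come from any data set.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        # The theorem is only defined when P(B) is non-zero.
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, for illustration only.
p_a = 0.3          # P(A)
p_b_given_a = 0.5  # P(B|A)
p_b = 0.25         # P(B)

posterior = bayes_posterior(p_b_given_a, p_a, p_b)
print(round(posterior, 3))  # 0.6
```

With these made-up inputs, knowing that B happened raises the probability of A from 0.3 to 0.6.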

Marginal probability: the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in that subset. It gives the probabilities of the various values of the variables in the subset without reference to the values of the other variables.

Real world example

Suppose that a person may get wet while walking down the lane. Let R be a discrete random variable taking one value from {wet, dry}, and let S (for weather) be a discrete random variable taking one value from {winter, summer, rainy}.

Here R is dependent on S. That is, the probability that R=wet takes different values depending on whether the season is winter, rainy or summer (and likewise for R=dry). A person is, for example, far more likely to get wet when walking in the rainy season than in summer or winter. In other words, for any given pair of values for R and S, one must consider the joint probability distribution of R and S to find the probability of that pair of events occurring together if the person ignores the state of the weather.

However, in trying to calculate the marginal probability P(R=wet), what we are asking for is the probability that R=wet in the situation in which we don’t actually know the particular value of S and in which the person ignores the state of the weather.
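
One way to compute this marginal probability is to weight the chance of getting wet in each season by how likely that season is, and add the results up. The sketch below does exactly that. All the numbers in it are hypothetical values I picked for illustration, and the variable names are my own.

```python
# Hypothetical probabilities, for illustration only.
# P(S = s): how likely each season is.
p_season = {"winter": 0.3, "summer": 0.4, "rainy": 0.3}

# P(R = wet | S = s): chance of getting wet in each season.
p_wet_given_season = {"winter": 0.1, "summer": 0.05, "rainy": 0.8}

# Marginal probability: sum over every season of P(R=wet | S=s) * P(S=s).
p_wet = sum(p_wet_given_season[s] * p_season[s] for s in p_season)
p_dry = 1 - p_wet

print(round(p_wet, 2))  # 0.29
print(round(p_dry, 2))  # 0.71
```

The result, P(R=wet) = 0.29 under these assumed numbers, no longer refers to any particular season.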

Let’s take a mathematical problem to understand the workflow of Bayes’ theorem.

Let’s look at the steps for calculating the normal (individual) probability and the conditional probability.

Step 1: (A) Find normal probabilities

Suppose we have 2 boxes, A and B, and we have to calculate the probabilities of selecting each box, P(A) and P(B).

This is simple: since we have to select one box out of two,

P(A) = 1/2 and P(B) = 1/2

What if the individual probabilities are given directly instead? For example:

P(A) = 60%, P(B) = 30% and P(C) = 10%       [ P(A) = 0.6, P(B) = 0.3 and P(C) = 0.1]

Step 2: (B) Find conditional probability

This is represented as P(x|A), where x is the event of selecting an element from the set/collection A.

Suppose box A contains 5 red and 3 white balls. Calculate the probability of selecting a red ball.

P(R|A) = (red ball)/(total number of balls) = 5/8
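
The same count can be checked in Python. This is only a small sketch; I use the standard library `fractions` module so the answer stays an exact fraction, and the variable names are my own.

```python
from fractions import Fraction

# Contents of box A, as given in the problem.
red, white = 5, 3

# P(R|A): favourable outcomes divided by total outcomes.
p_red_given_a = Fraction(red, red + white)

print(p_red_given_a)         # 5/8
print(float(p_red_given_a))  # 0.625
```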

Bayes’ formula:   P(A|x) = P(x|A)*P(A) / [P(x|A)*P(A) + P(x|B)*P(B)]

In the problem above we calculated the probability of selecting a red ball from the box, which is P(x|A). The question Bayes’ formula answers is just the opposite, P(A|x): given that a red ball was selected, what is the probability that it came from box A?
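
Here is a rough sketch of that reverse calculation using Bayes’ formula above. Box A (5 red, 3 white) and P(A) = P(B) = 1/2 come from the steps above. The contents of box B (2 red, 6 white) are not given in this post, so they are a hypothetical assumption of mine, as is the function name `posterior_box_a`.

```python
from fractions import Fraction


def posterior_box_a(p_x_a, p_a, p_x_b, p_b):
    """Return P(A|x) = P(x|A)*P(A) / [P(x|A)*P(A) + P(x|B)*P(B)]."""
    evidence = p_x_a * p_a + p_x_b * p_b  # total probability of drawing x
    return p_x_a * p_a / evidence


# Priors from Step 1: each box is equally likely to be picked.
p_a = Fraction(1, 2)
p_b = Fraction(1, 2)

# Box A from Step 2: 5 red and 3 white balls.
p_red_a = Fraction(5, 8)

# Box B: ASSUMED contents (2 red, 6 white), for illustration only.
p_red_b = Fraction(2, 8)

p_a_given_red = posterior_box_a(p_red_a, p_a, p_red_b, p_b)
print(p_a_given_red)  # 5/7
```

Under these assumptions, a red ball makes box A the more likely source (5/7, about 71%), because box A holds a larger share of red balls than the assumed box B.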

Here I am attaching a handwritten solved example of the Naive Bayes’ classifier.


If you have any doubts, please mention them in the comment section or shoot me an email @

