realcode4you

Application Of Probability In Machine Learning

Task 1

For the following exercises, work with the wine_flag_training and wine_flag_test data sets. Use Python to solve each problem. Here are the links to the datasets:


import pandas as pd 
# import datasets for training and testing the naive bayes model
train_df = pd.read_csv("wine_flag_training.csv")
test_df = pd.read_csv("wine_flag_test.csv")

Task 2

Create two contingency tables, one with Type and Alcohol_flag and another with Type and Sugar_flag.

# create the contingency table for Type and Alcohol_flag
type_alcohol_table = pd.crosstab(test_df.Type, test_df.Alcohol_flag)
type_alcohol_table

Output:








# create the contingency table for Type and Sugar_flag
type_sugar_table = pd.crosstab(test_df.Type, test_df.Sugar_flag)
type_sugar_table

output:








Task 3

Use the tables in the previous exercise to carry out the following calculations:


# get all values from type - alcohol table
red_high_alcohol = type_alcohol_table.High.Red
red_low_alcohol = type_alcohol_table.Low.Red

white_high_alcohol = type_alcohol_table.High.White
white_low_alcohol = type_alcohol_table.Low.White

# get all values from type - sugar table
red_high_sugar = type_sugar_table.High.Red
red_low_sugar = type_sugar_table.Low.Red

white_high_sugar = type_sugar_table.High.White
white_low_sugar = type_sugar_table.Low.White
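The attribute-style lookups above (for example `type_alcohol_table.High.Red`) pick the column first and then the row. As a minimal sketch of the same idea, the snippet below builds a crosstab from a small made-up frame (`toy_df` is hypothetical, not the wine_flag data) and reads one cell both ways; `.loc[row, column]` is an alternative that states the row and column explicitly.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the wine_flag files)
toy_df = pd.DataFrame({
    "Type": ["Red", "Red", "White", "White", "White", "Red"],
    "Alcohol_flag": ["High", "Low", "High", "High", "Low", "Low"],
})

# Rows are Type, columns are Alcohol_flag, cells are counts
toy_table = pd.crosstab(toy_df["Type"], toy_df["Alcohol_flag"])

# Explicit row/column lookup
red_high = toy_table.loc["Red", "High"]

# Attribute-style lookup, as used in this walkthrough: column first, then row
red_high_attr = toy_table.High.Red

print(toy_table)
print(red_high, red_high_attr)
```

Both lookups return the same count; the `.loc` form also works when a label is not a valid Python identifier.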



Task 3.1

The prior probability of Type = Red and Type = White.

# get the totals
total_red = red_high_alcohol + red_low_alcohol
total_white = white_high_alcohol + white_low_alcohol
total = red_high_alcohol + red_low_alcohol + white_high_alcohol + white_low_alcohol

# calculate the prior probabilities
prior_red = total_red / total
prior_white = total_white / total

print(f"Prior Probability of Type = Red: {round(prior_red, 2)}")
print(f"Prior Probability of Type = White: {round(prior_white, 2)}")

output:

Prior Probability of Type = Red: 0.25
Prior Probability of Type = White: 0.75
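The priors can also be read straight from the Type column, without going through a contingency table. The sketch below assumes a made-up series (`toy_types` is hypothetical) and uses `value_counts(normalize=True)`, which returns the relative frequency of each label.

```python
import pandas as pd

# Hypothetical toy labels for illustration only: 1 Red, 3 White
toy_types = pd.Series(["Red", "White", "White", "White"], name="Type")

# Relative frequency of each label = estimated prior probability
toy_priors = toy_types.value_counts(normalize=True)

print(f"Prior Probability of Type = Red: {round(toy_priors['Red'], 2)}")
print(f"Prior Probability of Type = White: {round(toy_priors['White'], 2)}")
```

On the wine data, the same call on `test_df.Type` should agree with the totals-based calculation above.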

Task 3.2

The probability of high and low alcohol content.

# get the totals
total_high_alcohol = red_high_alcohol + white_high_alcohol
total_low_alcohol = red_low_alcohol + white_low_alcohol
total = red_high_alcohol + red_low_alcohol + white_high_alcohol + white_low_alcohol

# calculate the prior probabilities
prior_high_alcohol = total_high_alcohol / total
prior_low_alcohol = total_low_alcohol / total

print(f"Prior Probability of Alcohol_flag = High: {round(prior_high_alcohol, 3)}")
print(f"Prior Probability of Alcohol_flag = Low: {round(prior_low_alcohol, 3)}")

Task 3.3

The probability of high and low sugar content.

# get the totals
total_high_sugar = red_high_sugar + white_high_sugar
total_low_sugar = red_low_sugar + white_low_sugar
total = red_high_sugar + red_low_sugar + white_high_sugar + white_low_sugar

# calculate the prior probabilities
prior_high_sugar = total_high_sugar / total
prior_low_sugar = total_low_sugar / total
print(f"Prior Probability of Sugar_flag = High: {round(prior_high_sugar, 3)}")
print(f"Prior Probability of Sugar_flag = Low: {round(prior_low_sugar , 3)}")

output:

Prior Probability of Sugar_flag = High: 0.513
Prior Probability of Sugar_flag = Low: 0.487

Task 3.4

The conditional probabilities p(Alcohol_flag = High ∣ Type = Red) and p(Alcohol_flag = Low ∣ Type = Red).

prob_alcoholHigh_givenRed = red_high_alcohol / total_red
prob_alcoholLow_givenRed = red_low_alcohol / total_red

print(f"P(Alcohol_flag = High| Type = Red): {round(prob_alcoholHigh_givenRed, 3)}")
print(f"P(Alcohol_flag = Low| Type = Red): {round(prob_alcoholLow_givenRed , 3)}")

output:

P(Alcohol_flag = High| Type = Red): 0.456
P(Alcohol_flag = Low| Type = Red): 0.544

Task 3.5

The conditional probabilities p(Alcohol_flag = High ∣ Type = White) and p(Alcohol_flag = Low ∣ Type = White).


prob_alcoholHigh_givenWhite = white_high_alcohol / total_white
prob_alcoholLow_givenWhite = white_low_alcohol / total_white

print(f"P(Alcohol_flag = High| Type = White): {round(prob_alcoholHigh_givenWhite, 3)}")
print(f"P(Alcohol_flag = Low| Type = White): {round(prob_alcoholLow_givenWhite , 3)}")

output:

P(Alcohol_flag = High| Type = White): 0.516
P(Alcohol_flag = Low| Type = White): 0.484

Task 3.6

The conditional probabilities p(Sugar_flag = High ∣ Type = Red) and p(Sugar_flag = Low ∣ Type = Red)


prob_sugarHigh_givenRed = red_high_sugar / total_red
prob_sugarLow_givenRed = red_low_sugar / total_red

print(f"P(Sugar_flag = High| Type = Red): {round(prob_sugarHigh_givenRed, 3)}")
print(f"P(Sugar_flag = Low| Type = Red): {round(prob_sugarLow_givenRed , 3)}")

output:

P(Sugar_flag = High| Type = Red): 0.207
P(Sugar_flag = Low| Type = Red): 0.793

Task 3.7

The conditional probabilities p(Sugar_flag = High ∣ Type = White) and p(Sugar_flag = Low ∣ Type = White)


prob_sugarHigh_givenWhite = white_high_sugar / total_white
prob_sugarLow_givenWhite = white_low_sugar / total_white

print(f"P(Sugar_flag = High| Type = White): {round(prob_sugarHigh_givenWhite, 3)}")
print(f"P(Sugar_flag = Low| Type = White): {round(prob_sugarLow_givenWhite , 3)}")

output:

P(Sugar_flag = High| Type = White): 0.615
P(Sugar_flag = Low| Type = White): 0.385
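Tasks 3.4 to 3.7 divide each cell count by its row total. As a compact sketch of the same calculation, `pd.crosstab` with `normalize="index"` produces all the row-wise conditional probabilities in one table. The frame below is made up for illustration (`toy_df` is hypothetical).

```python
import pandas as pd

# Hypothetical toy data for illustration only: 4 Red and 4 White wines
toy_df = pd.DataFrame({
    "Type": ["Red"] * 4 + ["White"] * 4,
    "Sugar_flag": ["High", "Low", "Low", "Low", "High", "High", "High", "Low"],
})

# normalize="index" divides each row by its row total,
# so each cell is P(Sugar_flag = column | Type = row)
cond_table = pd.crosstab(toy_df["Type"], toy_df["Sugar_flag"], normalize="index")

print(cond_table.round(3))
```

Each row of `cond_table` sums to 1, which is a quick check that the values are conditional on Type rather than joint probabilities.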


Task 4

Use the probabilities in the previous exercise to discuss:

  • How likely it is that a randomly selected wine is red.

  • How likely it is that a randomly selected wine has a high alcohol content.

  • How likely it is that a randomly selected wine has a low sugar content.

  • How likely is it that a randomly selected wine is Red? The prior probability for Red wines calculated earlier (0.25) implies about a 1 in 4 chance of picking a Red wine at random. This holds without any further evidence about the wine's alcohol or sugar content.

  • How likely is it that a randomly selected wine has a *high alcohol content?* The value computed was 0.501, which means that a wine selected at random from the samples has approximately a 50-50 chance of having a high alcohol content, irrespective of whether the wine is Red or White, or has Low or High sugar content.

  • How likely is it that a randomly selected wine has a *low sugar content?* The probability of selecting a low sugar wine is 0.487, and this is approximately a 50% chance if the alcohol content and wine type are ignored.
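The marginal probabilities discussed here are tied to the conditional ones by the law of total probability: P(flag) = P(flag | Red) P(Red) + P(flag | White) P(White). The sketch below checks this with the rounded values reported in this walkthrough, so the agreement is only approximate; the variable names are introduced here for illustration.

```python
# Rounded values as reported in this walkthrough
prior_red, prior_white = 0.25, 0.75
p_high_alcohol_given = {"Red": 0.456, "White": 0.516}
p_high_sugar_given = {"Red": 0.207, "White": 0.615}

# Law of total probability: weight each conditional by the prior of its type
p_high_alcohol = (p_high_alcohol_given["Red"] * prior_red
                  + p_high_alcohol_given["White"] * prior_white)
p_high_sugar = (p_high_sugar_given["Red"] * prior_red
                + p_high_sugar_given["White"] * prior_white)

print(f"P(Alcohol_flag = High) ~ {p_high_alcohol:.3f}")
print(f"P(Sugar_flag = High) ~ {p_high_sugar:.3f}")
print(f"P(Sugar_flag = Low) ~ {1 - p_high_sugar:.3f}")
```

The results land on the 0.501, 0.513 and 0.487 quoted in the text, which is a useful consistency check on the individual calculations.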

Task 5

Use the conditional probabilities found earlier to discuss:

  • What a typical white wine might have as its alcohol and sugar content.

  • What a typical red wine might have as its alcohol and sugar content.

Generally, wines are made from fermented grapes. The fermentation process produces alcohol from the grape sugars. If the fermentation is stopped early, the wine has a higher sugar content but less alcohol. A longer fermentation leads to less sugar but more alcohol. From the conditional probabilities, a typical white wine is more likely to have a high sugar content (0.615) and is only slightly more likely to have high rather than low alcohol (0.516 versus 0.484). A typical red wine is much more likely to have a low sugar content (0.793) and is somewhat more likely to have low rather than high alcohol (0.544 versus 0.456).

Task 6

Create side-by-side bar graphs for Type, one with an overlay of Alcohol_flag and the other with an overlay of Sugar_flag. Compare the graphs to the conditional probabilities you calculated.


# import the graphing library
import matplotlib.pyplot as plt

# construct the data tables
type_alcohol = pd.crosstab(test_df.Type, test_df.Alcohol_flag, margins=True, margins_name='Total', normalize='index')
type_sugar = pd.crosstab(test_df.Type, test_df.Sugar_flag, margins=True, margins_name='Total', normalize='index')
# create figure and axis
fig, (axis1, axis2) = plt.subplots(nrows=1, ncols=2)

# align y-axes
axis1.set_ylim(0.0, 1.0)
axis2.set_ylim(0.0, 1.0)

# set titles
axis1.set_title('Alcohol_flag')
axis2.set_title('Sugar_flag')
axis1.set_ylabel('P(Alcohol_flag|Type)')
axis2.set_ylabel('P(Sugar_flag|Type)')

# adjust padding between plots
plt.subplots_adjust(wspace=0.5)

# set figure width and height
fig.set_figheight(5)
fig.set_figwidth(15)

# plot the graphs on each individual axes
plot_alcohol = type_alcohol.loc[['Red', 'White']].plot.bar(ax=axis1)
plot_sugar = type_sugar.loc[['Red', 'White']].plot.bar(ax=axis2)

output:



The height of each bar in the charts corresponds to one of the conditional probabilities previously calculated.


That is:

For the Alcohol_flag plot, the side-by-side bars for Red wine have the following heights:

  • Blue rectangle - P(Alcohol_flag = High| Type = Red): 0.456

  • Orange rectangle - P(Alcohol_flag = Low| Type = Red): 0.544

Also, the side-by-side bars for the White wine have the following heights:

  • Blue rectangle - P(Alcohol_flag = High| Type = White): 0.516

  • Orange rectangle - P(Alcohol_flag = Low| Type = White): 0.484

For the Sugar_flag plot, the side-by-side bars for the Red wine have the following heights:

  • Blue rectangle - P(Sugar_flag = High| Type = Red): 0.207

  • Orange rectangle - P(Sugar_flag = Low| Type = Red): 0.793

Finally, the side-by-side bars for the White wine have the following heights:

  • Blue rectangle - P(Sugar_flag = High| Type = White): 0.615

  • Orange rectangle - P(Sugar_flag = Low| Type = White): 0.385


Task 7

Compute the posterior probability of Type = Red for a wine that is low in alcohol content and high in sugar content. Compute the posterior probability of Type = White for the same wine.


# use a contingency table to get the value of P(Type = Red| Alcohol_flag = Low AND Sugar_flag = High)
alcsug_type_table = pd.crosstab([test_df.Alcohol_flag, test_df.Sugar_flag], test_df.Type, margins=True, normalize='index')
alcsug_type_table

output:










Task 7.1

Compute the posterior probability of Type = Red for a wine that is low in alcohol content and high in sugar content.

# retrieve the value of P(Type = Red| Alcohol_flag = Low AND Sugar_flag = High)
p_red_given_lowAlc_highSug = alcsug_type_table.Red.Low.High
print(f"P(Type = Red| Alcohol_flag = Low AND Sugar_flag = High): {round(p_red_given_lowAlc_highSug, 2)}")

output:

P(Type = Red| Alcohol_flag = Low AND Sugar_flag = High): 0.08


Task 7.2

Compute the posterior probability of Type = White for a wine that is low in alcohol content and high in sugar content.

# retrieve the value of P(Type = White| Alcohol_flag = Low AND Sugar_flag = High)
p_white_given_lowAlc_highSug = alcsug_type_table.White.Low.High
print(f"P(Type = White| Alcohol_flag = Low AND Sugar_flag = High): {round(p_white_given_lowAlc_highSug, 2)}")

output:

P(Type = White| Alcohol_flag = Low AND Sugar_flag = High): 0.92

Task 8

Use your answers to the previous exercise to determine which type, red or white, is more probable for a wine with low alcohol and high sugar content. What would the Naïve Bayes classifier classify this wine as?

Since the posterior probability for White wine is greater than that for Red wine (0.92 versus 0.08), White is the more probable type. The Naive Bayes classifier would classify this wine as White. The classification is not guaranteed to be right for every wine: about 8% of the wines with these flags in this data set are Red.
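The posteriors in Task 7 are read directly from the joint table of Alcohol_flag and Sugar_flag. A Naive Bayes classifier instead assumes the two flags are conditionally independent given Type and multiplies the prior by each conditional probability. The sketch below applies that rule to the rounded values reported in this walkthrough; the dictionary names are introduced here for illustration, and the results are approximate because the inputs are rounded.

```python
# Rounded values as reported in this walkthrough
prior = {"Red": 0.25, "White": 0.75}
p_low_alcohol = {"Red": 0.544, "White": 0.484}   # P(Alcohol_flag = Low | Type)
p_high_sugar = {"Red": 0.207, "White": 0.615}    # P(Sugar_flag = High | Type)

# Naive Bayes numerator: prior times the product of the conditionals
unnormalized = {t: prior[t] * p_low_alcohol[t] * p_high_sugar[t] for t in prior}

# Divide by the sum over both types so the posteriors add up to 1
evidence = sum(unnormalized.values())
posterior = {t: value / evidence for t, value in unnormalized.items()}

for wine_type, p in posterior.items():
    print(f"Naive Bayes P(Type = {wine_type} | Low alcohol, High sugar): {p:.2f}")
```

Under the independence assumption the posteriors come out at roughly 0.11 for Red and 0.89 for White. These differ slightly from the table-based 0.08 and 0.92, but the predicted class is still White.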

Task 9

Compute the posterior probability of Type = Red for a wine that is high in alcohol content and low in sugar content. Compute the posterior probability of Type = White for the same wine.


# use a contingency table to get the value of P(Type = Red| Alcohol_flag = High AND Sugar_flag = Low)
alcsug_type_table = pd.crosstab([test_df.Alcohol_flag, test_df.Sugar_flag], test_df.Type, margins=True, normalize='index')
alcsug_type_table

output:











Task 9.1

Compute the posterior probability of Type = Red for a wine that is high in alcohol content and low in sugar content.


# retrieve the value of P(Type = Red| Alcohol_flag = High AND Sugar_flag = Low)
p_red_given_highAlc_lowSug = alcsug_type_table.Red.High.Low
print(f"P(Type = Red| Alcohol_flag = High AND Sugar_flag = Low): {round(p_red_given_highAlc_lowSug, 2)}")

output:

P(Type = Red| Alcohol_flag = High AND Sugar_flag = Low): 0.31

Task 9.2

Compute the posterior probability of Type = White for a wine that is high in alcohol content and low in sugar content.

# retrieve the value of P(Type = White| Alcohol_flag = High AND Sugar_flag = Low)
p_white_given_highAlc_lowSug = alcsug_type_table.White.High.Low
print(f"P(Type = White| Alcohol_flag = High AND Sugar_flag = Low): {round(p_white_given_highAlc_lowSug, 2)}")

output:

P(Type = White| Alcohol_flag = High AND Sugar_flag = Low): 0.69
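The same posterior calculation can be wrapped in a small helper that works from raw rows rather than precomputed numbers. The sketch below is one possible way to do it, not the method used above: `naive_bayes_posterior` and `toy_df` are hypothetical names, the data is made up, and the helper applies the Naive Bayes independence assumption with no smoothing.

```python
import pandas as pd

def naive_bayes_posterior(df, target, evidence):
    """Return {class: posterior} for the given evidence, assuming the
    evidence columns are conditionally independent given the target."""
    priors = df[target].value_counts(normalize=True)
    scores = {}
    for cls, prior in priors.items():
        subset = df[df[target] == cls]
        score = prior
        for column, value in evidence.items():
            # Fraction of this class's rows with the observed value
            score *= (subset[column] == value).mean()
        scores[cls] = score
    total = sum(scores.values())  # zero if no class matches the evidence
    return {cls: score / total for cls, score in scores.items()}

# Hypothetical toy data for illustration only: 4 Red and 6 White wines
toy_df = pd.DataFrame({
    "Type": ["Red"] * 4 + ["White"] * 6,
    "Alcohol_flag": ["High", "High", "High", "Low",
                     "High", "High", "Low", "Low", "Low", "Low"],
    "Sugar_flag": ["Low", "Low", "Low", "High",
                   "High", "High", "High", "High", "Low", "Low"],
})

toy_posterior = naive_bayes_posterior(
    toy_df, "Type", {"Alcohol_flag": "High", "Sugar_flag": "Low"}
)
print(toy_posterior)
```

On this toy frame the helper favours Red for a high-alcohol, low-sugar wine; the numbers say nothing about the real wine_flag data.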



