
What is Machine Learning and the Apriori Algorithm?

Machine Learning

  • Machine learning is a subset of artificial intelligence (AI)

  • Here, the goal, according to Arthur Samuel, is to give “computers the ability to learn without being explicitly programmed”

  • Tom Mitchell puts it more formally: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”

  • Machine learning explores algorithms that can learn from data and use that knowledge to make predictions on data they have not seen before.

  • It makes data-driven predictions or decisions by building a model from sample inputs.

Machine Learning Applications Examples

1- Self-driving Google car (now rebranded as Waymo)

The car takes a real-time view of the road to recognize objects and patterns such as the sky, road signs, and vehicles moving in other lanes.

The self-driving car not only needs to carry out such object recognition, but also needs to make decisions about navigation.

The car needs to know the rules of driving, have the ability to do object and pattern recognition, and apply these to making decisions in real time. In addition, it needs to keep improving. That is where machine learning comes into play.

2- Optical Character Recognition (OCR)

Humans are good at recognizing hand-written characters, but computers are not.

What we need is a basic set of rules that tells the computer what “A,” “a,” “5,” etc., look like, and then have it make a decision based on pattern recognition.

This happens by showing the computer several versions of a character so it learns that character, just as a child does through repetition, and then having it go through the recognition process.

Machine Learning Applications Other Examples

  • Facebook uses machine learning to personalize each member’s news feed.

  • Most financial institutions use machine learning algorithms to detect fraud.

  • Intelligence agencies use machine learning to sift through mounds of information to look for credible threats of terrorism.

Machine Learning Features

  • In machine learning, a target is called a label.

  • A variable in statistics is called a feature in machine learning.

  • Machine learning algorithms are organized into a taxonomy, based on the desired outcome of the algorithm. Common algorithm types include:

- Supervised learning. When we know the labels on the training examples we are using to learn.

- Unsupervised learning. When we do not know the labels (or even the number of labels or classes) of the training examples we are using for learning.

- Reinforcement learning. When we want to provide feedback to the system based on how it performs on training examples. Robotics is a well-known application area.
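The first two styles can be illustrated with base R's built-in learners; the toy data below is made up, and `lm()` and `kmeans()` stand in for supervised and unsupervised learning generally (reinforcement learning needs an interactive environment, so it is omitted here):

```r
# Toy illustration of supervised vs. unsupervised learning in base R.
# The data are hypothetical; lm() and kmeans() stand in for the two styles.
set.seed(42)

# Supervised: labels (y) are known for every training example
x <- 1:20
y <- 2 * x + rnorm(20)        # true slope is 2, plus noise
model <- lm(y ~ x)            # learn the mapping x -> y from labeled pairs
coef(model)["x"]              # learned slope, close to 2

# Unsupervised: no labels, only points; discover structure (2 clusters)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 5), ncol = 2))
groups <- kmeans(pts, centers = 2)
table(groups$cluster)         # sizes of the discovered groups
```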


Association Rules

Which of my products tend to be purchased together?

What do other people like this person tend to like/buy/watch?

- Discover "interesting" relationships among variables in a large database

- Rules of the form “If X is observed, then Y is also observed"

- The definition of "interesting" varies with the algorithm used for discovery

Not a predictive method; it finds similarities and relationships.

Apriori Algorithm - What is it?


Earliest of the association rule algorithms

Frequent itemset: a set of items L that appears together "often enough":

  • Formally: meets a minimum support criterion

  • Support: the % of transactions that contain L

Apriori Property: Any subset of a frequent itemset is also frequent

A subset has at least the support of its superset.


Iteratively grow the frequent itemsets from size 1 to size K (or until we run out of support).

  • Apriori property tells us how to prune the search space

Frequent itemsets are used to find rules X->Y with a minimum confidence:

  • Confidence: among the transactions that contain X, the % that also contain Y

Output: The set of all rules X -> Y with minimum support and confidence
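The level-wise growth described above can be sketched in base R. This is an illustration of the idea only, not the optimized implementation in the "arules" package, and the toy transactions are hypothetical:

```r
# Support of an itemset: fraction of transactions containing all its items
support <- function(itemset, txns) {
  mean(sapply(txns, function(t) all(itemset %in% t)))
}

# Minimal level-wise frequent itemset miner (Apriori idea, illustration only)
apriori_frequent <- function(txns, minsup) {
  items <- sort(unique(unlist(txns)))
  # Level 1: frequent individual items
  level <- Filter(function(s) support(s, txns) >= minsup, as.list(items))
  frequent <- level
  while (length(level) > 1) {
    k <- length(level[[1]]) + 1
    # Candidate generation: unions of frequent (k-1)-itemsets that yield
    # k items. The Apriori property prunes the search: candidates are built
    # only from frequent subsets, and infrequent candidates are dropped.
    cands <- list()
    for (i in seq_along(level)) {
      for (j in seq_along(level)) {
        if (i < j) {
          u <- sort(union(level[[i]], level[[j]]))
          if (length(u) == k) cands[[length(cands) + 1]] <- u
        }
      }
    }
    level <- Filter(function(s) support(s, txns) >= minsup, unique(cands))
    frequent <- c(frequent, level)
  }
  frequent
}

# Toy transactions (made up for illustration)
txns <- list(c("bread", "milk"),
             c("bread", "butter"),
             c("bread", "milk", "butter"),
             c("milk"))
freq <- apriori_frequent(txns, minsup = 0.5)
# Frequent: {bread}, {butter}, {milk}, {bread, butter}, {bread, milk}
```

With minimum support 0.5, {milk, butter} appears in only 1 of 4 transactions and is dropped, so (by the Apriori property) no 3-itemset containing it is ever generated.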


Association Rules: Example


  • Transaction1: {Apple, Juice, Rice, Chicken}

  • Transaction2: {Apple, Juice, Rice}

  • Transaction3: {Apple, Juice}

  • Transaction4: {Apple, Grapes}

  • Transaction5: {Milk, Juice, Rice, Chicken}

  • Transaction6: {Milk, Juice, Rice}

  • Transaction7: {Milk, Juice}

  • Transaction8: {Milk, Grapes}

Measure 1: Support. Support(Apple) = 4/8 = 50%, since Apple appears in 4 of the 8 transactions.


Measure 2: Confidence. How likely item Juice is purchased when item Apple is purchased, expressed as {Apple -> Juice}. This is measured by the proportion of transactions containing Apple in which Juice also appears. In the transactions above, the confidence of {Apple -> Juice} is 3 out of 4, or 75%.

Confidence {Apple -> Juice} = Support {Apple, Juice} / Support {Apple}
= (3/8) / (4/8) = 3/4 = 75%


Measure 3: Lift. This measures how likely Juice is purchased when Apple is purchased, while controlling for how popular Juice is.

In the transactions above, the lift of {Apple -> Juice} is 1, which implies no strong association between the items. A lift value greater than 1 means that Juice is likely to be bought if Apple is bought, while a value less than 1 means that Juice is unlikely to be bought if Apple is bought.

Lift {Apple -> Juice} = Support {Apple, Juice} / (Support {Apple} * Support {Juice})
= (3/8) / ((4/8) * (6/8)) = (3/8) / (3/8) = 1
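The figures above can be checked in a few lines of base R, using the eight transactions listed earlier:

```r
# The eight transactions from the example above
txns <- list(
  c("Apple", "Juice", "Rice", "Chicken"),
  c("Apple", "Juice", "Rice"),
  c("Apple", "Juice"),
  c("Apple", "Grapes"),
  c("Milk", "Juice", "Rice", "Chicken"),
  c("Milk", "Juice", "Rice"),
  c("Milk", "Juice"),
  c("Milk", "Grapes")
)

# Support: fraction of transactions containing every item in the set
support <- function(itemset) {
  mean(sapply(txns, function(t) all(itemset %in% t)))
}

support("Apple")                  # 0.5   (4/8)
conf <- support(c("Apple", "Juice")) / support("Apple")
conf                              # 0.75  (confidence of {Apple -> Juice})
lift <- support(c("Apple", "Juice")) /
        (support("Apple") * support("Juice"))
lift                              # 1     (no association)
```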

Computing Confidence and Lift

Suppose we have 1000 credit records:

713 are home_owners, and 527 of those home_owners have good credit.

home_owner -> credit_good has confidence 527/713 = 74%

700 have good credit, and 527 of them are home_owners.

credit_good -> home_owner has confidence 527/700 = 75%

The lift of both rules is the same:

0.527 / (0.700 * 0.713) ≈ 1.056
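The same arithmetic, written out in R with the counts from the example:

```r
# Counts from the 1000 credit records above
n    <- 1000
home <- 713   # home_owners
good <- 700   # good credit
both <- 527   # home_owners with good credit

conf_home_good <- both / home   # confidence of home_owner -> credit_good
conf_good_home <- both / good   # confidence of credit_good -> home_owner
lift <- (both / n) / ((home / n) * (good / n))

round(conf_home_good, 2)  # 0.74
round(conf_good_home, 2)  # 0.75
round(lift, 3)            # 1.056
```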

Finally: Find Confidence Rules

If we want confidence > 80%:

IF job_skilled THEN credit_good

Association Rules

  • First of all, you will need to download the weather data from Blackboard

  • Import the data from your computer to R

  • Set your working directory to the right place where your data is located

#R code

weather <- read.csv("weather.csv")

Building the model in R: apriori

# install and load the "arules" library
# install.packages("arules")
library(arules)
# find association rules with default settings
rules.all <- apriori(weather)

# outputs

    lhs                   rhs                 support   confidence lift
[1] {Outlook=Overcast} => {Play.Soccer=Yes}   0.2857143 1.0000000  1.555556

If Outlook is Overcast then Play Soccer is yes with 100% confidence

# rules with rhs containing "Play or not" only
rulesPlay <- apriori(weather, parameter = list(minlen=2, supp=0.005, conf=0.8),
                     appearance = list(rhs=c("Play.Soccer=No", "Play.Soccer=Yes"),
                                       default="lhs"))


# Improve rules quality and appearance (1)

quality(rulesPlay) <- round(quality(rulesPlay), digits=3)
rulesPlay.sorted <- sort(rulesPlay, by="lift")
inspect(rulesPlay.sorted)

Visualizing Association Rules

# Visualizing Association Rules (plot methods for rules come from "arulesViz")
# install.packages("arulesViz")
library(arulesViz)
# rulesPlay.pruned is built in the Complete Code below by removing redundant rules
plot(rulesPlay.pruned)
plot(rulesPlay.pruned, method="graph")

Complete Code

weather1 <- read.csv("weather.csv")
# install and load the "arules" library
# install.packages("arules")
library(arules)
# find association rules with default settings
rules.all <- apriori(weather1)
# rules with rhs containing "Play or not" only
rulesPlay <- apriori(weather1, parameter = list(minlen=2, supp=0.005, conf=0.8),
                     appearance = list(rhs=c("Play.Soccer=No", "Play.Soccer=Yes"),
                                       default="lhs"))
### Quality
## for better comparison, round the quality measures and sort the rules by lift
quality(rulesPlay) <- round(quality(rulesPlay), digits=3)
rulesPlaySorted <- sort(rulesPlay, by = "lift")

## remove redundant rules, keeping only the non-redundant ones
rulesPlay.pruned <- rulesPlaySorted[!is.redundant(rulesPlaySorted)]

# Visualizing Association Rules (requires the "arulesViz" package)
# install.packages("arulesViz")
library(arulesViz)
plot(rulesPlay.pruned)
plot(rulesPlay.pruned, method="graph")

