Easy Data Mining - Part 3
Association Rules
I promiss I won't discuss here about the "beer and diapers" story. I'll just talk about what a basic association rule is
and lay down the fundation for the methodes and algoritms used to find the associations. What is Association Rule mining? It is a way to find interesting associations among large sets of data items.
So we have determined the frequent itemsets based on a predefined support. We have all the riders that are often sold together.
This information is itself valuable but we need a little bit more. We need to know what is the driver within a frequent itemset. What are the riders that,
once bought, can drive the customers to buy other riders? An insurance agent would say that a customer who buys a Disability Income Benefit,
most of the time will buy a Waiver of Premium for Disability Benefit as well. Or a customer who buys a Accidental Death Benefit, most of the time will also
buy a Dismemberment Benefit. But what if there are some unknown facts that can make the customers to buy riders that apparently don't have any relationship?
This is where finding the association rules becomes useful.
We need to find all the situations where customers who bought a subset of a frequent itemset, most of the time also bought the remaining items in
the same frequent itemset. Most of the times? What does that mean? Again, just like we did for the support of frequent itemsets,
we have to set a threshold, a percentage that will help us considering a rule useful. Let's say the percentage is 80%.
Given a frequent itemset, say (S1, S2, S3), if a customer who buys a subsert formed by S1 and S2, also buys S3 80% of the times,
then it is worth to consider the rule. This percentage is called the confidence of the rule and is defined as the ratio of the number of
transactions that include all items in a particular frequent itemset to the number of transactions that include all items in the subset.
Let's consider the same insurance example below. We want to find the association rules that meet the following requirements:
- Support - 30% - Only the riders that are bought together by at least 3 customers are considered.
.
- Confidence - 90% - The association rule has to be true in 90% of the transactions
10 customers with the riders that they bought
We'll consider (S1, S2, S3) frequent itemset. It was bought by 3 customers out of 10, so it meets the support requirement.
Let's look at some association rules (there are 6 possible associations: 2`3 -2) that can be generated from this itemsets and calculate the confidence:
(S1, S3) -> (S2)
(S1, S3) was bought by 5 customers but only 3 of them also bought S2. Confidence is 60%.
(S2, S3) -> (S1)
(S2, S3) was bought by 6 customers but only 3 of them also bought S1. Confidence is 50%.
(S1, S2) -> (S3)
(S1, S2) was bought by 3 customers and all 3 of them bought S3 as well. Confidence is 100%. So this rule has a very strong confidence (above 90%)
and has to be considered.
So the calculation formula for confidence is:
Confidence((Sk, Sk+1) -> (Sk+2)) = [Support(Sk, Sk+1, Sk+2) / Support(Sk, Sk+1)] * 100
Well, obviously, we can't take all the frequent itemsets, go through all the possible combinations and pick the association rules that satisfy the confidence.
There are algorithms created to prune the associations that don't meet the criteria. These algorithms are based on a theorem that is very
easy to demonstrate and it is, in a way, similar with the Apriori property explained in the previous article.
Confidence-Based Pruning Theorem
Say we have a frequent itemset F and 2 subsets S1 and S2 of this itemset where S2 is included in S1.
If the association rule S1 -> F-S1 does not satisfy the minimum confidence MC, then we can say for sure that S2 -> F-S2 won't satisfy it either.
Here's an example - Let's say (S1, S2, S3, S4) is frequent. If (S1, S2, S3) -> (S4) doesn't have the required confidence, then there is no need to check
any of the associations that are built on subsets of (S1, S2, S3), like (S1, S2) -> (S3, S4) or (S2, S3) -> (S1, S4). Let's try now to demonstrate this theorem.
C1 = Confidence ( S1 -> F-S1 ) = [Support(F) / Support(S1)] * 100
C2 = Confidence ( S2 -> F-S2 ) = [Support(F) / Support(S2)] * 100
Since S2 is included in S1, Support (S2) >= Support (S1) and therefore C1 >= C2. And since C1 < MC (S1 -> F-S1 does not satisfy the minimum confidence),
then C2 < MC (S2 -> F-S2 won't satisfy the minimum confidence either). q.e.d.
We'll discuss in detail the algorithms that use this theorem but for now I think we have a fair idea about what an association rule is and what role it
has in data mining.
Reference
Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar
Introduction to Data Mining by Tan, Steinbach, Kumar
|