DBMS
Association and Sequencing. Market basket analysis can identify consumer purchasing patterns and other valuable relationships
DBMS, Data Mining Solutions Supplement

Selling as many different products as possible to your customers maximizes their value to your business. One way to accomplish this goal is to understand what products or services customers tend to purchase at the same time, or later on as follow-up purchases. Determining purchasing trends is a very common application of data mining, and association and sequencing techniques can perform this kind of analysis. Although originally devised for marketing purposes, these techniques also have important applications in medicine, finance, and other data rich environments where separate events might be related to each other and where knowing about such relationships can be valuable knowledge.

Association and sequencing tools analyze data to discover rules that identify patterns of behavior. An association tool will find rules such as:

When people buy diapers they also buy beer 50 percent of the time.

It is highly unlikely that this rule is true. In fact, the oft-cited correlation between sales of beer and diapers is probably a myth. However, it is convenient to use it for illustrative purposes.

A sequencing technique is very similar to an association technique, but it adds time to the analysis and produces rules such as:

People who have purchased a VCR are three times more likely to purchase a camcorder in the time period two to four months after the VCR was purchased.

Using an association or sequencing algorithm to find the kinds of rules we've just seen is frequently called market basket analysis. Because such analysis has become the primary use of the association technique, resulting it is frequently called a market basket technique.

Business managers or analysts can use a market basket analysis to plan:

Although most commonly used for market basket analysis, association and sequencing tools have useful applications in many other industries besides retail. Association and sequencing tools find patterns in transaction data, and many organizations capture transactional data. Understanding the patterns of behavior or activity can provide valuable insight.

In health care, there are possible applications in care management, procedure interactions and pharmaceutical interactions. Consider, for example, the following statements that might result from an analysis. Their possible application should be immediately obvious.

Patients who are taking drugs A, B, and C are two and a half times more likely to also be taking drug D.

Patients receiving procedure X from Doctor Y are three times less likely to get infection Z.

There are many applications of association and sequencing in the financial industry. As with the medical applications, an example is worth a thousand words. (And to someone it may even be worth a million bucks!)

The prices of stocks in industry Q are 1.8 times more likely to close up one day after stocks in industry R closed down.

Our final examples deal with fraud detection in telecommunications and insurance:

International credit card calls longer than three minutes originating in area code 555 between 1:00 AM and 3:00 AM are three times more likely to go uncollected.

Accident claims involving soft tissue trauma where attorney P represents the claimant are twice as likely to be fraudulent.

For clarity we have stated our example rules in plain English. Most association products produce rules in a more compact tabular form as is shown in Table 1. Some, such as IBMıs Intelligent Miner, can represent the rules in English (see Figure 1).


Conf Count Support
1. pet food
-->pet supplies
1.0000 192 0.001665
2. canned veg, cereal
-->dairy
0.7659 158 0.001370
3. cereal, produce
--> dairy
0.6683 135 0.001171
4. cereal
-->dairy
0.6680 1151 0.009983
5. beer, cereal
-->dairy
0.3427 112 0.000971

Table 1. Top five association rules by confidence, found by HyperParallelıs //Affinity.

Some important elements of association tools can be seen in the sample rules and illustrations weıve seen so far. Did you notice that our first rule about diapers and beer stated an explicit percentage (50 percent of the time) while the second rule used the comparative phrase three times more likely? In the second case the probability is compared to the baseline likelihood that is the probability of the event occurring independently. For example, if people normally buy beer 5 percent of the time, then the first rule could have said "10 times more likely." The ratio in this kind of comparison is called lift, and a key goal of an association or sequencing data mining exercise is to find rules that have the desired lift.

Table 1 shows five association rules found by //Affinity, the association component of //Discovery, the data mining product from HyperParallel Inc. For each rule, a confidence, a count of the number of instances and the support computed from this count, is reported. We'll explain the precise meanings of confidence and support in the next section. We will also return later to some implications that might be drawn from the particular rules in Table 1.

Terminology: Understanding the Rules

Each rule has a left-hand side (when people buy diapers) and a right-hand side (they also buy beer). Sometimes the left-hand side is called the antecedent and the right-hand side the consequent. In general, both the left-hand side and the right-hand side can contain multiple items, but for simplicity weıll stick with single items for now.

A rule has two measures, called confidence and support. Some products such, as MineSet from Silicon Graphics Inc., use the terms predictability and prevalence instead of confidence and support. To see what these terms mean and how they are computed, consider the following beer and diaper example:

500,000 transactions
20,000 transactions contain diapers (4 percent)
30,000 transactions contain beer (6 percent)
10,000 transactions contain both diapers and beer (2 percent)

Support (or prevalence) measures how often items occur together, as a percentage of the total transactions. In this example, beer and diapers occur together 2 percent of the time (10,000/500,000).

Confidence (or predictability) measures how much a particular item is dependent on another. Because 20,000 transactions contain diapers and 10,000 also contain beer, when people buy diapers, they also buy beer 50 percent of the time (a rule that matches our introductory example). The confidence for this rule is 50 percent.

The inverse rule, which would be stated as

When people buy beer they also buy diapers 1/3 of the time

has a confidence of 33.33 percent (computed as 10,000/30,000).

Note that these two rules have the same support (2 percent as computed above). Support is not dependent on the direction (or implication) of the rule; it is only dependent on the set of items in the rule.

In the absence of any knowledge about what else was bought, we can also make the following assertions from the available data:

People buy diapers 4 percent of the time.
People buy beer 6 percent of the time.

These numbers -- 4 percent and 6 percent -- are called the expected confidence (previously referred to as baseline likelihood) of buying diapers or beer, regardless of what else is purchased. Lift measures the difference between the confidence of a rule and the expected confidence. You can measure this difference either by subtracting the two values or, more commonly, by putting them a ratio.

Lift is one measure of the strength of an effect. If we found a rule that said that people who bought diapers also bought beer 8 percent of the time, then the effect of buying diapers on beer-buying behavior is fairly small if the expected confidence is 6 percent. If, on the other hand, the confidence is 50 percent, and lift is more than eight times (when measured as a ratio), then the interactions between diapers and beer is very strong.

The above rules were all based on item sets with two items. Now letıs consider item sets with three items, adding the following to our example:

10,000 transactions contain wipes
8,000 transactions contain wipes and diapers (80 percent)
220 transactions contain wipes and beer (2.2 percent)
200 transactions contain wipes, diapers and beer (2 percent)

The complete set of 12 rules is presented in Table 2 along with their confidence, support and lift. There are six possible rules for item sets of three items (rules 7-12 in Table 2). In addition, there are four additional two-item rules (rules 3-6 in Table 2).


Left-hand sideRight-hand sideExpected Conf (percent)Confidence (percent)Lift RatioSupport (percent)
1 Diapers --> Beer 6.00 50.00 8.33 2.00
2 Beer --> Diapers 4.00 33.33 8.33 2.00
3 Diapers --> Wipes 2.00 40.00 20.00 1.60
4 Wipes --> Diapers 4.00 80.00 20.00 1.60
5 Wipes --> Beer 6.00 2.200.37 0.04
6 Beer --> Wipes 2.00 0.73 0.37 0.04
7 Diapers and Wipes --> Beer 6.00 2.50 0.42 0.04
8 Diapers and Beer --> Wipes 2.00 2.00 1.00 0.04
9 Wipes and Beer --> Diapers 4.00 90.91 22.73 0.04
10 Diapers --> Wipes and Beer 0.044 1.00 22.73 0.04
11 Wipes --> Diapers and Beer 2.00 2.00 1.00 0.04
12 Beer --> Diapers and Wipes 1.60 0.67 0.42 0.04

Table 2. A complete set of two-item and three-item rules for the beer and diapers example.

The greatest amount of lift, if measured as a ratio, is found in the ninth and tenth rules, which both have a lift greater than 22, computed as 90.91/4 and 1/0.044. Looking at the ninth rule, the lift of 22 means that people who purchase wipes and beer are 22 times more likely to also purchase diapers than people who do not purchase wipes and beer. Also, note the negative lift (or a lift ratio of less than 1) in the fifth, sixth, seventh and last rules. The latter two rules both have a lift ratio of approximately 0.42. We can interpret the negative lift on the seventh rule to mean that people who buy diapers and wipes are less likely to buy beer than one would ordinarily expect (that is, in the absence of diapers and wipes). It is not coincidental, but might seem somewhat counterintuitive, that the lift of a rule and its inverse rule are the same. One can easily prove that they will always be the same from the formula that computes lift, an exercise we leave for the interested reader.

While one would be most interested in rules with very high or very low confidence, sometimes such rules model an anomaly. Letıs take another look at the rules in Table 1. The first rule, with a confidence of 1 (100 percent), says that whenever people bought pet food they also bought pet supplies. A confidence of 100 percent seems highly unlikely, as it means that no one bought pet food without also buying pet supplies, but further investigation might show that the data was for one day only, and that there was a special giveaway on that day, so anyone who bought pet food was given a free pet toy. The other rules, because they all have dairy on the right hand side, may also warrant another look. Because milk or eggs are so commonly purchased, "dairy" is quite likely to show up in many rules. However, because it is such a common ingredient in a market basket, rules involving dairy might not be practical. For this reason, the ability to exclude specific items can be very useful in an association tool.

In general, analysts are looking for rules that have a very high or very low lift, that have support that exceeds a threshold, and that do not involve items that appear on most transactions. Rules with a low value for support might simply be due to a statistical anomaly. Even if they are accurate rules, acting on rules with little support might not lead to much gain because a low level of support suggests that the event occurs infrequently. Increasing your profit by $100 on something that occurs three times a year is probably not worth doing.

Preparing Data

As with all data, the data used by an association algorithm is made up of entities and attributes. The entity might be a market basket, and the attributes all the items purchased at one time. Or the entity might be a patient, and the attributes the doctors, drugs, procedures, and outcomes associated with the patient.

The data that is used by an association algorithm needs to be in one of two formats that we'll call horizontal and vertical. In the horizontal format (see Table 3) there is one row for each entity, and there are columns for each attribute. For example, analyzing the effectiveness of a medical procedure would require one row for each patient, with separate columns for each drug, doctor and outcome recorded. For market basket analysis with the horizontal format, there is one row for each market basket, with columns for each (type of) product. Integral Solutions Ltd.ıs data mining product, Clementine, has an association algorithm that uses the horizontal format.

A significant problem for the horizontal format is that the number of columns can become quite large. For market basket analysis, where the number of products might exceed 100,000, similar products need to be grouped together to reduce the number of columns to a reasonable quantity. Another problem with the horizontal format is that the schema is data dependent. When a new product is added to the market basket analysis, or when products are categorized in a different way, then the schema needs to be changed to add or reorganize columns.


IDdiapersbeerwipesdairycereal
132 Y     Y 
428  Y Y Y 

Table 3. Data in horizontal format.

The vertical format (see Table 4), which is more commonly used by the data mining products, eliminates these problems by using multiple rows to store an entity, using one row for each attribute. The rows for a particular entity (that is, a market basket or a patient) are tied together with a common ID. This kind of representation is more normalized in the relational sense, and it works much better when an entity can have great variability in terms of the number of attributes. For example, some people check out of the supermarket with only two or three items when others fill two carts with hundreds of items.


ID Product
132 diapers
132 dairy
428 beer
428 wipes
428 dairy

Table 4. Data in vertical format.

Because data may already exist in one format or the other, some of the products, such as IBM's Intelligent Miner, support a pivoting operation that converts a horizontal format to a vertical format.

Association algorithms can only operate on categorical data. If you use noncategorical attributes, such as income of the purchaser, the noncategorical data must be binned into ranges (for example, 0 to 20,000; 20,001 to 40,000; 40,001 to 70,000; and greater than 70,001), turning each range into an attribute.

Another common characteristic of association rule generators is an item hierarchy. A product hierarchy can be used to group similar items together. In our example we were using generic terms: diapers, wipes, and beer. In reality, stores sell specific items (stock keeping units, or SKUs): Coors beer in a 12oz six-pack, Bud Lite 16oz cans in a case, etc. While sometimes one might be interested in rules that deal in specific items, rules that are more general are frequently desirable, such as the rules we have been using as examples. Or we might be interested in rules that differentiate between diapers sold in boxes vs. diapers sold in bulk.

Algorithms at Work

At its heart, an association or sequencing algorithm is simply a counting algorithm, with the final probabilities computed by taking ratios among various counts. If item hierarchies are in use, then some translation (or lookup) is needed. In either case you must carefully control the sizes of the item sets under consideration because the combinatorial explosion problem comes into play.

Large grocery stores stock upwards of 100,000 different items. This means that there are approximately 5.0 x 109 (5 billion!) possible item pairs, and 1.7 x 1014 sets of three items. You should use an item hierarchy to reduce this number to a manageable size. In a grocery store application there is unlikely to be a specific relationship between, say, Pampers in the 30-count box and Pabst Blue Ribbon in 12oz cans. If there is such a relationship, it is probably subsumed by the more general relationship between diapers and beer.

Using an item hierarchy reduces the number of combinations and also helps to find more general, and probably more useful, higher-level relationships such as those between any kind of diapers and any kind of beer. Unfortunately, even if you use an item hierarchy to group items together so that the average group size is 50 (reducing 100,000 items to 2,000 item groups), the combinatorial explosion will still raise its ugly head. With 2,000 item groups there are still almost 2 million paired item sets. An algorithm might require up to 2 million counting registers. And there are 1.3 billion three-item item sets! Six-item item sets, should you want to consider them, will potentially require 9 x 1016 counters.

What we have been discussing is the potential number of combinations. In most cases, particularly in a supermarket, many of the combinations will never occur. This means that an algorithm will not need nearly as many counters as was implied by the analysis above. Nevertheless, some sort of dynamic memory or counter allocation and addressing scheme will be needed.

Some data mining products produce summary statistics describing the transactions that have been analyzed. Figure 2 is output generated by DecisionAR, a data mining product from NeoVista Software Inc. It reports the number of two-item, three-item, and so on, up to five-item item sets found from a file with 191,346 instances.

As an aside, how many combinations would there be in a typical transaction? Knowing this number will give us an idea of how many counters need to be updated for each transaction. Lets assume that the typical market basket contains 30 items, and that some of these are in the same item group (for example, two kinds of cheeses), reducing the number of item groups to 25. Now we use the formula that tells us how many combinations there are of "n things taken m at a time." Just for old time's sake, here is the formula:

n!
------------
m!(n-m)!

Table 5 gives the number of combinations for 25 things taken two, three, and so on at a time.


2 300
3 2,300
4 12,650
5 53,130
6 177,100
7 480,700
8 1,081,575

Table 5. The number of combinations of 25 things taken two three and so on at a time.

So if we are interested in item sets of up to six, there will be 245,480 (the sum of the first five entries in the right-hand column in Table 5) combinations to analyze in each transaction. If weıre looking at a million or so transactions, you can see that some care is needed in the algorithm to ensure reasonable performance.

Sequencing

Sequencing extends association by adding time comparisons between transactions. Sequencing requires a primary ID, like a customer ID, which relates transactions that occurred at different times. By taking pairwise combinations of all transactions that have the same primary ID and computing the time difference between each pair, the algorithm identifies all before-and-after item pairs.

Because time differences are continuous rather than categorical values, you must bin or group the continuous values into categorical values based on the time-related objectives set by the user. Examples of time-related objectives are "at any later time," "within six months," "next visit," or a set of mutually exclusive ranges such as next day, next week, next month, and next year. Some products only support "at any later time." Others such as //Discovery from HyperParallel or Decision Series from NeoVista support a wide range of time-related objectives.

Note that taking pairwise combinations again brings combinatorial explosion into play. If a particular primary ID for a specific customer has five transactions, then there will be 10 paired combinations (five things taken two at a time).

Consider a regular customer at a supermarket. If he or she makes weekly visits to the market, and we want to do a sequencing analysis on a yearıs worth of data (say 50 transactions with two weeks off for vacation), weıd better proceed cautiously. For just one such regular customer there will be 1,225 paired combinations! If we only consider "next visit" pairings, that reduces the number of combinations to 49. (The number of "next visit" pairings is always one less than the number of transactions.) This analysis makes more sense for a supermarket. Weıre much more likely to find interesting patterns from one week to the next than we are to find patterns relating Januaryıs market basket to Julyıs.

In any case, our point is that an appropriate time-related objective can (and should) be used to reduce the number of combinations. This is important because once the transactions have been combined, the sequencing algorithm is identical to the association algorithm with some additional restrictions. And, as we saw above, the association algorithm is already laced with combinatorial issues; the time-related pairing just adds another layer of combinations!

We wonıt go into further detail here, except to point out to those who have a deeper interest the following restrictions that when applied to the association algorithm turns it into a sequencing algorithm.

Interpreting the Output

Most association and sequencing products present results as rules, as shown in Table 1 and Figure 1. In addition, they often isolate specific rules of interest for examination. Some permit rules to be sorted by lift, confidence, support, left-hand side, or right-hand side. Others enable selection of a subset of the rules based on specified item, item group, or thresholds for confidence or support. In some products, users specify selection criteria before executing the algorithm. In others all rules are generated and then use parameters to select a subset for presentation. For problems that have a large amount of data, the first method might be more useful if you wish to reduce the amount of time needed to run the algorithm. On the other hand, if the first method is in use and you later decide to change the constraints, youıll have to wait while the algorithm finds new rules for you. If all rules are computed at once, interactive exploration of the rule set will be much easier to do.

MineSet from Silicon Graphics is the only product we know of that presents rules in a graphic way, integrating data mining and data visualization. MineSet uses a 3D bar chart (see Figure 3) to illustrate support, confidence, and lift in one chart. In this chart, color is used to show support (called prevalence in MineSet), the height of each bar corresponds to confidence (called predictability in MineSet), and the disc on each bar shows the expected confidence. Carbonated beverages, as the right-hand side of a rule with bread and baked goods on the left-hand side, appears to have a high relative level of support. The lift (how far the bar extends above the disk) is also noteworthy. Note that the chart is limited to rules that apply to only two items, and that have only positive lift. Nevertheless, the graph shows a lot of information in a very compact way.

All Together Now

Association and sequencing techniques can help companies identify groups of products or services that customers have already demonstrated a tendency to acquire together, or in subsequent purchases. The infamous but probably apocryphal association between beer and diaper purchases is a good example of a retail application of association and sequencing. But detecting and assessing relevant patterns can benefit almost any business that accumulates large volumes of transactions.


See Association and Sequencing Products for more information.


Figure 1. IBMıs Intelligent Miner is an example of a data mining tool that represents association rules in English sentences.


file - basket.dat 191346 87270 18
Total no. of transactions with only 1 item = 67764
ı
Total no. doubles = 93
Total no. 3-tuples = 141
Total no. 4-tuples = 84
Total no. 5-tuples = 16

Compute Rules Elapsed Time= 0.142 CPU Time= 0.059 Total Time= 38.766

Figure 2. NeoVistaıs DecisionAR produces a log summarizing its association analysis.


Figure 3. MineSet from Silicon Graphics supports visualization of rules in three-dimensional space.


Estelle Brand (estelle@xore.com) and Rob Gerritsen (rob@xore.com) are founders of Exclusive Ore Inc., based in Blue Bell, Pennsylvania, which is a consulting and training company specializing in data mining. During the last two years they have used more than a dozen data mining products. Their database management systems experience dates back to the dark ages. For more information about Exclusive Ore and data mining, see www.xore.com.
What did you think of this article? Send a letter to the editor.


Subscribe to DBMS -- It's free for qualified readers in the United States
Data Mining Solutions Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS (http://www.dbmsmag.com)
Copyright © 1998 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated February 26, 1998