Expected Values

The place to start with chi-square is to lay data out in a table, as in Table 5.5. This is a simple 2 x 2 table, which represents a test group and a control group in a test that has two outcomes, say response and nonresponse. This table also shows the total values for each column and row; that is, the total number of responders and nonresponders (each column) and the total number in the test and control groups (each row). The response column is added for reference; it is not part of the calculation....
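
The excerpt breaks off before the calculation itself, so the following is only a minimal sketch of the usual way expected values are derived from row and column totals (row total times column total, divided by the grand total). The counts are hypothetical and are not the values of Table 5.5; the function name is an assumption made for illustration.

```python
def expected_values(observed):
    """Return the expected count for each cell of a contingency table.

    Assumes the standard rule:
    expected[i][j] = row_total[i] * column_total[j] / grand_total
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical counts (NOT taken from Table 5.5).
# Rows: test group, control group. Columns: responders, nonresponders.
observed = [[60, 940], [40, 960]]
expected = expected_values(observed)
```

With these made-up counts both groups have the same size, so each group is expected to hold half of the responders and half of the nonresponders.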

How Good Is an Association Rule?

Association rules start with transactions containing one or more products or service offerings and some rudimentary information about the transaction. For the purpose of analysis, the products and service offerings are called items. Table 9.1 illustrates five transactions in a grocery store that carries five products. These transactions have been simplified to include only the items purchased. How to use information like the date and time and whether the customer paid with cash or a credit card...
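
As a rough illustration of treating transactions as sets of items, the sketch below counts how often each pair of items appears together. The five transactions are hypothetical stand-ins, not the contents of Table 9.1, and the helper name `co_occurrence` is an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (NOT the contents of Table 9.1).
transactions = [
    {"orange juice", "soft drink"},
    {"bananas", "orange juice", "window cleaner"},
    {"orange juice", "detergent"},
    {"orange juice", "detergent", "soft drink"},
    {"window cleaner", "soft drink"},
]


def co_occurrence(transactions):
    """Count the transactions in which each unordered pair of items appears."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


pair_counts = co_occurrence(transactions)
```

A pair that never occurs together simply has a count of zero, because `Counter` returns 0 for missing keys.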

Matching Campaigns to Customers

The same response model scores that are used to optimize the budget for a mailing to prospects are even more useful with existing customers, where they can be used to tailor the mix of marketing messages that a company directs to them. Marketing does not stop once customers have been acquired. There are cross-sell campaigns, up-sell campaigns, usage stimulation campaigns, loyalty programs, and so on. These campaigns can be thought of as competing for access to customers. When...

Reducing Exposure to Credit Risk

Learning to avoid bad customers and noticing when good customers are about to turn bad is as important as holding on to good customers. Most companies whose business exposes them to consumer credit risk do credit screening of customers as part of the acquisition process, but risk modeling does not end once the customer has been acquired. Assessing the credit risk on existing customers is a problem for any business that provides a service that customers pay for in arrears. There is always the...

Tree Ring Diagrams

Another clever representation of a decision tree is used by the Enterprise Miner product from SAS Institute. The diagram in Figure 6.15 looks as though the tree has been cut down and we are looking at the stump.

Figure 6.14 Often a simple line or curve cannot separate the regions and a decision tree does better.

Figure 6.15 A tree ring diagram produced by SAS Enterprise Miner summarizes the...

Chi-Square Test

As described in Chapter 5, the chi-square (χ²) test is a test of statistical significance developed by the English statistician Karl Pearson in 1900. Chi-square is defined as the sum of the squares of the standardized differences between the expected and observed frequencies of some occurrence between multiple disjoint samples. In other words, the test is a measure of the probability that an observed difference between samples is due only to chance. When used to measure the purity of decision...
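
The definition above can be sketched in a few lines. This is an illustrative implementation under the usual assumption that each standardized difference is (observed - expected) / sqrt(expected), with expected counts derived from the row and column totals; the counts are hypothetical.

```python
def chi_square(observed):
    """Sum of squared standardized differences between observed and expected counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            # ((obs - exp) / sqrt(exp)) ** 2, written without the square root
            total += (obs - exp) ** 2 / exp
    return total


# Hypothetical counts: rows are two disjoint samples, columns are two outcomes.
observed = [[60, 940], [40, 960]]
statistic = chi_square(observed)
```

Turning the statistic into a probability requires the chi-square distribution with the appropriate degrees of freedom, which this sketch leaves out.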

Seven Bridges of Königsberg

One of the earliest problems in graph theory originated with a simple challenge posed in the eighteenth century by the Swiss mathematician Leonhard Euler. As shown in the simple map in Figure 10.4, Königsberg had two islands in the Pregel River connected to each other and to the rest of the city by a total of seven bridges. On either side of the river or on the islands, it is possible to get to any of the bridges. Figure 10.4 shows one path through the town that crosses over five bridges...

Validate Assumptions

Using simple cross-tabulation and visualization tools such as scatter plots, bar graphs, and maps, validate assumptions about the data. Look at the target variable in relation to various other variables to see such things as response by channel or churn rate by market or income by sex. Where possible, try to match reported summary numbers by reconstructing them directly from the base-level data. For example, if reported monthly churn is 2 percent, count up the number of customers that cancel...

Data Mining to Choose the Right Place to Advertise

One way of targeting prospects is to look for people who resemble current customers. For instance, through surveys, one nationwide publication determined that its readers have the following characteristics: 59 percent of readers are college educated; 46 percent have professional or executive occupations; 21 percent have household income in excess of $75,000 per year; 7 percent have household income in excess of $100,000 per year. Understanding this profile helps the publication in two ways. First, by...

Prospecting

Prospecting seems an excellent place to begin a discussion of business applications of data mining. After all, the primary definition of the verb "to prospect" comes from traditional mining, where it means to explore for mineral deposits or oil. As a noun, a prospect is something with possibilities, evoking images of oil fields to be pumped and mineral deposits to be mined. In marketing, a prospect is someone who might reasonably be expected to become a customer if approached in the right way....

Comparing Models Using Lift

Directed models, whether created using neural networks, decision trees, genetic algorithms, or Ouija boards, are all created to accomplish some task. Why not judge them on their ability to classify, estimate, and predict? The most common way to compare the performance of classification models is to use a ratio called lift. This measure can be adapted to compare models designed for other tasks as well. What lift actually measures is the change in concentration of a particular class when the model...
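
A small sketch may help make the ratio concrete. It assumes the common definition of lift as the concentration of the target class in the group selected by the model divided by its concentration in the whole population; the data and function name are hypothetical.

```python
def lift(selected_labels, population_labels, target=1):
    """Concentration of `target` in the selected group relative to the population."""
    selected_rate = selected_labels.count(target) / len(selected_labels)
    population_rate = population_labels.count(target) / len(population_labels)
    return selected_rate / population_rate


# Hypothetical data: 5 percent of the population responds (label 1),
# while 30 percent of the records picked by some model respond.
population = [1] * 5 + [0] * 95
selected = [1] * 3 + [0] * 7
model_lift = lift(selected, population)  # roughly 6
```

A lift of 1 would mean the model's selection is no more concentrated than a random sample of the population.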

Affinity Grouping or Association Rules

The task of affinity grouping is to determine which things go together. The prototypical example is determining what things go together in a shopping cart at the supermarket, the task at the heart of market basket analysis. Retail chains can use affinity grouping to plan the arrangement of items on store shelves or in a catalog so that items often purchased together will be seen together. Affinity grouping can also be used to identify cross-selling opportunities and to design attractive...

Directed Graphs

The graphs discussed so far are undirected. In undirected graphs, the edges are like expressways between nodes: they go in both directions. In a directed graph, the edges are like one-way roads. An edge going from A to B is distinct from an edge going from B to A. A directed edge from A to B is an outgoing edge of A and an incoming edge of B. Directed graphs are a powerful way of representing data: flight segments that connect a set of cities; hyperlinks between Web pages; telephone calling...
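
The distinction between outgoing and incoming edges can be sketched with a pair of adjacency maps. The class and method names below are assumptions made for illustration, not part of any particular library.

```python
from collections import defaultdict


class DirectedGraph:
    """Minimal directed graph that tracks outgoing and incoming edges separately."""

    def __init__(self):
        self._out = defaultdict(set)  # node -> nodes it points to
        self._in = defaultdict(set)   # node -> nodes that point to it

    def add_edge(self, source, target):
        # An edge from source to target is NOT an edge from target to source.
        self._out[source].add(target)
        self._in[target].add(source)

    def outgoing(self, node):
        return set(self._out.get(node, ()))

    def incoming(self, node):
        return set(self._in.get(node, ()))


# Hypothetical flight segments between three cities.
flights = DirectedGraph()
flights.add_edge("A", "B")
flights.add_edge("B", "C")
```

An undirected graph could be emulated by adding each edge in both directions.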

Market Basket Analysis and Association Rules

To convey the fundamental ideas of market basket analysis, start with the image of the shopping cart in Figure 9.1 filled with various products purchased by someone on a quick trip to the supermarket. This basket contains an assortment of products: orange juice, bananas, soft drink, window cleaner, and detergent. One basket tells us about what one customer purchased at one time. A complete list of purchases made by all customers provides much more information; it describes the most important part...

Standard Error of a Proportion

The approach to answering this question uses the idea of a confidence interval. The challenger offer, in the above scenario, is being sent to a random subset of customers. Based on the response in this subset, what is the expected response for this offer for the entire population? For instance, let's assume that 50,000 people in the original population would have responded to the challenger offer if they had received it. Then about 5,000 would be expected to respond in the 10 percent of the...
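
The excerpt ends before the formula appears, so this sketch assumes the standard expression for the standard error of a proportion, sqrt(p * (1 - p) / n). The sample size of 100,000 is an assumption chosen so that 5,000 responders correspond to a 5 percent response rate; it is not stated in the text.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of an observed proportion p measured on a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


sample_size = 100_000  # assumption: size of the 10 percent sample
responders = 5_000     # expected responders in that sample, as in the text
p = responders / sample_size
sep = standard_error_of_proportion(p, sample_size)

# Approximate 95 percent confidence interval (about 1.96 standard errors each way).
interval = (p - 1.96 * sep, p + 1.96 * sep)
```

The larger the sample, the smaller the standard error and the tighter the interval around the observed response rate.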

Determining Customer Value

Customer value calculations are quite complex, and although data mining has a role to play, customer value calculations are largely a matter of getting financial definitions right. A seemingly simple statement of customer value is the total revenue due to the customer minus the total cost of maintaining the customer. But how much revenue should be attributed to a customer? Is it what he or she has spent in total to date? What he or she spent this month? What we expect him or her to spend over the...

What Is a Neural Net?

Neural networks consist of basic units that mimic, in a simplified fashion, the behavior of biological neurons found in nature, whether comprising the brain of a human or of a frog. It has been claimed, for example, that there is a unit within the visual system of a frog that fires in response to fly-like movements, and that there is another unit that fires in response to things about the size of a fly. These two units are connected to a neuron that fires when the combined value of these two...

Different Kinds of Churn

Actually, the discussion of why churn matters assumes that churn is voluntary. Customers, of their own free will, decide to take their business elsewhere. This type of attrition, known as voluntary churn, is actually only one of three possibilities. The other two are involuntary churn and expected churn. Involuntary churn, also known as forced attrition, occurs when the company, rather than the customer, terminates the relationship, most commonly due to unpaid bills. Expected churn occurs when...

Simulating the Future

This discussion is largely based on discussions with Marc Goodman and on his 1995 doctoral dissertation on a technique called projective visualization. Projective visualization uses a database of snapshots of historical data to develop a simulator. The simulation can be run to project the values of all variables into the future. The result is an extended database whose new records have exactly the same fields as the original, but with values supplied by the simulator rather than by observation....

Comparison of Chi-Square to Difference of Proportions

Chi-square and difference of proportions can be applied to the same problems. Although the results are not exactly the same, the results are similar enough for comfort. Earlier, in Table 5.4, we determined the likelihood of champion and challenger results being the same using the difference of proportions method for a range of champion response rates. Table 5.7 repeats this using the chi-square calculation instead of the difference of proportions. The results from the chi-square test are very...

The Virtuous Cycle of Data Mining

In the first part of the nineteenth century, textile mills were the industrial success stories. These mills sprang up in the growing towns and cities along rivers in England and New England to harness hydropower. Water, running over water wheels, drove spinning, knitting, and weaving machines. For a century, the symbol of the industrial revolution was water driving textile machines. The business world has changed. Old mill towns are now quaint historical curiosities. Long mill buildings...

Information Gain Ratio

The entropy split measure can run into trouble when combined with a splitting methodology that handles categorical input variables by creating a separate branch for each value. This was the case for ID3, a decision tree tool developed by Australian researcher J. Ross Quinlan in the 1980s, which became part of several commercial data mining software packages. The problem is that just by breaking the larger data set into many small subsets, the number of classes represented in each...
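
To illustrate why many small branches flatter a plain entropy measure, the sketch below uses the gain ratio as it is commonly described: information gain divided by the entropy of the split itself. The excerpt is cut off before giving a formula, so this definition and the helper names are assumptions; every child subset is assumed to be non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy, in bits, of the class labels in one node."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(parent, children):
    """Information gain of a split divided by the split's own entropy."""
    total = len(parent)
    weighted = sum(len(ch) / total * entropy(ch) for ch in children)
    gain = entropy(parent) - weighted
    split_info = -sum(len(ch) / total * math.log2(len(ch) / total) for ch in children)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical node with four records of each class.
parent = ["y"] * 4 + ["n"] * 4
two_way = gain_ratio(parent, [["y"] * 4, ["n"] * 4])   # one clean binary split
eight_way = gain_ratio(parent, [[label] for label in parent])  # one branch per record
```

Both splits produce perfectly pure children and therefore the same information gain, but the eight-way split is penalized because its split entropy is three bits rather than one.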

A Wireless Communications Company Makes the Right Connections

The wireless communications industry is fiercely competitive. Wireless phone companies are constantly dreaming up new ways to steal customers from their competitors and to keep their own customers loyal. The basic service offering is a commodity, with thin margins and little basis for product differentiation, so phone companies think of novel ways to attract new customers. This case study talks about how one mobile phone provider used data mining to improve its ability to recognize customers...

A Supermarket Becomes an Information Broker

Thanks to point-of-sale scanners that record every item purchased and loyalty card programs that link those purchases to individual customers, supermarkets are in a position to notice a lot about their customers these days. Safeway was one of the first U.S. supermarket chains to take advantage of this technology to turn itself into an information broker. Safeway purchases address and demographic data directly from its customers by offering them discounts in return for using loyalty cards when...

Learning Things That Are True but Not Useful

Although not as dangerous as learning things that aren't true, learning things that aren't useful is more common.

Figure 3.2 Did sales drop off in October?

Learning Things That Are Already Known

Data mining should provide new information. Many of the strongest patterns in data represent things that are already known. People over retirement age tend not to respond to offers for retirement savings plans. People who live where there is no home delivery do not become newspaper subscribers. Even...

The C5 Pruning Algorithm

C5 is the most recent version of the decision-tree algorithm that Australian researcher J. Ross Quinlan has been evolving and refining for many years. An earlier version, ID3, published in 1986, was very influential in the field of machine learning and its successors are used in several commercial data mining products. The name ID3 stands for Iterative Dichotomiser 3. We have not heard an explanation for the name C5, but we can guess that Professor Quinlan's background is mathematics rather...

A Case Study in Business Data Mining

Once upon a time, there was a bank that had a business problem. One particular line of business, home equity lines of credit, was failing to attract good customers. There are several ways that a bank can attack this problem. The bank could, for instance, lower interest rates on home equity loans. This would bring in more customers and increase market share at the expense of lowered margins. Existing customers might switch to the lower rates, further depressing margins. Even worse, assuming that...

What Is the Virtuous Cycle?

The BofA example shows the virtuous cycle of data mining in practice. Figure 2.1 shows the four stages:

1. Identifying the business problem.
2. Mining data to transform the data into actionable information.
... of the efforts to complete the learning cycle.

Figure 2.1 The virtuous cycle of data mining focuses on business results, rather than just exploiting advanced techniques.

As these steps suggest, the key to success is incorporating data mining into...

Choosing a Communication Channel

Prospecting requires communication. Broadly speaking, companies intentionally communicate with prospects in several ways. One way is through public relations, which refers to encouraging media to cover stories about the company and spreading positive messages by word of mouth. Although highly effective for some companies (such as Starbucks and Tupperware), public relations are not directed marketing messages. Of more interest to us are advertising and direct marketing. Advertising can mean...

The CART Pruning Algorithm

CART is a popular decision tree algorithm first published by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984. The acronym stands for Classification and Regression Trees. The CART algorithm grows binary trees and continues splitting as long as new splits can be found that increase purity. As illustrated in Figure 6.6, inside a complex tree, there are many simpler subtrees, each of which represents a different trade-off between model complexity and training set...

Lessons Learned

Data mining is an important component of analytic customer relationship management. The goal of analytic customer relationship management is to recreate, to the extent possible, the intimate, learning relationship that a well-run small business enjoys with its customers. A company's interactions with its customers generate large volumes of data. This data is initially captured in transaction processing systems such as automatic teller machines, telephone switch records, and supermarket scanner...

Hypothesis Testing

Hypothesis testing is the simplest approach to integrating data into a company's decision-making processes. The purpose of hypothesis testing is to substantiate or disprove preconceived ideas, and it is a part of almost all data mining endeavors. Data miners often bounce back and forth between approaches, first thinking up possible explanations for observed behavior (often with the help of business experts) and letting those hypotheses dictate the data to be analyzed. Then, letting the data...

Why Churn Matters

Churn is important because lost customers must be replaced by new customers, and new customers are expensive to acquire and generally generate less revenue in the near term than established customers. This is especially true in mature industries where the market is fairly saturated; anyone likely to want the product or service probably already has it from somewhere, so the main source of new customers is people leaving a competitor. Figure 4.6 illustrates that as the market becomes saturated and...

What Does a Data Mining Problem Look Like?

To translate a business problem into a data mining problem, it should be reformulated as one of the six data mining tasks introduced in Chapter One: classification, estimation, prediction, affinity grouping, clustering. These are the tasks that can be accomplished with the data mining techniques described in this book (though no single data mining tool or technique is equally applicable to all tasks). The first three tasks, classification, estimation, and prediction, are examples of directed...

What Tasks Can Be Performed with Data Mining?

Many problems of intellectual, economic, and business interest can be phrased in terms of the following six tasks: classification, estimation, prediction, affinity grouping, clustering. The first three are all examples of directed data mining, where the goal is to find the value of a particular target variable. Affinity grouping and clustering are undirected tasks where the goal is to uncover structure in data without respect to a particular target variable. Profiling is a descriptive task...

Differential Response Analysis

The way out of this dilemma is to directly model the actual goal of the campaign, which is not simply reaching prospects who then make purchases. The goal should be reaching prospects who are more likely to make purchases because of having been contacted. This is known as differential response analysis. Differential response analysis starts with a treated group and a control group. If the treatment has the desired effect, overall response will be higher in the treated group than in the control...
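
One simple way to picture the idea is to subtract the control group's response rate from the treated group's response rate within each segment. The segments, counts, and function name below are hypothetical, and this sketch is not claimed to be the specific modeling approach the full text goes on to describe.

```python
def differential_response(treated_responses, treated_size,
                          control_responses, control_size):
    """Response rate of the treated group minus that of the control group."""
    return treated_responses / treated_size - control_responses / control_size


# Hypothetical segments: (treated responders, treated size,
#                         control responders, control size)
segments = {
    "segment_a": (80, 1000, 30, 1000),
    "segment_b": (100, 1000, 95, 1000),
}

uplift = {name: differential_response(*counts) for name, counts in segments.items()}
best_segment = max(uplift, key=uplift.get)
```

In this made-up example segment_b has the higher raw response rate, yet segment_a shows the larger difference between treated and control, which is the quantity the campaign actually cares about.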

The Null Hypothesis

Occam's Razor is very important for data mining and statistics, although statistics expresses the idea a bit differently. The null hypothesis is the assumption that differences among observations are due simply to chance. To give an example, consider a presidential poll that gives Candidate A 45 percent and Candidate B 47 percent. Because this data is from a poll, there are several sources of error, so the values are only approximate estimates of the popularity of each candidate. The layperson...

Acknowledgments

We are fortunate to be surrounded by some of the most talented data miners anywhere, so our first thanks go to our colleagues at Data Miners, Inc. from whom we have learned so much: Will Potts, Dorian Pyle, and Brij Masand. There are also clients with whom we work so closely that we consider them our colleagues as well; Harrison Sohmer and Stuart E. Ward, III are in that category. Our Editor, Bob Elliott, Editorial Assistant, Erica Weinstein, and Development Editor, Emilie Herman, kept us more or...

Creating a Balanced Sample

Very often, the data mining task involves learning to distinguish between groups such as responders and nonresponders, goods and bads, or members of different customer segments. As explained in the sidebar, data mining algorithms do best when these groups have roughly the same number of members. This is unlikely to occur naturally. In fact, it is usually the more interesting groups that are underrepresented. Before modeling, the dataset should be made balanced either by sampling from the...
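
The excerpt is cut off before it lists the balancing options, so the sketch below shows just one plausible approach: randomly sampling each group down to the size of the smallest one. The function name, the `label_of` parameter, and the fixed seed are assumptions for illustration.

```python
import random


def balance_by_downsampling(records, label_of, seed=0):
    """Return a shuffled sample in which every group has as many records as the smallest."""
    groups = {}
    for record in records:
        groups.setdefault(label_of(record), []).append(record)

    smallest = min(len(members) for members in groups.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 5 responders and 95 nonresponders.
records = [("responder", i) for i in range(5)] + \
          [("nonresponder", i) for i in range(95)]
balanced = balance_by_downsampling(records, label_of=lambda r: r[0])
```

Downsampling discards records from the larger group, so scores from a model built on the balanced set no longer reflect the original class proportions.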

Comparing Results Using Difference of Proportions

Checking for overlapping bounds is easy, but its results are a bit pessimistic. That is, even though the confidence intervals overlap, we might still be quite confident that the difference is not due to chance at some given level of confidence. Another approach is to look at the difference between response rates, rather than the rates themselves. Just as there is a formula for the standard error of a proportion, there is a formula for the standard error of a difference of proportions (SEDP). This formula is a...
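
Since the excerpt stops before the formula, the sketch below assumes the standard expression for the standard error of a difference of two independent proportions, sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2). The response rates and group sizes are hypothetical.

```python
import math


def sedp(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def z_score(p1, n1, p2, n2):
    """Observed difference measured in units of its standard error."""
    return (p1 - p2) / sedp(p1, n1, p2, n2)


# Hypothetical results: two groups of 100,000 with 5.0 and 4.8 percent response.
se = sedp(0.05, 100_000, 0.048, 100_000)
z = z_score(0.05, 100_000, 0.048, 100_000)
```

A difference of about two standard errors or more is commonly read as unlikely to be due to chance alone at roughly the 95 percent level.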

Data Mining to Improve Direct Marketing Campaigns

Advertising can be used to reach prospects about whom nothing is known as individuals. Direct marketing requires at least a tiny bit of additional information such as a name and address or a phone number or an email address. Where there is more information, there are also more opportunities for data mining. At the most basic level, data mining can be used to improve targeting by selecting which people to contact. Actually, the first level of targeting does not require data mining, only data. In...

Data Mining Methodology and Best Practices

The preceding chapter introduced the virtuous cycle of data mining as a business process. That discussion divided the data mining process into four stages:

2. Transforming data into information

Now it is time to start looking at data mining as a technical process. The high-level outline remains the same, but the emphasis shifts. Instead of identifying a business problem, we now turn our attention to translating business problems into data mining problems. The topic of transforming data into...