Basic Stats, Advanced Stats, and other stuff: How to specify a Dichotomous or Binary Target?


Binary targets take on two values, usually specified as a “0” or “1.” The exact definition of “0” and “1” is not crucial to defining a binary target. In many situations, a “performance” time frame must elapse before an observation can be accurately assigned to the appropriate category.

As listed above, there are a number of cases where a binary target is appropriate. In most of these cases it is also best practice to define an “Indeterminate” set (10-15% of the sample population). See below for an explanation of this set. The following examples cover some of the considerations when defining the target:

Good/Bad – Good and Bad are usually defined over a specific time frame. This time frame (see performance window below) has to be long enough to identify Bads and give the Goods time to be profitable.
- Good is usually defined as paid on time as promised over some minimum time frame and is usually easy to define. Common definitions are:
  - “Never delinquent,”
  - “Never more than 30 Days Past Due (DPD)”, which allows for people who are sometimes late (AKA “sloppy payers”) to be defined as good.
- The Bad definition is more difficult and is usually set in consultation with the business customer for whom the model is being developed. A common question to ask the customer is “If you knew at the time of application that a prospect would reach some level of delinquency (30, 60, 90 DPD) over the performance window, would you have accepted that application?”
- Indeterminate sets are usually defined as “ever 30 DPD” or “multiple times 30 DPD but never worse.” The Good, Indeterminate, and Bad definitions must be mutually exclusive and collectively exhaustive, so they cannot be set independently of one another.
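As a rough illustration of the Good/Indeterminate/Bad split, the sketch below assigns a target from the worst delinquency an account reached during the performance window. The function name `good_bad_target`, the `bad_dpd` cutoff of 60, and the choice to code Bad as 1 are all assumptions made for this example; the actual cutoffs should come from the business customer.

```python
from typing import Optional


def good_bad_target(worst_dpd: int, bad_dpd: int = 60) -> Optional[int]:
    """Assign a binary target from the worst Days Past Due in the window.

    Returns 0 for Good, 1 for Bad, and None for Indeterminate.
    The 30-day Good limit and the default 60-day Bad cutoff are
    illustrative assumptions, not fixed rules.
    """
    if worst_dpd < 0:
        raise ValueError("worst_dpd cannot be negative")
    if bad_dpd <= 30:
        raise ValueError("bad_dpd must be above the 30 DPD Good limit")
    if worst_dpd < 30:
        return 0  # Good: never reached 30 DPD
    if worst_dpd >= bad_dpd:
        return 1  # Bad: reached the agreed delinquency level
    return None  # Indeterminate: reached 30 DPD but never the Bad level


# Example accounts, keyed by a hypothetical account id.
worst_by_account = {"A1": 0, "A2": 30, "A3": 90}
targets = {acct: good_bad_target(dpd) for acct, dpd in worst_by_account.items()}
```

Because every account falls into exactly one of the three branches, the three definitions stay mutually exclusive and collectively exhaustive by construction.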
Response/Non-response – Did they respond to the mailing or other solicitation? This is an easy target to define. For direct mail there needs to be some time frame over which responses are accepted. A well-defined indeterminate set is difficult in this environment, as there really is no in-between. One possible definition treats a response received after some time has elapsed, say 30 days, as indeterminate, since it is unclear whether the response is due to the specific solicitation.
Purchase/Not Purchase – Did they buy after responding? Again, how much time must pass before they are counted as a non-purchaser? This is usually an easy definition. The indeterminate set may include those people who bought something, but not enough to be profitable.
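The response window described above could be sketched as follows. The function name `response_target`, the 30-day default window, and the coding of a response as 1 are assumptions for illustration only.

```python
from datetime import date
from typing import Optional


def response_target(
    mail_date: date,
    response_date: Optional[date],
    window_days: int = 30,
) -> Optional[int]:
    """Return 1 for a response inside the window, 0 for no response,
    and None (indeterminate) for a response after the window closes."""
    if response_date is None:
        return 0  # Non-response: nothing came back
    days_elapsed = (response_date - mail_date).days
    if days_elapsed < 0:
        raise ValueError("response_date is earlier than mail_date")
    if days_elapsed <= window_days:
        return 1  # Response attributed to this solicitation
    return None  # Late response: unclear whether this mailing caused it


mailed = date(2013, 1, 8)
on_time = response_target(mailed, date(2013, 1, 20))
late = response_target(mailed, date(2013, 3, 1))
silent = response_target(mailed, None)
```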
Attrition/Retention – This has many similarities to the Good/Bad definition. The time frame here is important. As with most of these definitions, it must be set together with the modeling customer. The following considerations are pertinent:
- How long does it take before a customer can be considered retained?
- How long before the customer is profitable?
- What if the customer becomes inactive but does not close the account?
Fraud/Non-Fraud – Is a particular transaction legitimate or fraudulent? This may often be difficult to determine.
- In credit card transactions, the true customer will usually identify fraud on a stolen or lost card or an account takeover. Here, the challenge is identifying the beginning of the string of fraudulent transactions.
- In insurance claims, a true fraud may never be identified if actual criminal prosecution is not pursued.
- In identity theft it may take years before the victim identifies the fraud.
- Fraud is usually a rare event, and obtaining enough fraud cases is often a significant challenge in building a development sample.
- Hierarchical modeling can sometimes be used to identify highly probable non-fraud transactions and suspect transactions that can be used for indeterminate sets.
Above or Below a threshold – Sometimes an exact prediction of a continuous target is not needed, only whether a customer will be above or below some threshold on that target, such as revenue, recovery, # transactions… The indeterminate set here is usually easy to define, as it is a small cushion around the threshold. For example, if customer profitability is the target and the profitability threshold is $500 revenue over 5 years, then:
- Good = $500+ revenue
- Indeterminate = $400 to just under $500
- Bad = Less than $400 revenue.
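The revenue example above can be written as a small function. This is a minimal sketch: the name `threshold_target`, the `cushion` parameter, and the coding of Good as 1 are assumptions introduced here, while the $500 threshold and $400 lower bound come from the example.

```python
from typing import Optional


def threshold_target(
    revenue: float,
    threshold: float = 500.0,
    cushion: float = 100.0,
) -> Optional[int]:
    """Return 1 for Good, 0 for Bad, and None for Indeterminate.

    Good is revenue at or above the threshold; Indeterminate is the
    cushion just below it; Bad is everything under the cushion.
    """
    if cushion < 0:
        raise ValueError("cushion cannot be negative")
    if revenue >= threshold:
        return 1  # Good: $500+ revenue
    if revenue >= threshold - cushion:
        return None  # Indeterminate: $400 up to, but not including, $500
    return 0  # Bad: less than $400 revenue


five_year_revenue = [650.0, 450.0, 120.0]
targets = [threshold_target(r) for r in five_year_revenue]
```

Testing the Good boundary first means a customer with exactly $500 is counted as Good rather than Indeterminate, which keeps the three groups from overlapping.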
  


Tuesday, January 8, 2013
