Basic Stats, Advanced Stats, and other stuff: Why is an indeterminate set important with a binary target?

Return to FAMQ

The main objective of modeling a binary target is to understand what distinguishes the two groups. In many cases the two groups are clearly separated. In other cases the definitions are clear for part of the population, but there are borderline cases where the differences are not so clear. An indeterminate set is used when a dichotomous target has a “fuzzy” definitional area: the borderline cases are assigned to it rather than forced into either group. This gives a cleaner separation between the two groups and leads to a better model.

This is best explained by a few examples:

Good/Bad Credit targets – In these very common situations, the definition of a Good is usually straightforward: an account that has never gone delinquent. The Bad definition, in an absolute sense, is whether the account charged off (C/O) or went into default or bankruptcy; however, that definition is often too harsh for several reasons:
- For a small portfolio, or one with few charge-off accounts, there may not be enough Bads to build a model.
- The cost of collecting on severely delinquent accounts (90+ DPD) is substantial, and even if the account cures there is a high probability that it will C/O in the future.
- Most risk managers will agree that there is some level of severe delinquency before C/O that they would prefer to avoid.

Whatever the Bad definition, there is clearly a gap between the Good and Bad definitions that, at a very minimum, helps clarify them. If the Bad = C/O definition is used, then how are severe delinquencies classified? They look a lot more like Bad accounts than Good accounts. Typically, among accounts that have ever been delinquent, only those that have been 30 DPD once might still be considered Good. Usually, any account that is “multiple 30DPD+” or worse, but not in the Bad definition, is put in the indeterminate set.
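
The three-way split described above can be sketched in a few lines of Python. This is only an illustration under the Bad = C/O definition: the field names `charged_off`, `times_30dpd`, and `max_dpd` are hypothetical, and reading "30 DPD once" as "never reached 60 DPD" is an assumption. The last line also assumes that indeterminate accounts are simply left out of the development sample, which is one common way to get the cleaner separation.

```python
from dataclasses import dataclass


@dataclass
class Account:
    charged_off: bool  # hypothetical flag: charged off, defaulted, or bankrupt
    times_30dpd: int   # hypothetical count of times the account reached 30+ DPD
    max_dpd: int       # hypothetical worst days-past-due the account ever reached


def label_account(acct: Account) -> str:
    """Return 'bad', 'good', or 'indeterminate' under the Bad = C/O definition."""
    if acct.charged_off:
        return "bad"
    # Never delinquent, or 30 DPD exactly once and never worse (assumed cutoff).
    if acct.times_30dpd == 0 or (acct.times_30dpd == 1 and acct.max_dpd < 60):
        return "good"
    # Multiple 30DPD+ or worse, but not in the Bad definition.
    return "indeterminate"


accounts = [
    Account(charged_off=False, times_30dpd=0, max_dpd=0),
    Account(charged_off=False, times_30dpd=1, max_dpd=30),
    Account(charged_off=False, times_30dpd=3, max_dpd=90),
    Account(charged_off=True, times_30dpd=4, max_dpd=120),
]
labels = [label_account(a) for a in accounts]

# Assumption: only Good and Bad accounts are kept for model development.
development = [(a, l) for a, l in zip(accounts, labels) if l != "indeterminate"]
```

With a different Bad definition (for example, 90+ DPD), only the first test in `label_account` would change; the indeterminate set is still whatever falls between the two definitions.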

High/Low targets – When the target is defined as above or below some threshold, the indeterminate set is easier to define. Philosophically, observations that are far above the threshold are clearly in the High group and those that are far below are in the Low group, but what about those that are near the threshold? Those are the cases that naturally belong in the indeterminate set. The size of this buffer can be defined by the modeling customer or, without guidance, set using the 10-15% rule.
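
A minimal sketch of the threshold-with-buffer idea follows. The function name and the `buffer` parameter are hypothetical, and the buffer is treated here as an absolute half-width around the threshold. How the buffer size is chosen (by the modeling customer or otherwise) is left to the caller.

```python
def label_high_low(value: float, threshold: float, buffer: float) -> str:
    """Label a value as 'high', 'low', or 'indeterminate'.

    Values within +/- buffer of the threshold (inclusive) are indeterminate.
    """
    if value > threshold + buffer:
        return "high"
    if value < threshold - buffer:
        return "low"
    return "indeterminate"


# Illustrative values only: threshold of 100 with a buffer of 10 on each side.
examples = [125, 105, 95, 80]
example_labels = [label_high_low(v, threshold=100, buffer=10) for v in examples]
```

Setting `buffer=0` reduces this to a plain above/below split, with only values exactly at the threshold left indeterminate.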

Attrition targets – One of the typical objectives of attrition models is to identify accounts that are likely to cancel or let their membership lapse. The score can be used to take proactive measures to retain these customers. Sometimes there will be unprofitable customers that the business might not be interested in retaining. The indeterminate set could be defined to include these customers, so its definition might be unprofitable customers, whether they attrite or stay. This definition can usually include inactive customers as well.
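
One possible way to express this attrition target is sketched below. The `profitable` and `active` flags are hypothetical inputs: how profitability or inactivity is measured is a business decision that this sketch does not attempt to define.

```python
def label_attrition(attrited: bool, profitable: bool, active: bool = True) -> str:
    """Return 'attrite', 'stay', or 'indeterminate' for one customer.

    Unprofitable (and, optionally, inactive) customers are indeterminate
    whether they attrite or stay.
    """
    if not profitable or not active:
        return "indeterminate"
    return "attrite" if attrited else "stay"


# Illustrative customers only: (attrited, profitable, active)
customers = [
    (True, True, True),
    (False, True, True),
    (True, False, True),
    (False, True, False),
]
attrition_labels = [label_attrition(*c) for c in customers]
```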

Tuesday, January 8, 2013
