Many modelers will argue that the best model is the one that fits the known data best, but for acquisition models this is often not the case. The graph below shows a model built on the data indicated by the blue diamond symbols, with data points at scores of 650, 700, 750, 800 and 850. This data represents a situation where the model extrapolates directly and accurately into the unknown area (scores below 600).
Unfortunately, the red square symbols are closer to the real world. In this case, the known applicants who scored in the 600 to 700 range were “cherry-picked”, that is, chosen using additional information that was not reflected in the model data or could not be used in model development. As a result, the applicants in the lower range performed slightly better than the full population would have without cherry-picking (the blue diamonds). In addition, the highest-scoring applicants were affected by some negative selection due to pricing, so they performed slightly worse than expected.
If the model fit on the red squares is used, its extrapolation into the unknown “reject” space will understate the risk of those accounts by over 30%, which is the same as overstating their expected performance. This can result in accepting accounts that are riskier than the model indicates. The fitted line is also flatter, which weakens the model's ability to rank-order effectively.
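The following minimal sketch makes the mechanism concrete. It uses made-up numbers, not the values in the graph, and assumes the y-axis is the log-odds of good, ln(goods/bads). It fits a straight line by least squares to each set of points and extrapolates both lines to a score of 550 in the reject region. All names are hypothetical.

```python
import math

# Hypothetical numbers (NOT read off the original graph). Assumption: the
# y-axis is the log-odds of "good", ln(goods / bads), for each score band.
SCORES = [650, 700, 750, 800, 850]


def true_log_odds(score):
    """Assumed unbiased relationship (the blue diamonds): +1 log-odds per 50 points."""
    return (score - 500) / 50


BLUE = [true_log_odds(s) for s in SCORES]
# Red squares: the 650-700 bands look better than true (cherry-picking) and the
# 800-850 bands look worse than true (negative selection due to pricing).
RED = [3.6, 4.3, 5.0, 5.8, 6.5]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def bad_rate(log_odds_good):
    """Convert log-odds of good into the probability of going bad."""
    return 1.0 / (1.0 + math.exp(log_odds_good))


blue_fit = fit_line(SCORES, BLUE)
red_fit = fit_line(SCORES, RED)
reject_score = 550  # a score in the unknown "reject" region

for name, (a, b) in (("blue", blue_fit), ("red", red_fit)):
    predicted = bad_rate(a + b * reject_score)
    print(f"{name}: slope={b:.4f}, extrapolated bad rate at {reject_score}={predicted:.3f}")
print(f"assumed true bad rate at {reject_score}={bad_rate(true_log_odds(reject_score)):.3f}")
```

Under these assumed numbers, the red line is flatter: its slope is 0.0146 log-odds per point, against 0.02 for the blue line. That flatter slope is the weaker rank ordering described above. The red line's extrapolated bad rate at 550 is roughly 0.11, against an assumed true value of about 0.27.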
Note also that the model fit on the red squares fits the known (red square) data much more closely. Yet the model fit on the blue diamond data does a better job of fitting both the unknown data and the known data.
Obviously, this is a made-up (but realistic) example meant to prove a point. Extrapolating into the reject space carries a significant danger of misestimating the risk and reducing rank-order effectiveness in the riskier areas. In most cases we do not have the information needed to validate performance in the reject space. However, there is often substantial a priori historical information about the riskier space (see the bullet points below). That information can be used to help “engineer” the KGB (Known Good Bad) model so that the modified model reflects this a priori knowledge. Here is a summary of this a priori knowledge:
- DQ (delinquency) – Borrowers with a history of multiple and more severe delinquencies are riskier than those with little or no DQ history.
- Time – Borrowers with a longer credit history are less risky than those with a short history.
- Breadth – Borrowers with a wide breadth of credit (multiple trade lines of different types) are less risky than those with only a few trade lines. (One caveat: a very large number of trade lines often indicates potential overextension of credit.)
- Utilization – Borrowers who over-utilize their credit tend to be riskier.
- Search for new credit – Borrowers who are actively searching for new credit (multiple inquiries, newly opened trade lines, …) tend to be riskier.
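The original does not say how the KGB model is “engineered”, so what follows is only one possible approach, sketched under stated assumptions. One common way to apply a priori knowledge like the list above is to constrain the direction of each attribute's effect. This sketch fits a logistic regression for the probability of going bad using projected gradient descent. After every step, each coefficient is clipped back to the sign implied by the list, so a coefficient that "wants" the wrong sign is held at zero. The attribute names, the synthetic sample, and the spurious effect built into it are all hypothetical.

```python
import math
import random

# Hypothetical attribute names (not from the original text), each mapped to the
# a priori direction of its effect on the log-odds of going BAD.
EXPECTED_SIGN = {
    "dq_count": +1,          # more and more severe delinquency -> riskier
    "years_on_file": -1,     # longer credit history -> less risky
    "trade_types": -1,       # broader mix of trade lines -> less risky
    "utilization": +1,       # over-utilization -> riskier
    "recent_inquiries": +1,  # active credit searching -> riskier
}
FEATURES = list(EXPECTED_SIGN)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def project(weights):
    """Clip each coefficient onto its allowed sign; a wrong-signed value becomes 0."""
    return [max(0.0, w) if EXPECTED_SIGN[f] > 0 else min(0.0, w)
            for f, w in zip(FEATURES, weights)]


def fit_logistic(rows, labels, constrained=True, lr=1.0, epochs=150):
    """Logistic regression (bad = 1) by batch gradient descent on the mean log-loss.

    With constrained=True every step is followed by projection onto the sign
    constraints (projected gradient descent); the intercept is left free.
    """
    n, k = len(rows), len(FEATURES)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * x[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        if constrained:
            w = project(w)
        b -= lr * grad_b / n
    return dict(zip(FEATURES, w)), b


def make_biased_sample(n=800, seed=0):
    """Synthetic known good/bad sample with standardized attributes, in which
    years_on_file has a spurious, wrong-signed effect (e.g. from cherry-picking)."""
    rng = random.Random(seed)
    effect = [0.8, 0.5, -0.4, 0.6, 0.4]  # years_on_file (+0.5) has the wrong sign
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        p_bad = sigmoid(-1.0 + sum(e * xj for e, xj in zip(effect, x)))
        rows.append(x)
        labels.append(1 if rng.random() < p_bad else 0)
    return rows, labels


rows, labels = make_biased_sample()
free_w, _ = fit_logistic(rows, labels, constrained=False)
kgb_w, _ = fit_logistic(rows, labels, constrained=True)
for f in FEATURES:
    print(f"{f:17s} unconstrained={free_w[f]:+.3f} constrained={kgb_w[f]:+.3f}")
```

On this synthetic sample, the unconstrained fit picks up the spurious positive years_on_file effect, while the constrained fit holds it at or below zero. A single linear sign constraint cannot capture the Breadth caveat above (very many trade lines signalling overextension).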