Basic Stats, Advanced Stats, and other stuff: Why reject inference is better than a KGB (Known Good/Bad) model?


Many modelers argue that the best model is the one that best fits the known data, but for acquisition models this is often not true. The graph below shows a model built on the data marked by the blue diamonds, with data points at 650, 700, 750, 800 and 850. These data represent a situation in which the model extrapolates directly and accurately into the unknown area (values below 600).

Unfortunately, the red squares are more like the real world. Here, the known applicants who scored in the 600 to 700 range were “cherry picked” (chosen using additional information that is not reflected in the model data or could not be used in model development). As a result, the applicants in the lower range performed slightly better than the full population would have performed without cherry picking (blue diamonds). In addition, the highest scores were biased by some negative selection due to pricing, so they performed slightly worse than expected.

If the model fit on the red squares is used, its extrapolation into the unknown “reject” space will underestimate the risk of those accounts (by over 30%). This can result in accepting accounts that are riskier than the model indicates, and it weakens the model's ability to rank order effectively.

Note also that the model fit on the red squares does a much better job of fitting the known (red square) data, but the model fit on the blue diamonds does a better job of fitting the unknown and known data taken together.
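The effect can be reproduced with a small numerical sketch. Everything in it is an assumption made for illustration, not data from the graph: the straight-line relationship between score and log-odds of going bad, the size of the selection shifts, and the reject-range score of 550 are all hypothetical. It fits a least-squares line to the selected (red square style) points and compares its extrapolation with the full-population (blue diamond style) line.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sse(intercept, slope, xs, ys):
    """Sum of squared errors of a line against a set of points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def bad_rate(log_odds_bad):
    """Convert log-odds of going bad into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds_bad))

scores = [650, 700, 750, 800, 850]

# Hypothetical full-population relationship (the "blue diamonds").
TRUE_INTERCEPT, TRUE_SLOPE = 8.0, -0.02
full_population = [TRUE_INTERCEPT + TRUE_SLOPE * s for s in scores]

# Hypothetical selection effects (the "red squares"): cherry picking makes
# the low scores look better, pricing makes the top score look a bit worse.
selection_shift = [-0.30, -0.15, 0.0, 0.0, 0.10]
observed = [lo + shift for lo, shift in zip(full_population, selection_shift)]

# Model fit only on the known, selected accounts.
intercept, slope = fit_line(scores, observed)

# Extrapolate both lines to a score in the reject range.
reject_score = 550
predicted = bad_rate(intercept + slope * reject_score)
actual = bad_rate(TRUE_INTERCEPT + TRUE_SLOPE * reject_score)

# Fit quality on the known data only.
sse_known_fit = sse(intercept, slope, scores, observed)
sse_full_line = sse(TRUE_INTERCEPT, TRUE_SLOPE, scores, observed)

print(f"bad rate at {reject_score}: fitted {predicted:.4f}, full population {actual:.4f}")
print(f"SSE on known data: fitted {sse_known_fit:.4f}, full population {sse_full_line:.4f}")
```

Under these assumed numbers, the line fit on the selected points has the smaller error on the known data, yet it predicts a lower bad rate at 550 than the full-population line does.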

Obviously, this is a made-up (but realistic) example to prove a point: extrapolating into the reject space carries a significant danger of underestimating the risk and reducing rank-order effectiveness in the risky areas. In most cases we do not have the information against which to validate performance in the reject space; however, there is often substantial a priori historical information about the riskier space, and that information can be used to help “engineer” the KGB (Known Good/Bad) model so that the modified model reflects this a priori knowledge. Here is a summary of this a priori knowledge:

Delinquency (DQ) – Borrowers with a history of multiple or more severe delinquencies are riskier than those with little or no DQ history.
Time – Borrowers with a longer credit history are less risky than those with a short history.
Breadth – Borrowers with a wide breadth of credit (multiple trade lines of different types) are less risky than those with only a few trade lines. One caveat: a large number of trade lines often indicates potential overextension of credit.
Utilization – Borrowers who overutilize their credit tend to be riskier.
Search for new credit – Borrowers who are actively searching for credit (multiple inquiries, newly opened trade lines, …) tend to be riskier.
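One simple way to put this a priori knowledge to work is to check whether each coefficient of a fitted KGB model points in the expected direction. The sketch below is only one possible approach, not a method described in this post. The characteristic names and coefficient values are hypothetical, and a positive coefficient is assumed to mean higher risk.

```python
# Expected direction of each characteristic's effect on risk,
# taken from the list above (+1 = riskier as the value grows).
# The characteristic names are hypothetical.
EXPECTED_SIGN = {
    "num_delinquencies": +1,   # DQ: more delinquency, more risk
    "months_on_file": -1,      # Time: longer history, less risk
    "num_trade_types": -1,     # Breadth: wider credit, less risk
    "utilization": +1,         # Utilization: higher use, more risk
    "recent_inquiries": +1,    # Search for new credit: more risk
}

def sign_violations(coefficients):
    """Return the characteristics whose coefficient contradicts the expected sign."""
    return sorted(
        name
        for name, coef in coefficients.items()
        if name in EXPECTED_SIGN and coef * EXPECTED_SIGN[name] < 0
    )

# Hypothetical coefficients from a model fit on known accounts only.
kgb_coefficients = {
    "num_delinquencies": 0.8,
    "months_on_file": -0.01,
    "num_trade_types": -0.2,
    "utilization": -0.4,       # wrong direction in this made-up example
    "recent_inquiries": 0.3,
}

violations = sign_violations(kgb_coefficients)
print(violations)
```

A characteristic flagged this way is a candidate for review, since a sign that contradicts long-standing experience may reflect selection effects in the known data and not a real relationship.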



Tuesday, January 8, 2013
