Tuesday, January 8, 2013

Why is reject inference better than a KGB (Known Good/Bad) model?

                                                        Return to FAMQ

Many modelers will argue that the best model is the one that fits the known data best, but for acquisition models this is often not true. The graph below shows a model built on the data indicated by the blue diamond symbols, with data points at scores of 650, 700, 750, 800, and 850. These data represent a situation where the model extrapolates directly and accurately into the unknown area (scores below 600).


Unfortunately, the red square symbols are more like the real world. In this case, the known applicants that scored in the 600 to 700 range were “cherry picked” (chosen using additional information not reflected in the model data or unable to be used in the model development). The result of this cherry picking is that the accepted applicants in the lower range performed slightly better than the full population would have (the blue diamonds) had there been no cherry picking. Also, the highest-scoring applicants were subject to a bit of adverse selection due to pricing, so they performed slightly worse than expected.

If the model fit on the red squares is used, then when it is extrapolated into the unknown “reject” space it will understate the risk of those accounts (by over 30% in this example), which can obviously result in accepting accounts that are riskier than the model indicates and can degrade the model’s ability to rank order effectively.

Note ALSO that the model fit on the red squares does a much better job of fitting the known (red square) data, but the model fit on the blue diamond data does a better job of fitting BOTH the unknown data AND the known data.
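To make the extrapolation effect concrete, here is a minimal sketch with invented bad rates of my own (these are NOT the points in the graph above): it fits a straight line to a “full population” pattern and to a “cherry-picked” pattern and compares the extrapolated bad rate in the reject space.

    # Rough sketch of the extrapolation effect; all numbers are illustrative only.
    import numpy as np

    scores = np.array([650, 700, 750, 800, 850])

    # "Blue diamond" bad rates: what the full, unselected population would show.
    true_bad_rate = np.array([0.12, 0.09, 0.06, 0.04, 0.02])

    # "Red square" bad rates: cherry-picking makes the low bands look a bit better,
    # pricing-driven adverse selection makes the high bands look a bit worse.
    observed_bad_rate = np.array([0.10, 0.08, 0.06, 0.045, 0.025])

    # Fit a simple straight line to each set of points (a stand-in for a real model).
    true_fit = np.polyfit(scores, true_bad_rate, 1)
    observed_fit = np.polyfit(scores, observed_bad_rate, 1)

    # Extrapolate both fits into the "reject" space (e.g., a 550 score).
    reject_score = 550
    true_pred = np.polyval(true_fit, reject_score)
    observed_pred = np.polyval(observed_fit, reject_score)

    print(f"true-fit bad rate at {reject_score}:     {true_pred:.3f}")
    print(f"observed-fit bad rate at {reject_score}: {observed_pred:.3f}")
    print(f"risk understated by: {1 - observed_pred / true_pred:.0%}")

With these made-up numbers the model fit on the “observed” points predicts a noticeably lower bad rate at a 550 score than the full-population fit does; the exact size of the gap depends entirely on how the known sample was selected.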

Obviously, this is a made-up (but realistic) example to prove a point: extrapolating into the reject space carries a significant danger of misestimating the risk and reducing rank-order effectiveness in the risky areas. In most cases we do not have the information against which to validate performance in the reject space; however, there is often substantial a priori historical knowledge about the riskier space (see the bullet points below), and that knowledge can be used to help “engineer” the KGB (Known Good Bad) model so that the modified model reflects it. Here is a summary of this a priori knowledge (a small sign-check sketch follows the list):
  • DQ – Delinquency: borrowers with a history of multiple or more severe delinquencies are riskier than those with little or no DQ history.
  • Time – Borrowers with a longer credit history are less risky than those with short histories.
  • Breadth – People with a wide breadth of credit (multiple trade lines of different types) are less risky than those with only a few trade lines. (One caveat: very large numbers of trade lines often indicate potential overextension of credit.)
  • Utilization – Borrowers who heavily utilize their credit tend to be riskier.
  • Search for new credit – Borrowers who are actively searching for a lot of new credit (multiple inquiries, newly opened trade lines, …) tend to be riskier.
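One simple way to put this a priori knowledge to work (my own illustration; the post does not prescribe a specific technique, and the variable names are hypothetical) is to check a fitted KGB model against these expected directions and flag any variable whose fitted effect runs the wrong way, so it can be re-engineered.

    # Expected direction of each effect on the probability of "bad":
    # +1 means more of the attribute should mean MORE risk, -1 means LESS risk.
    expected_sign = {
        "num_dq_24m": +1,        # more delinquency -> riskier
        "months_on_file": -1,    # longer credit history -> less risky
        "num_trade_types": -1,   # broader credit mix -> less risky
        "utilization_pct": +1,   # heavier utilization -> riskier
        "num_inq_24m": +1,       # active credit seeking -> riskier
    }

    def check_a_priori(coefs):
        """coefs: dict of variable -> fitted coefficient from a bad-rate model."""
        for var, sign in expected_sign.items():
            if var in coefs and coefs[var] * sign < 0:
                print(f"WARNING: {var} has coefficient {coefs[var]:+.3f}, "
                      f"expected sign {'+' if sign > 0 else '-'}; "
                      f"consider re-engineering this variable.")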

                                                            Return to FAMQ

How do we construct the A/R table?

                                                        Return to FAMQ

This is best demonstrated by an example.

For the historical pattern, let’s take a “search for new credit” variable, for example, # of Inquiries in the last 24 months, with 5 bins: 0, 1, 2-3, 4-5, and 6 or more. The final table is shown below:

# IQ 24 M                     History                           Proposed
Bin      Counts      Accepts   Rejects  Accept %      Accepts   Rejects  Accept %
0         2,000        1,700       300       85%        1,800       200       90%
1         4,000        3,500       500       88%        3,500       500       88%
2-3       4,000        3,000     1,000       75%        3,200       800       80%
4-5       3,000        2,000     1,000       67%        1,900     1,100       63%
6+        2,000        1,000     1,000       50%          800     1,200       40%
Total    15,000       11,200     3,800       75%       11,200     3,800       75%

Here’s the explanation of this table. During the development time frame, we had 15,000 applicants; 11,200 were accepted and 3,800 were rejected, for a 75% accept rate (it doesn’t matter how many were booked). 2,000 applicants had no inquiries, and of those 300 were rejected; 4,000 had 1 inquiry, and 500 of those were rejected; and so on. Basically, you need a cross tab of each variable against the Accept/Reject flag. For the Proposed A/R, rank order the population by the new score and then assume that any account in the bottom 25% (the historical reject rate) is a reject according to the proposed model and anything in the top 75% is an accept.
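As a sketch of the mechanics, here is one way this could be done in pandas. The column names (inq_24m, hist_accept, new_score) and the binning are hypothetical stand-ins for whatever the development data actually contains; the code assumes one row per TTD applicant and that a higher new score means less risk.

    import pandas as pd

    def ar_table(df, bins, labels):
        """Build the History vs. Proposed A/R table for one binned variable."""
        df = df.copy()
        df["bin"] = pd.cut(df["inq_24m"], bins=bins, labels=labels, right=True)

        # Historical accept rate sets the cutoff for the proposed decision,
        # so only the mix changes, not the overall accept rate.
        accept_rate = df["hist_accept"].mean()
        cutoff = df["new_score"].quantile(1 - accept_rate)
        df["prop_accept"] = (df["new_score"] >= cutoff).astype(int)

        out = df.groupby("bin").agg(
            counts=("hist_accept", "size"),
            hist_accepts=("hist_accept", "sum"),
            prop_accepts=("prop_accept", "sum"),
        )
        out["hist_rejects"] = out["counts"] - out["hist_accepts"]
        out["prop_rejects"] = out["counts"] - out["prop_accepts"]
        out["hist_accept_pct"] = out["hist_accepts"] / out["counts"]
        out["prop_accept_pct"] = out["prop_accepts"] / out["counts"]
        return out

    # Binning that matches the table above: 0, 1, 2-3, 4-5, 6+
    # table = ar_table(apps, bins=[-1, 0, 1, 3, 5, 99],
    #                  labels=["0", "1", "2-3", "4-5", "6+"])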

                                                        Return to FAMQ

What is the Accept/Reject (A/R) pattern?

                                                        Return to FAMQ

The A/R pattern is used in reject inference specifically (as opposed to performance inference in general) to understand how a new decision process (usually a new acquisition model) compares to the historical process. The idea is to look at a number of salient dimensions known to be important in predicting risk and compare the new process to the old process on those dimensions.

For example, if the historical process accepted 80% of the applicants with no severe delinquencies and accepted only 20% of the applicants with one severe DQ, then we would hope that the new process would accept more than 80% of those with no DQ and fewer than 20% of those with one or more severe DQs.

The following examples may help.
# Trade Lines                 History                           Proposed
#         Count       Accept    Reject  Accept %       Accept    Reject  Accept %
0-2 TL     2,000        1,000     1,000       50%          500     1,500       25%
2-4 TL     4,000        2,500     1,500       63%        2,200     1,800       55%
5-8 TL     4,000        3,000     1,000       75%        3,600       700       90%
8+         3,000        2,500       500       83%        2,700       200       90%
Total     13,000        9,000     4,000       69%        9,000     4,000       69%

In the above example, we can see that in the historical pattern there was a higher acceptance rate for applicants with more trade lines (greater breadth of credit experience). In the proposed pattern, the difference is even more dramatic.
# 30 Day DQ                   History                           Proposed
#         Counts      Accept    Reject  Accept %       Accept    Reject  Accept %
0          7,000        6,000     1,000       86%        6,100       900       87%
1 DQ       4,000        2,700     1,300       68%        2,660     1,340       67%
2 DQ       1,000          200       800       20%          150       850       15%
3+ DQ      1,000          100       900       10%           90       910        9%
Total     13,000        9,000     4,000       69%        9,000     4,000       69%

In the above example, we see that historically, the more DQs an applicant had, the less likely they were to be accepted. In the proposed process this pattern is a bit more pronounced.
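This kind of comparison can be mechanized. The short sketch below (my own illustration, using the accept percentages from the 30-day DQ table above) computes the shift in accept rate per bin, so the analyst can confirm at a glance that the proposed process accepts slightly more of the clean bin and fewer of every delinquent bin.

    import pandas as pd

    # Accept % by bin, copied from the 30-day DQ table above.
    dq = pd.DataFrame(
        {"hist_accept_pct": [0.86, 0.68, 0.20, 0.10],
         "prop_accept_pct": [0.87, 0.67, 0.15, 0.09]},
        index=["0", "1 DQ", "2 DQ", "3+ DQ"],
    )

    # Positive shift = the proposed process accepts MORE of the bin than history did.
    # For a risk-ordered variable we want the shift to be >= 0 in the cleanest bins
    # and <= 0 in the delinquent bins, i.e., the pattern moves in the right direction.
    dq["shift"] = dq["prop_accept_pct"] - dq["hist_accept_pct"]
    print(dq)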

This pattern comparison should be done for at least two variables in each of the major risk dimensions (see the a priori risk dimensions listed in the first answer above).

The A/R pattern is based on the full TTD population sampled during the development window: it includes the booked loans, the rejects, and those that were accepted (approved) but walked away (turned down the loan). The A/R history pattern compares those that were accepted and booked, plus those that were accepted and walked away, against the rejects. This process does not look at the out-of-time sample or at any performance on the known (booked or active) accounts.

                                                        Return to FAMQ

When is performance inference required?

                                                        Return to FAMQ

Performance inference is almost always required in credit origination models. The guidelines from the OCC (Office of the Comptroller of the Currency) indicate that these models should consider the full TTD (Through The Door) population in model development.

In other modeling situations, inference may not be required, or even feasible; however, when possible and time permitting, it is best practice.

                                                        Return to FAMQ

How is performance inference done?

                                                        Return to FAMQ

There are a number of methods for performance inference:
  • Testing – One of the most reliable methods is to “test” into the unknown market by accepting applications that would normally be rejected or not included in the “known” sample; these test accounts are then weighted up to represent the full TTD sample (a short weighting sketch follows this list).
    • Advantages
      • Control the number and type of accounts that are accepted.
      • Management of these accounts is the same as it would be if the company actually expanded into this part of the market.
      • Surprises – The results of testing may show that this group is not as risky as expected and may represent a niche market in which the company can excel.
    • Disadvantages
      • Cost – Accounts are being accepted that are riskier than the price justifies. For example, a credit company might accept 10% of the normally rejected applicants under the same price structure as the normally accepted accounts and incur losses that the pricing does not cover.
      • Time – It takes a year or two to get a good read on the actual performance in some situations.
  • Surrogate Accounts – Look at the unknown accounts and find a record of how they behaved at a competitor over the same time frame, or look at a generic data set and try to match the unknown accounts based on other variables such as FICO, demographics, income, …
    • Advantages
      • No uncertain costs – The account performance is observed at another company, so there is no cost other than the fixed cost of buying the information.
      • Time – Results are immediately available
    • Disadvantages
      • The account performance was not under the same management conditions so the performance inference may be off.
      • Matching logic may be inexact and difficult to confirm.
  • Analytical Inference – This technique uses expert knowledge of the domain in which the known model is developed and then “engineers” the model built on the known population to account for any anomalies or inconsistencies in the inferred population. Analytical performance inference is a methodology that helps protect against the “truncation” phenomenon.
    • Advantages
      • No additional data needed
      • Results are immediately available.
      • Current account management strategies are accounted for in the analysis
    • Disadvantages
      • Accuracy is dependent on the quality of the analyst and the tools available.
      • It is a lengthy process requiring multiple iterations with the developer and an experienced reviewer. Even then the results are not guaranteed.
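Before the review items below, here is a minimal sketch of the “weighting up” mentioned under Testing. The 10% test rate, the column names, and the assumption of a purely random test draw are illustrative assumptions of mine, not details from the post.

    import pandas as pd

    TEST_RATE = 0.10  # assumed share of would-be rejects booked as a test

    def add_ttd_weights(df):
        """df: one row per booked account, with a boolean 'test_account' column
        marking accounts that would normally have been rejected."""
        df = df.copy()
        # Normally accepted accounts represent only themselves.
        df.loc[~df["test_account"], "weight"] = 1.0
        # Each randomly test-accepted account stands in for 1 / TEST_RATE
        # similar applicants who were rejected as usual.
        df.loc[df["test_account"], "weight"] = 1.0 / TEST_RATE
        return df

    # A weighted bad rate then approximates the full TTD population:
    # bad_rate = (df["weight"] * df["bad"]).sum() / df["weight"].sum()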
At the end of the performance inference stage, there are a number of items that should be reviewed, depending upon the inference technique being used. The primary goal here is to ensure that the inference results indicate that any “truncation” or other problems with the original data have been ameliorated. The results will include:
  • Specification
  • Validation
  • Known vs. Inferred Bad rates
  • Accept/Reject reports
  • Low-side over-rides
  • Score segment coverage
  • Unmatched rate
  • Bias in match
  • Sample adjustment process using inference
  • Special analyses depending on the inference methodology used.

                                                            Return to FAMQ

Why is performance inference important?

                                                        Return to FAMQ

Performance inference helps prevent the effects of “truncation.” If there is a serious chance that “truncation” is occurring, then performance inference should be included in the analysis.

Performance inference is less important in some cases than in others. Its importance depends on how the models are going to be used, how likely the inferred population is to be included in the scoring process, and how important it is, for business knowledge, to understand the inferred population. For example:
  • Credit Risk – If there is a subgroup of applicants that will never be considered for credit (recent bankruptcies, foreclosures, very low FICO scores, …), then they might not need to be scored; however, if there is a reasonable chance that these credit policies might change, then knowing how these policy rejects will perform is important for scoring purposes. It is very likely that the current rejects will perform worse than the “known” population.
  • Insurance Risk – There may be an insurance policy that rejects any applicant who has had 5 accidents in the last year. As in credit, this policy might change, and if it does, it would be important to understand the insurance risk of this group in order to price them properly.
  • Competition – In both of the risk examples above, there are groups of applicants who walk away from an offer. This is often because they found a better offer elsewhere. In that case, it helps to understand the performance of these “walk-aways” in order to make them offers that might be more competitive.
  • Marketing – If a particular set of products is marketed to a very specific group and there is interest in expanding this market, then it would be useful to better understand the expanded population without going to the expense of actually marketing to it.
The primary purpose of performance inference is to understand something about populations that the business is currently not serving. In the case of risk, there is a lot of potential loss in “testing” into that population. In marketing, or in becoming more competitive, the risk is limited to the expense of marketing into a new arena or of lowering prices to meet the competition. In both cases, performance inference helps in understanding the population with less expense and in less time.

                                                        Return to FAMQ

What is Performance Inference?


In many modeling situations there is a subgroup of the target population for which the performance or target is not observed. In most cases, it is important to understand how these subgroups would have performed had we been able to observe their performance. This process is called performance inference. For example:
  • There is a subgroup of credit applicants who are either turned down (rejected) or walk away (accepted but refuse the offer). What would their credit performance have been had they taken the credit?
  • Prospects who are mailed a sales offer but don’t respond. What would they have bought had they responded?
  • Insurance applicants who are offered an insurance policy but walk away from the offer. What losses would they have incurred had they taken the offer?
These are all examples where performance inference could be important. Essentially, performance inference is a “What if?” exercise. It is typically used to understand the full population under consideration in the business situation of concern:
  • In credit or insurance – The full population is anyone who might apply for a loan or insurance. This is known as the Through The Door (TTD) population.
  • In marketing – The full population is the people for whom the products are intended. This could be limited by:
    • Geography (just people who live in Chicago, IL),
    • Interest (just people who attend the Opera)
    • Behavior (people who fly internationally)