Basic Stats, Advanced Stats, and other stuff: January 2013

Tuesday, January 8, 2013

How does the modeler know the engineering works?

Without some form of confirmed performance, the modeler needs to rely on an SME (Subject Matter Expert) to judgmentally validate the inferred performance. The primary way to do this with acquisition models (underwriting decisions for credit) is to compare the Accept/Reject decisions made with the model to the historical A/R patterns. If the model decisions are (judgmentally) an improvement over the traditional A/R pattern, then the inference passes this test. Of course, the inferred model should still perform well on the known accounts, but not necessarily better than some known metric.

What is model “engineering”?

Engineering takes the form of forcing the model to include variables (and patterns within variables) that the data does not support. With traditional statistical techniques (various forms of regression) this is very difficult without ignoring the data altogether and using a judgmental model; however, MB (Model Builder from FICO) offers several features for exactly this purpose.

The modeler can force variables not only into the model but also to have a much larger effect than the data dictates, while the remaining parameters are optimized around these constraints. The modeler can also assign weights (score values) to specific bins.
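FICO Model Builder's constraint features are proprietary, so the following is only a rough stand-in for the idea, not a description of how MB works internally. It is a minimal sketch, assuming a plain logistic regression fit by batch gradient descent, in which selected coefficients are held at analyst-chosen values while the intercept and the remaining coefficients are optimized around them. The function `fit_logit_with_fixed`, its parameters, and the toy data are hypothetical.

```python
import math

def fit_logit_with_fixed(X, y, fixed, n_iter=5000, lr=0.1):
    """Fit logit(P) = K + sum(b_i * x_i) by batch gradient descent.

    `fixed` maps a coefficient index to a value the analyst forces
    (hypothetical interface). Those coefficients are never updated; the
    intercept K and the free coefficients are optimized around them.
    """
    n_features = len(X[0])
    k = 0.0
    b = [fixed.get(j, 0.0) for j in range(n_features)]
    for _ in range(n_iter):
        grad_k = 0.0
        grad_b = [0.0] * n_features
        for row, target in zip(X, y):
            z = k + sum(bj * xj for bj, xj in zip(b, row))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - target  # derivative of the log-loss with respect to z
            grad_k += err
            for j, xj in enumerate(row):
                grad_b[j] += err * xj
        k -= lr * grad_k / len(X)
        for j in range(n_features):
            if j not in fixed:  # forced coefficients stay where the analyst put them
                b[j] -= lr * grad_b[j] / len(X)
    return k, b

# Toy data (hypothetical): x1 = number of trade lines, x2 = number of
# delinquent accounts; y = 1 for Good, 0 for Bad. Force the DQ coefficient.
X = [[0, 1], [1, 0], [2, 0], [0, 2], [3, 0], [1, 1]]
y = [0, 1, 1, 0, 1, 0]
k, b = fit_logit_with_fixed(X, y, fixed={1: -1.5})
print(k, b)
```

The same idea extends to bin-level score values: fix the points for the bins the analyst cares about and re-optimize everything else.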

Given all this a priori knowledge, what can be done to reflect this in a model based only on the KGB data?

As alluded to in the previous section, we can “engineer” the KGB model to, at a minimum, generate scores that account directionally for this a priori knowledge. This engineering will necessarily degrade the performance of the KGB model in the “known” space, but it will ensure that when the model is used in the “reject” space, it generates Accept/Reject (A/R) patterns that reflect our a priori knowledge.

Why is reject inference better than a KGB (Known Good/Bad) model?

Many modelers will argue that the best model is the one that fits the known data best, but in the case of acquisition models, this is often not the case. The graph below shows a model built on the data indicated by the blue diamond symbols for data points at 650, 700, 750, 800, and 850. These data represent a situation where the model extrapolates directly and accurately into the unknown area (scores below 600).

Unfortunately, the red square symbols are more like the real world. In this case, the known applicants that scored in the 600 to 700 range were “cherry-picked” (chosen using additional information that was not reflected in the model data or could not be used in model development). As a result of this cherry-picking, the applicants in the lower range performed slightly better than the full population would have performed without it (blue diamonds). Also, the highest scores were biased by some negative selection due to pricing, so they performed slightly worse than expected.

If the model fit on the red squares is used, then when it extrapolates into the unknown “reject” space it will underestimate the risk of those accounts by over 30%. This can result in accepting accounts that are riskier than the model indicates, and it weakens the model's ability to rank order effectively.

Also, the model fit on the red squares does a much better job of fitting the known (red square) data, but the model fit on the blue diamond data does a better job of fitting BOTH the unknown data AND the known data.
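The effect can be reproduced with a few lines of code. This is a minimal sketch using hypothetical log-odds (Good:Bad) values invented for illustration; they are not read from the original graph. It fits a straight line to each series by ordinary least squares and extrapolates both lines to a score of 550 in the reject space. With these numbers, the flatter red-square line predicts a lower bad rate at 550 than the blue-diamond line, so accounts there look less risky than they are, which is what leads to accepting riskier accounts.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def bad_rate(log_odds_good):
    """Convert log-odds of Good into the probability of Bad."""
    return 1.0 / (1.0 + math.exp(log_odds_good))

scores = [650, 700, 750, 800, 850]
# Hypothetical log-odds for the unbiased population ("blue diamonds").
blue = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical log-odds after cherry-picking at the low end and negative
# selection at the high end ("red squares"): a flatter curve.
red = [1.5, 2.2, 3.0, 3.8, 4.5]

for name, ys in (("blue", blue), ("red", red)):
    a, b = fit_line(scores, ys)
    print(name, round(bad_rate(a + b * 550), 3))  # predicted bad rate at 550
```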

Obviously, this is a made-up (but realistic) example to prove a point: extrapolating into the reject space carries a significant danger of underestimating the risk and reducing rank-order effectiveness in the risky areas. In most cases we do not have the information against which to validate the performance in the reject space; however, there is often substantial a priori historical information about the riskier space (see the list below), and that information can be used to help “engineer” the KGB (Known Good/Bad) model so that the modified model reflects this a priori knowledge. Here is a summary of this a priori knowledge:

DQ (Delinquency) – Borrowers with a history of multiple and more severe delinquencies are riskier than those with little or no DQ history.
Time – Borrowers with a longer credit history are less risky than those with short histories.
Breadth – People with a wide breadth of credit (multiple trade lines of different types) are less risky than those with only a few trade lines. (One caveat: a very large number of trade lines can indicate potential overextension of credit.)
Utilization – Borrowers who overutilize their credit tend to be riskier.
Search for new credit – Borrowers who are actively searching for a lot of new credit (multiple inquiries, newly opened trade lines, …) tend to be riskier.
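One way to confirm that an engineered model respects these directional expectations is to check that the points assigned to each variable's bins move in the expected direction. This is a minimal sketch, assuming a scorecard stored as a dict of per-bin points (higher points meaning lower risk) with bins ordered from the lowest to the highest variable value; the variable names and point values are hypothetical.

```python
# Expected direction of score points as each variable's bins increase:
# "up" = points should not decrease, "down" = points should not increase.
# Directions follow the a priori list; variable names are hypothetical.
EXPECTED = {
    "num_30dpd": "down",        # DQ: more delinquency, more risk
    "months_on_file": "up",     # Time: longer history, less risk
    "num_trade_lines": "up",    # Breadth (mind the caveat for very large counts)
    "utilization_pct": "down",  # Utilization: over-utilization, more risk
    "inquiries_24m": "down",    # Search for new credit: more searching, more risk
}

def direction_violations(scorecard, expected=EXPECTED):
    """Return (variable, bin index) pairs where points move the wrong way.

    `scorecard` maps a variable name to a list of points for its bins,
    ordered from the lowest to the highest bin value.
    """
    violations = []
    for var, points in scorecard.items():
        direction = expected.get(var)
        if direction is None:
            continue  # no a priori expectation for this variable
        for i in range(1, len(points)):
            step = points[i] - points[i - 1]
            if (direction == "up" and step < 0) or (direction == "down" and step > 0):
                violations.append((var, i))
    return violations

# Hypothetical KGB scorecard: the 4-5 inquiry bin earns more points than the
# 2-3 bin, which contradicts the a priori knowledge and would be engineered away.
kgb = {"inquiries_24m": [40, 32, 25, 27, 10], "num_30dpd": [50, 20, 5, 0]}
print(direction_violations(kgb))
```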

How do we construct the A/R table?

This is best demonstrated by an example.

For the historical pattern, let’s take a search-for-new-credit variable, for example, the number of inquiries in the last 24 months, with five bins: 0, 1, 2-3, 4-5, and 6 or more. The final table is shown below:

# Inquiries (last 24 mo)		History			Proposed
Bin	Count	Accepts	Rejects	Accept %	Accepts	Rejects	Accept %
0	2,000	1,700	300	85%	1,800	200	90%
1	4,000	3,500	500	88%	3,500	500	88%
2-3	4,000	3,000	1,000	75%	3,200	800	80%
4-5	3,000	2,000	1,000	67%	1,900	1,100	63%
6+	2,000	1,000	1,000	50%	800	1,200	40%
Total	15,000	11,200	3,800	75%	11,200	3,800	75%

Here’s the explanation for this table. During the development time frame, we had 15,000 applicants; 11,200 were accepted and 3,800 were rejected, for a 75% accept rate (it doesn’t matter how many were booked). Of the 2,000 applicants with no inquiries, 300 were rejected; of the 4,000 with 1 inquiry, 500 were rejected; and so on. Basically, you need a cross tab of each variable against the Accept/Reject flag. For the proposed A/R pattern, rank order the population by the new score, then treat any account in the bottom 25% (the historical reject rate) as a reject under the proposed model and anything in the top 75% as an accept.
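The recipe above can be sketched in a few lines. This is a minimal sketch, assuming each applicant record is a `(bin_label, hist_accept, new_score)` tuple with a higher new score meaning lower risk; the layout and function names are hypothetical. The proposed decision rejects the lowest-scoring applicants, as many as were historically rejected, so the total accept count is preserved.

```python
from collections import defaultdict

def ar_table(records):
    """Cross-tab historical and proposed Accept/Reject counts by bin.

    `records` is a list of (bin_label, hist_accept, new_score) tuples,
    where hist_accept is True for historical accepts (booked or not).
    Returns {bin_label: [count, hist_accepts, proposed_accepts]}.
    """
    n_hist_reject = sum(1 for _, accepted, _ in records if not accepted)
    # Rank order by the new score; the bottom n_hist_reject applicants become
    # proposed rejects, so the overall accept rate is unchanged.
    # (Ties at the cutoff are broken by record order.)
    order = sorted(range(len(records)), key=lambda i: records[i][2])
    proposed_reject = set(order[:n_hist_reject])

    table = defaultdict(lambda: [0, 0, 0])
    for i, (bin_label, accepted, _) in enumerate(records):
        row = table[bin_label]
        row[0] += 1
        row[1] += int(accepted)
        row[2] += int(i not in proposed_reject)
    return dict(table)

def print_ar(table):
    print("Bin\tCount\tHist Acc %\tProp Acc %")
    for bin_label, (count, hist_acc, prop_acc) in table.items():
        print(f"{bin_label}\t{count}\t{hist_acc / count:.0%}\t{prop_acc / count:.0%}")

# Hypothetical toy records: (inquiry bin, historical accept, new score).
records = [
    ("0", True, 720), ("0", True, 690), ("1", True, 700),
    ("1", False, 640), ("2-3", True, 610), ("2-3", False, 580),
]
print_ar(ar_table(records))
```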

What is the Accept/Reject (A/R) pattern?

The A/R pattern is used in reject inference (as opposed to performance inference in general) to understand how any new decision process (usually a new acquisition model) compares to the historical process. The idea is to look at a number of salient dimensions known to be important in predicting risk, and compare the new process to the old process on each of them.

For example, if the historical process accepted 80% of the applicants with no severe delinquencies and only 20% of the applicants with one or more severe DQs, then we would hope that the new process would accept more than 80% of those with no severe DQ and fewer than 20% of those with one or more.

The following examples may help.

# Trade Lines		History			Proposed
#	Count	Accept	Reject	Accept %	Accept	Reject	Accept %
0-2 TL	2,000	1,000	1,000	50%	500	1,500	25%
2-4 TL	4,000	2,500	1,500	63%	2,200	1,800	55%
5-8 TL	4,000	3,000	1,000	75%	3,600	400	90%
8+	3,000	2,500	500	83%	2,700	300	90%
Total	13,000	9,000	4,000	69%	9,000	4,000	69%

In the above example, we can see that the historical pattern had a higher acceptance rate with more trade lines (breadth of credit experience). In the proposed pattern, the difference is even more dramatic.

# 30 Day DQ		History			Proposed
#	Counts	Accept	Reject	Accept %	Accept	Reject	Accept %
0	7,000	6,000	1,000	86%	6,100	900	87%
1 DQ	4,000	2,700	1,300	68%	2,660	1,340	67%
2 DQ	1,000	200	800	20%	150	850	15%
3+ DQ	1,000	100	900	10%	90	910	9%
Total	13,000	9,000	4,000	69%	9,000	4,000	69%

In the above example, we see that historically, the more DQs, the fewer accepts. In the proposed process, this pattern is slightly more pronounced.

This pattern comparison should be done for at least two variables in each of the major risk dimensions (delinquency, time, breadth, utilization, and search for new credit).
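The comparison can be partly automated per variable. This is a minimal sketch of one possible set of checks, not a standard test: with bins listed from lowest to highest expected risk, it asks whether the proposed accept rate never rises as risk rises, whether the lowest-risk bin is accepted at least as often as before and the highest-risk bin at most as often, and whether the spread between them has not shrunk. The function name is hypothetical; the counts come from the 30-day DQ table above.

```python
def compare_pattern(bins):
    """Simple checks of a proposed A/R pattern against the historical one.

    `bins` is a list of (label, count, hist_accepts, prop_accepts) tuples
    ordered from lowest to highest expected risk.
    """
    hist = [h / c for _, c, h, _ in bins]
    prop = [p / c for _, c, _, p in bins]
    return {
        # Proposed accept rate should not rise as expected risk rises.
        "proposed_monotone": all(a >= b for a, b in zip(prop, prop[1:])),
        # Lowest-risk bin accepted at least as often as historically...
        "low_risk_up": prop[0] >= hist[0],
        # ...and highest-risk bin at most as often.
        "high_risk_down": prop[-1] <= hist[-1],
        # Gap between lowest- and highest-risk bins should not shrink.
        "spread_not_smaller": (prop[0] - prop[-1]) >= (hist[0] - hist[-1]),
    }

# Counts from the 30-day DQ table, ordered from lowest to highest risk.
dq_30 = [
    ("0", 7000, 6000, 6100),
    ("1 DQ", 4000, 2700, 2660),
    ("2 DQ", 1000, 200, 150),
    ("3+ DQ", 1000, 100, 90),
]
print(compare_pattern(dq_30))
```

Running the same function on two or more variables from each risk dimension gives a quick first pass before the judgmental review.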

The A/R pattern is based on the full TTD population sampled during the development window: it includes the booked loans, the rejects, and those that were accepted (approved) but walked away (turned down the loan). The A/R history pattern compares those that were accepted and booked AND those that were accepted and walked away against the rejects. This process does not use the Out of Time sample or any performance on the known (booked or active) accounts.
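In data-preparation terms, this amounts to collapsing the application outcomes into a single accept flag before building the cross tabs. A minimal sketch, assuming hypothetical status codes; a real origination system will use its own labels.

```python
# Hypothetical application-status codes for the TTD sample.
ACCEPT_STATUSES = {"booked", "approved_walked_away"}
REJECT_STATUSES = {"rejected"}

def ar_flag(status):
    """Map an application status to the A/R flag: True = accept, False = reject.

    Booked loans and approved walk-aways both count as historical accepts;
    no performance (good/bad) information is used here.
    """
    if status in ACCEPT_STATUSES:
        return True
    if status in REJECT_STATUSES:
        return False
    raise ValueError(f"unknown application status: {status!r}")

ttd = ["booked", "rejected", "approved_walked_away", "booked"]
flags = [ar_flag(s) for s in ttd]
print(sum(flags) / len(flags))  # historical accept rate for this toy sample
```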

When is performance inference required?

Performance inference is almost always required in credit origination models. The guidelines from the OCC (Office of the Comptroller of the Currency) indicate that these models should consider the full TTD (Through The Door) population in model development.

In other modeling situations, inference may not be required, or even feasible; however, when possible and time permitting, it is best practice.

How is performance inference done?

There are a number of methods for performance inference:

Testing – One of the most reliable methods is to “test” into the unknown market by accepting applications that would normally be rejected or excluded from the “known” sample, and weighting them up to represent the full TTD sample.
- Advantages
  - Control the number and type of accounts that are accepted.
  - Management of these accounts is the same as they would be if the company actually expanded into this part of the market.
  - Surprises – The results of testing may show that this group is not as risky as expected and may represent a niche market in which the company can excel.
- Disadvantages
  - Cost – Accounts are being accepted that are riskier than the price justifies. For example, a credit company might accept 10% of the normally rejected accounts at the same price structure as the normally accepted accounts and incur higher losses than the price justifies.
  - Time – It takes a year or two to get a good read on the actual performance in some situations.
Surrogate Accounts – Look for a record of how the unknown accounts may have behaved at a competitor over the same time frame, or take a generic data set and try to match the unknown accounts to it on other variables such as FICO, demographics, income, …
- Advantages
  - No uncertain costs – The account performance is with another company, so there is no cost other than the fixed cost of buying the information.
  - Time – Results are immediately available
- Disadvantages
  - The account performance was not under the same management conditions, so the inferred performance may be off.
  - Matching logic may be inexact and difficult to confirm.
Analytical Inference – This technique uses expert knowledge of the domain in which the known model is developed to “engineer” the model built on the known population, so that it accounts for any model anomalies or inconsistencies in the inferred population. Analytical performance inference is a methodology that helps protect against the “truncation” phenomenon.
- Advantages
  - No additional data needed
  - Results are immediately available.
  - Current account management strategies are accounted for in the analysis
- Disadvantages
  - Accuracy is dependent on the quality of the analyst and the tools available.
  - It is a lengthy process requiring multiple iterations with the developer and an experienced reviewer. Even then the results are not guaranteed.
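The “weighted up” step in the Testing method can be illustrated with a short calculation. This is a minimal sketch, assuming a random 10% of normally rejected applicants are accepted as a test (the rate used in the Cost example above), so each tested account stands in for 1/0.10 = 10 rejects. The function name and all counts are hypothetical; in this toy case the known accepts alone show a 5% bad rate, while the weighted TTD estimate is noticeably higher.

```python
def ttd_bad_rate(accept_goods, accept_bads, test_goods, test_bads, test_rate):
    """Estimate the TTD bad rate from known accepts plus a reject test cell.

    Known accepts get weight 1; each tested reject is weighted by
    1 / test_rate so the test cell represents all normally rejected applicants.
    """
    w = 1.0 / test_rate
    goods = accept_goods + w * test_goods
    bads = accept_bads + w * test_bads
    return bads / (goods + bads)

# Hypothetical counts: 9,000 accepts (8,550 good / 450 bad) and a 10% test
# of 4,000 rejects, i.e. 400 tested accounts (300 good / 100 bad).
print(round(ttd_bad_rate(8550, 450, 300, 100, 0.10), 4))
```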

At the end of the performance inference stage, there are a number of items that should be reviewed, depending upon the inference technique being used. The primary goal here is to ensure that the inference results indicate that any “truncation” or other problems with the original data have been ameliorated. The results will include:

Specification
Validation
Known vs. Inferred Bad rates
Accept/Reject reports
Low-side overrides
Score segment coverage
Unmatched rate
Bias in match
Sample adjustment process using inference
Special analyses depending on the inference methodology used
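Two of these review items, the unmatched rate and known vs. inferred bad rates, are simple to compute once a surrogate match has been attempted. This is a minimal sketch, assuming each rejected applicant carries either a surrogate outcome (True for Bad, False for Good) or `None` when no match was found; the data layout, function name, and toy values are hypothetical. Because rejects are expected to perform worse than the known population, an inferred bad rate at or below the known rate would be a warning sign.

```python
import math

def review_stats(known_bad_flags, inferred_bad_flags):
    """Summarize a surrogate-match inference.

    known_bad_flags: list of bools for booked accounts (True = Bad).
    inferred_bad_flags: list for rejects, each True/False from the
    surrogate match, or None when no surrogate could be matched.
    """
    matched = [f for f in inferred_bad_flags if f is not None]
    unmatched_rate = 1 - len(matched) / len(inferred_bad_flags)
    known_bad_rate = sum(known_bad_flags) / len(known_bad_flags)
    inferred_bad_rate = sum(matched) / len(matched) if matched else math.nan
    return {
        "unmatched_rate": unmatched_rate,
        "known_bad_rate": known_bad_rate,
        "inferred_bad_rate": inferred_bad_rate,
    }

known = [False] * 95 + [True] * 5                 # hypothetical booked accounts
inferred = [False] * 12 + [True] * 6 + [None] * 2  # hypothetical matched rejects
print(review_stats(known, inferred))
```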

Why is performance inference important?

Performance inference helps prevent the effects of “truncation.” If there is a serious chance that “truncation” is occurring, then performance inference should be included in the analysis.

Performance inference is more important in some cases than in others. Its importance depends primarily on how the models are going to be used, how likely the inferred population is to be included in the scoring process, and how important it is to understand the inferred population for business knowledge. For example:

Credit Risk – If there is a subgroup of applicants that will never be considered for credit (recent bankrupts, foreclosures, very low FICO scores …), then they might not need to be scored; however, if there is a reasonable chance that these credit policies might change, then knowing how these policy rejects will perform is important for scoring purposes. It is very likely that the current rejects will perform worse than the “known” population.
Insurance Risk – An insurer may have a policy of rejecting any applicant who has had 5 accidents in the last year. As in credit, this policy might change, and if it does, it would be important to understand the insurance risk of this group in order to price them properly.
Competition – In both of the risk examples above, there are groups of applicants that walk away from an offer. This is often because they found a better offer elsewhere. In that case, it helps to understand the performance of these “walk-aways” in order to make them more competitive offers.
Marketing – If a particular set of products is marketed to a very specific group and there is interest in expanding this market then it would be of interest to better understand the expanded population without going to the expense of actually marketing to that population.

The primary purpose of performance inference is to understand something about populations that the business is currently not serving. In the case of risk, there is a lot of potential loss in “testing” into that population. In marketing, or in becoming more competitive, the risk is limited to the expense of marketing into a new arena or lowering prices to meet the competition. In both cases, performance inference helps in understanding the population with less expense and time.

Mathematical Definition	Example	Description
P = Probability of a 1	P = .9	Probability of a 1 (Good account) is 90%, 0 (Bad account) is 10%.
Odds = P/(1-P)	Odds = .9/.1 =9	9 Good accounts for every Bad account
Logit = K + ∑(b_i*x_i)	Logit = 0.2+.8x₁-5.5x₂	Two variable equation, x₁ is Number of Trade Lines and x₂ is number of delinquent accounts
$P = \dfrac{e^{K + \sum_i b_i x_i}}{1 + e^{K + \sum_i b_i x_i}}$	x₁ = 5, x₂ = 1; Logit = 0.2 + 0.8(5) − 5.5(1) = −1.3; P = e^(−1.3)/(1 + e^(−1.3)) = 0.214; Odds = 0.214/(1 − 0.214) = 0.272	An account with 5 trade lines and 1 delinquent account has a probability of being Good of .214: about 21% of these accounts are Good (and 79% are Bad). The resulting odds of being Good are .272 to 1, or about 27 Good accounts for every 100 Bad accounts.
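The worked example in the table can be reproduced in a few lines. This is a minimal sketch reusing the table's intercept K = 0.2 and coefficients 0.8 and −5.5; `logistic_summary` is a hypothetical helper name.

```python
import math

def logistic_summary(k, coefs, xs):
    """Return (logit, P, odds) for logit = K + sum(b_i * x_i).

    P is the probability of a 1 (Good); odds are Good:Bad, i.e. P / (1 - P).
    """
    logit = k + sum(b * x for b, x in zip(coefs, xs))
    p = math.exp(logit) / (1 + math.exp(logit))
    odds = p / (1 - p)
    return logit, p, odds

# x1 = 5 trade lines, x2 = 1 delinquent account.
logit, p, odds = logistic_summary(0.2, [0.8, -5.5], [5, 1])
print(round(logit, 3), round(p, 3), round(odds, 4))
```

Computed without intermediate rounding, the odds equal e^(−1.3), about 0.2725; the table's .272 comes from rounding P to .214 before dividing.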

Monday, January 7, 2013

Examples