Friday, February 15, 2013

FAMQ - Frequently Asked Modeling Questions


This blog is a list of Modeling Questions and links to other posts that have detailed answers to these questions.


During the model development and review process a number of issues may arise that are best addressed proactively. This document is intended to address some of the more common issues that tend to come to the forefront during the model review process. Many of these items may be subject to the personal opinions of the individual reviewer; however, they are all valid issues. A process should be in place to resolve any conflicts.

In addition, these questions can serve as a reference for new modelers who may just have some basic questions on the modeling process. How to get started? How to plan a project?



FAMQ:

Why are First Steps important? – Defining the Problem

How do you approach first steps if you are a client?

What kind of problem do you have?

What is the modeling target (dependent variable(s))?


What are the predictive (independent) variables?

Can a predictive variable be used in implementation?

What are the time windows for both the target and predictive variables?

What is Performance Inference?
What is model “engineering”?

Why does parceling work?

                                                        Return to FAMQ
Parceling works because of the nature of logistic regression. Logistic regression essentially models the linear relationship between the independent variables and the natural log of the odds of 1 to 0. If we duplicate each observation and give the copies weights proportional to their “1-ness” or “0-ness,” then the 1/0 odds become the Loss Ratio, % recovery, or Profit/Cost ratio.

One advantage of modeling this continuous target with logistic regression instead of linear regression is that it relaxes the assumptions on the distribution of the target ratio, since the kinds of ratios discussed here are often quite skewed, with a high percentage of zero or small values and a few very large values. The technique also gives extra weight to the “goodness” or “badness” of the true 0’s and 1’s, which may be ignored in a linear regression of a ratio. (A short code sketch follows the examples below.)

For example:
  • Loan Collections – In a charged off loan with no recoveries the recovery % is zero regardless of how much was owed. In the parceling technique, this “Bad” account is extra bad if the amount owed was high, but not nearly as bad if the amount owed was small.
  • Insurance Risk – When an insurance policy has no claims it is Good, but its Loss Ratio is 0 regardless of how much premium is paid. In the parceling technique, that account is extra “Good” if the premium paid is large and not so good when the premium paid is small. Likewise, when a large claim is paid, that account is extra “Bad.”
  • Profitability – If an account in any sort of business has no revenue, then the revenue/cost ratio is 0 regardless of how much that account has cost the company. With parceling, a zero-revenue account is extra bad if its associated costs are high and not so bad if its costs are low.
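As a minimal illustration of the mechanics, here is a sketch of parceling a recovery percentage into a weighted logistic regression; the dollar amounts, the single predictor, and the use of scikit-learn are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative charged-off loans: amount owed, amount recovered, one predictor.
amount_owed = np.array([10_000.0, 500.0, 2_000.0, 8_000.0])
recovered   = np.array([     0.0, 500.0, 1_000.0, 2_000.0])
X           = np.array([[720.0], [650.0], [680.0], [600.0]])

# Parceling: each loan becomes a "Good" (1) row weighted by dollars recovered
# and a "Bad" (0) row weighted by dollars still owed, so a zero-recovery loan
# with a large balance counts as "extra bad."
X_parceled = np.vstack([X, X])
y_parceled = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
w_parceled = np.concatenate([recovered, amount_owed - recovered])

model = LogisticRegression().fit(X_parceled, y_parceled, sample_weight=w_parceled)
print(model.predict_proba(X)[:, 1])  # modeled expected recovery fraction per loan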
                                                            Return to FAMQ

What is Parceling?

                                                        Return to FAMQ
Parceling is a technique often used in analytical performance inference to help account for a “fuzzy” or probabilistic 0/1 outcome for a given observation. It is also used in model development, as an alternative to linear regression, to convert a ratio target into a weighted 0/1 (binary) target. For example:
Inference
The problem in inference is to determine how a specific “unknown” observation would have performed had it been in the known population.
  • Lending or credit
    • How would a rejected applicant have performed had it been accepted?
    • How would an “unbooked” applicant (accepted but walked away) have performed had they taken the loan?
  • Direct Mail
    • Would a potential customer have responded had they been mailed an offer?
    • If they had responded, would they have purchased something?
  • Fraud
    • Would a credit application have been identified as fraudulent had it been investigated?
    • Would an insurance claim have been identified as fraudulent had it been investigated?

In this situation, the analysis of the known population can be extrapolated (very carefully) into the unknown population to derive a probability of the target performance (for example, 1 = Good or 0 = Bad). This probability is then used to divide an unknown observation into two separate observations, a Good observation and a Bad observation. The Good observation is given a weight equal to the probability of a 1 and the Bad observation is given a weight equal to the probability of a 0.
Ratios
This technique is only applicable in some very specific conditions. In general, if the target can have different degrees of “Badness” or “Goodness” then parceling can be used. For example:
  • Loan Collections – When a loan has been charged off, some or all of the money can be recovered. If none is recovered it is Bad, if all is recovered, it is Good but if a portion is recovered then it is partially good and partially bad. The ratio here is % recovered.
  • Insurance Risk – When an insurance policy has no claims it is Good; when it has claims it is somewhat Bad, depending on how large the claims are. The ratio here is Loss Ratio, or Loss/(Premiums Paid).
  • Profitability – An account in any sort of business could be classified as Good or Bad depending on the revenue generated from the account compared to the costs associated with the account. Those accounts with no costs are Good, those accounts with no revenue are Bad. The ratio here is Revenue/Cost.

Parceling is used in these ratio examples in a similar way to the inference solution. Each partial observation is duplicated, making a “Good” observation and a “Bad” observation. The Good observations are given weights proportional to their “Goodness” ($ recovered on the charged-off loan, insurance premiums paid, revenue generated by the account) and the Bad observations are given weights proportional to their “Badness” ($ owed on the charged-off loan, losses on the insurance due to claim(s), costs associated with the account).

For inference situations, parceling allows a single observation with unknown performance to be split into a “good” observation with a weight proportional to the estimated probability that the observation would have been good and a “bad” observation with a weight proportional to the estimated probability that it would have been “bad.” These parceled observations are then added to the known population to build a final model based on the full TTD (Through The Door) population.
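A minimal sketch of the inference case, assuming p_good holds each unknown applicant's inferred probability of having been Good (the probabilities and feature values are made up for illustration):

```python
import numpy as np

# Inferred P(Good) for three rejected applicants, plus their predictors.
p_good = np.array([0.85, 0.40, 0.10])
X_rej  = np.array([[700.0], [640.0], [580.0]])

# Each reject is split into a Good row weighted by p and a Bad row weighted
# by 1 - p; these weighted rows are then appended to the known Good/Bad
# sample before the final (full TTD) model is fit.
X_inferred = np.vstack([X_rej, X_rej])
y_inferred = np.concatenate([np.ones(len(X_rej)), np.zeros(len(X_rej))])
w_inferred = np.concatenate([p_good, 1.0 - p_good])

for y, w, x in zip(y_inferred, w_inferred, X_inferred):
    print(f"target={int(y)}  weight={w:.2f}  features={x}")
```

The weighted rows would then be stacked with the known Good/Bad sample and the final model fit on the combined, weighted data.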
                                                        Return to FAMQ

Tuesday, January 8, 2013

How does the modeler know the engineering works?

                                                        Return to FAMQ

Without some form of confirmed performance, the modeler needs to rely on a SME (Subject Matter Expert) to judgmentally validate the inferred performance. The primary way to do this with acquisition models (underwriting decisions for credit) is to compare the Accept/Reject decision using the model to the historical A/R patterns. If the model decisions are an improvement (judgmentally) over the traditional A/R pattern then the inference passes this test. Of course, the inferred model should still perform well on the known accounts, but not necessarily better than some known metric.

                                                        Return to FAMQ

What is model “engineering”?

                                                        Return to FAMQ

Engineering takes the form of forcing the model to include variables (and patterns within variables) that the data alone does not support. With traditional statistical techniques (various forms of regression) this is very difficult without ignoring the data altogether and using a judgmental model; but with MB (Model Builder from FICO) there are several options built for just this use.

The modeler can force variables not only to be in the model but also to have a much larger effect than the data dictates, while the rest of the model is optimized around these constraints. The modeler can also assign weights (score values) directly to specific bins.
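As a generic illustration only (this is not Model Builder syntax), engineering can be thought of as overriding the data-driven points for a bin so that the score pattern matches a priori expectations, accepting some loss of fit on the known data:

```python
# Data-driven points per bin of "# inquiries in the last 24 months" (made-up values).
data_driven = {"0": 35, "1": 30, "2-3": 28, "4-5": 15, "6+": 5}

# Engineered version: the "2-3" bin is forced lower so the score declines
# monotonically and more steeply with inquiries, as a priori knowledge dictates.
engineered = {**data_driven, "2-3": 22}

def score(bin_points, inquiry_bin, base=600):
    """Toy scorecard: base score plus the points for the applicant's bin."""
    return base + bin_points[inquiry_bin]

print(score(data_driven, "2-3"), score(engineered, "2-3"))  # 628 vs 622
```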

                                                        Return to FAMQ

Given all this a priori knowledge, what can be done to reflect this in a model based only on the KGB data?

                                                        Return to FAMQ

As alluded to in the previous section, we can “engineer” the KGB model to, at a minimum, generate scores that account directionally for this a priori knowledge. This engineering will necessarily degrade the performance of the KGB model in the “known” space, but it will ensure that when the model is used in the “reject” space it will generate Accept/Reject (A/R) patterns that appropriately reflect our a priori knowledge.

                                                        Return to FAMQ

Why is reject inference better than a KGB (Known Good/Bad) model?

                                                        Return to FAMQ

Many modelers will argue that the best model is the one that fits the known data best, but in the case of acquisition models, this is often not the case. The graph below shows a model built on the data indicated by the blue diamond symbols for data points at 650, 700, 750, 800 and 850. This data indicates a situation where the model extrapolates directly and accurately into the unknown area (values below 600).


Unfortunately, the Red square symbols are more like the real world. In this case, the known applicants that scored in the 600 to 700 range were “cherry picked” (chosen using additional information not reflected in the model data or unable to be used in the model development). The result of this “cherry picking” is that the applicants in the lower range performed slightly better than the full population had there been no “cherry-picking” (blue diamonds). Also, the highest scores were biased by a bit of negative selection due to pricing (resulting in them performing slightly worse than expected).

If the model fit on the red squares is used, then extrapolation into the unknown “reject” space will overestimate the quality (underestimate the risk) of those accounts by over 30%, which can obviously result in accepting accounts that are riskier than the model indicates and have a negative effect on the model’s ability to rank order effectively.

Also, the model fit on the red squares does a much better job of fitting the known (red square) data, but the model fit on the blue diamond data does a better job of fitting the unknown data and the known and unknown data combined.

Obviously, this is a made-up (but realistic) example to prove a point. That point is that extrapolating a KGB model into the reject space carries a significant danger of underestimating the risk and reducing the rank-order effectiveness in the riskier areas. In most cases we do not have the information against which to validate the performance in the reject space; however, there is often substantial a priori historical information about the riskier space (see bullet points below), and that information can be used to help “engineer” the KGB (Known Good Bad) model so that the modified model reflects this a priori knowledge. Here is a summary of this a priori knowledge:
  • DQ – Delinquency, borrowers with a history of multiple DQ and more severe DQ are riskier than those with little or no DQ history.
  • Time – Borrowers with longer history of credit are less risky than those with short histories.
  • Breadth – People with a wide breadth of credit (multiple trade lines of different types) are less risky than those with only a few trade lines. (this has a little caveat as large numbers of trade lines often indicate potential over extension of credit).
  • Utilization – borrowers who over utilize their credit tend to be more risky.
  • Search for new credit – Borrowers who are actively doing a lot of credit searching (multiple inquiries, newly opened trade lines, …) tend to be riskier.

                                                            Return to FAMQ

How do we construct the A/R table?

                                                        Return to FAMQ

This is best demonstrated by an example.

For the historical pattern, let’s take a search-for-new-credit variable, for example # of Inquiries in the last 24 months, with 5 bins: 0, 1, 2-3, 4-5, and 6 or more. The final table is shown below:

# IQ 24 M            History                          Proposed
Bin      Counts     Accepts    Reject   Accept %     Accepts    Reject   Accept %
0         2,000      1,700       300        85%       1,800       200        90%
1         4,000      3,500       500        88%       3,500       500        88%
2-3       4,000      3,000     1,000        75%       3,200       800        80%
4-5       3,000      2,000     1,000        67%       1,900     1,100        63%
6+        2,000      1,000     1,000        50%         800     1,200        40%
Total    15,000     11,200     3,800        75%      11,200     3,800        75%

Here’s the explanation for this table. During the development time frame, we had 15,000 applicants; 11,200 were accepted and 3,800 were rejected, for a 75% accept rate (it doesn’t matter how many were booked). 2,000 applicants had no inquiries and of those 300 were rejected. 4,000 had 1 inquiry and 500 of those were rejected, and so on. Basically, you need a cross tab of each variable against the Accept/Reject flag. For the Proposed A/R, rank order the population by the new score and then assume that any account in the bottom 25% (the historical reject rate) is a reject according to the proposed model and anything in the top 75% is an accept.
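A minimal sketch of this construction, assuming a pandas DataFrame for the TTD sample; the column names and the randomly generated values are placeholders for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 15_000
ttd = pd.DataFrame({
    "inq_24m_bin": rng.choice(["0", "1", "2-3", "4-5", "6+"], size=n),
    "accepted":    rng.integers(0, 2, size=n),        # historical A/R flag
    "new_score":   rng.normal(650, 60, size=n),       # proposed model score
})

# Proposed A/R: hold the overall accept rate at the historical level and
# accept the top-scoring applicants, rejecting the bottom 25% (or whatever
# the historical reject rate happens to be).
hist_accept_rate = ttd["accepted"].mean()
cutoff = ttd["new_score"].quantile(1.0 - hist_accept_rate)
ttd["proposed_accept"] = (ttd["new_score"] >= cutoff).astype(int)

ar_table = ttd.groupby("inq_24m_bin").agg(
    counts=("accepted", "size"),
    hist_accept_pct=("accepted", "mean"),
    proposed_accept_pct=("proposed_accept", "mean"),
)
print(ar_table)
```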

                                                        Return to FAMQ

What is the Accept/Reject (A/R) pattern?

                                                        Return to FAMQ

The A/R pattern is used in reject inference (as opposed to performance inference in general) to understand how a new decision process (usually a new acquisition model) compares to the historical process. The idea is to look at a number of salient dimensions, known to be important in predicting risk, and compare the new process to the old process on these dimensions.

For example, if the historical process accepted 80% of the applicants with no severe delinquencies and accepted only 20% of the applicants with one severe DQ then we would hope that the new process would accept more than 80% with no DQ and fewer than 20% with one or more.

The following examples may help.
# Trade Lines        History                          Proposed
#        Count      Accept    Reject   Accept %      Accept    Reject   Accept %
0-2 TL    2,000      1,000     1,000        50%         500     1,500        25%
2-4 TL    4,000      2,500     1,500        63%       2,200     1,800        55%
5-8 TL    4,000      3,000     1,000        75%       3,600       400        90%
8+        3,000      2,500       500        83%       2,700       300        90%
Total    13,000      9,000     4,000        69%       9,000     4,000        69%

In the above example we can see that in the historical pattern there was a higher acceptance rate for applicants with more trade lines (greater breadth of credit experience). In the proposed pattern, the difference is even more dramatic.
# 30 Day DQ          History                          Proposed
#        Counts     Accept    Reject   Accept %      Accept    Reject   Accept %
0         7,000      6,000     1,000        86%       6,100       900        87%
1 DQ      4,000      2,700     1,300        68%       2,660     1,340        67%
2 DQ      1,000        200       800        20%         150       850        15%
3+ DQ     1,000        100       900        10%          90       910         9%
Total    13,000      9,000     4,000        69%       9,000     4,000        69%

In the above example, we see that historically, the more DQ’s the fewer accepts. In the proposed process this pattern is a bit more exaggerated.

This pattern comparison should be done for at least two variables in each of the major risk dimensions.

The A/R pattern is based on the full TTD population sampled during the development window; it includes the booked loans, the rejects, and those that were accepted (approved) but walked away (turned down the loan). The A/R history pattern compares those that were accepted and booked plus those that were accepted and walked away against the rejects. This process does not look at the Out of Time sample or any performance on the known (booked or active) accounts.

                                                        Return to FAMQ

When is performance inference required?

                                                        Return to FAMQ

Performance inference is almost always required in credit origination models. The guidelines from the OCC (Office of the Comptroller of the Currency) indicate that these models should consider the full TTD (Through The Door) population in model development.

In other modeling situations, inference may not be required, or even feasible; however, when possible and time permitting, it is best practice.

                                                        Return to FAMQ

How is performance inference done?

                                                        Return to FAMQ

There are a number of methods for performance inference:
  • Testing – One of the most reliable methods is to “test” into the unknown market by accepting applications that would normally be rejected or not included in the “known” sample; these accounts are then weighted up to represent the full TTD sample.
    • Advantages
      • Control the number and type of accounts that are accepted.
      • Management of these accounts is the same as they would be if the company actually expanded into this part of the market.
      • Surprises – The results of testing may show that this group is not as risky as expected and may represent a niche market in which the company can excel.
    • Disadvantages
      • Cost, since accounts are being accepted that are riskier than the price justifies. For example, a credit company might accept 10% of the normally rejected accounts at the same price structure as the normally accepted accounts and incur higher losses than the price justifies.
      • Time – It takes a year or two to get a good read on the actual performance in some situations.
  • Surrogate Accounts – Try to look at the unknown accounts and find a record of how those accounts may have behaved at a competitor over the same time frame or look at a generic data set and try to match up the unknown accounts based on other variables such as FICO, demographics, income, …
    • Advantages
      • No uncertain costs – The account performance is with another company so there is no cost other than the fixed cost of buying the information
      • Time – Results are immediately available
    • Disadvantages
      • The account performance was not under the same management conditions so the performance inference may be off.
      • Matching logic may be inexact and difficult to confirm.
  • Analytical Inference – This is a technique using expert knowledge of the domain in which the known model is developed and then “engineering” the model based on the known population to account for any model anomalies or inconsistencies in the inferred population. Analytical Performance Inference is a methodology that helps protect against the “truncation” phenomenon.
    • Advantages
      • No additional data needed
      • Results are immediately available.
      • Current account management strategies are accounted for in the analysis
    • Disadvantages
      • Accuracy is dependent on the quality of the analyst and the tools available.
      • It is a lengthy process requiring multiple iterations with the developer and an experienced reviewer. Even then the results are not guaranteed.
At the end of the performance inference stage, there are a number of items that should be reviewed, depending upon the inference technique being used. The primary goal here is to ensure that the inference results indicate that any “truncation” or other problems with the original data have been ameliorated. The results will include:
  • Specification
  • Validation
  • Known vs. Inferred Bad rates
  • Accept/Reject reports
  • Low-side over-rides
  • Score segment coverage
  • Unmatched rate
  • Bias in match
  • Sample adjustment process using inference
  • Special analysis depending on Inference Methodology used.

                                                            Return to FAMQ

Why is performance inference important?

                                                        Return to FAMQ

Performance inference helps prevent the effects of “truncation.” If there is a serious chance that “truncation” is occurring, then performance inference should be included in the analysis.

In some cases, performance inference is less important than in others. Its importance depends on how the models are going to be used, how likely the inferred population is to be included in the scoring process, and how important it is, for business knowledge, to understand the inferred population. For example:
  • Credit Risk – If there is a sub group of applicants that will never be considered for credit (recent bankrupts, foreclosures, very low FICO scores …) then they might not need to be scored; however, if there is a reasonable chance that these credit policies might change then knowing how these policy rejects will perform is important for scoring purposes. It is very likely that the current rejects will perform worse than the “known” population.
  • Insurance Risk – There may be an insurance policy that rejects any applicants who have had 5 accidents in the last year. As in credit, this policy might change, and if it does it would be important to understand the insurance risk of this group to properly price them for insurance.
  • Competition – In both the risk examples above there are groups of applicants that walk away from an offer. This is often because they found a better offer elsewhere. In that case, it helps to understand the performance of these “walk aways” in order to make them offers that might be more competitive.
  • Marketing – If a particular set of products is marketed to a very specific group and there is interest in expanding this market then it would be of interest to better understand the expanded population without going to the expense of actually marketing to that population.
The primary purpose of performance inference is to understand something about populations that the business is currently not serving. In the case of risk there is a lot of potential loss in “testing” into that population. In marketing or being more competitive, the risk is limited to the expense of marketing into a new arena or lowering prices to meet the competition. In both cases, performance inference will help in understanding the population with less expense and time.

                                                        Return to FAMQ

What is Performance Inference?


In many modeling situations there is often a subgroup of the target population on which the performance or target is not observed. In most cases, it is important that we understand how these subgroups would have performed had we been able to observe their performance. This process is called performance inference. For example:
  • There is a subgroup of Credit applicants who are either turned down (rejected) or walk away (accepted but refuse the offer). What would their credit performance have been had they taken the credit?
  • Prospects who are mailed a sales offer but don’t respond. What would they have bought had they responded?
  • Insurance applicants who are offered an insurance policy but walk away from the offer. What losses would they have incurred had they taken the offer?
These are all examples where performance inference could be important. Essentially, performance inference is a “What if?” exercise. It is typically used to understand the full population under consideration in the business situation of concern:
  • In credit or insurance – The full population is anyone who might apply for a loan or insurance. This is known as the Through The Door (TTD) population.
  • In marketing – The full population is the people for whom the products are intended. This could be limited by:
    • Geography (just people who live in Chicago, IL),
    • Interest (just people who attend the Opera)
    • Behavior (people who fly internationally)

What are the time windows for both the target and predictive variables?


When a model is developed we need to answer the following questions:
  • When will the model score be calculated and used?
  • What historical data is being used at time of score calculation?
  • How far into the future is the model predicting and what time frames are used for model development?
Several times above we have mentioned temporal windows. There are two primary windows in time that need to be considered when defining a development sample: the target window and the predictive window. The target window is the length of time, after model scoring, that it takes for performance to be resolved, together with the calendar time frame over which this performance data is collected. The predictive window describes the history over which the predictive data is collected and the calendar time frame in which it is collected.


The following image helps demonstrate this concept:


This is best clarified by several examples:
  • Weight example (described in a previous post) where weight gain/loss is predicted over a week:
    • Predictive Time Window – In this case, the only historical data that is being collected is the previous week’s weight gain. All other variables are collected at time of scoring. So, in this case, the past predictive window is 7 days in the past.
    • Scoring – There are a number of variables that are being collected at scoring, so the present scoring window is the time of weigh in.
    • Target Time Window – The individual’s change in weight is estimated over the 7 days following the weigh-in, so the future target window is 7 days after the predictive data is collected. Again, the time frame is 7 days after each observation.
  • Good/Bad Credit – In the credit example we are scoring applicants using information from the application and historical credit information collected at the time of scoring. The target is a 0/1 target. A 1 is a Good, which is defined as never 30 days delinquent over the 18 months after an application. A 0 (Bad) is defined as an account that has gone 90 days delinquent or worse at least once in the next 18 months. Indeterminates include all other accounts (ever 30 or 60 days DQ but never 90 days DQ).
    • Predictive Time Window – In this case, the historical data being collected is the credit bureau data, which includes up to 8 years of negative information and a lifetime of positive information (length of credit history). So, in this case, the past predictive window is virtually unlimited. The predictive data sample is randomly selected applications from the calendar year 2007.
    • Scoring – There are a number of variables that are being collected at scoring, so the present scoring window is the time of application.
    • Target Time Window – The 18 month time frame starts from the day of application for credit, so the future target window is 18 months after the predictive data is collected, which makes the target window 2008-2009 (applications from Jan 2007 will have performance measured as of July 2008; apps from Dec 2007 will have performance measured as of June 2009).
  • Response/Non-response – In a marketing direct mail solicitation example, response is defined as receiving an application for membership within 6 weeks after the mail has dropped. The mailing took place in the last week of March 2010 (see the date sketch after this list).
    • Predictive Time Window – In this case, the historical data being collected is demographic household data as of the time the mail list was sent to the data aggregator, which was 4 weeks prior to the mail drop. So, in this case, the past predictive window is limited to the data available at the time of mailing. The predictive data sample is randomly selected records from that mailing.
    • Scoring – There are no additional variables collected at scoring, so the present scoring window is the time the list was determined, or 4 weeks prior to the mail drop.
    • Target Time Window – The 6 week time frame starting from the day of the mail drop is the future target window. This makes the target window from the last week of March 2010 to the 2nd week of May 2010.
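A small sketch of the direct-mail windows, assuming a March 25, 2010 drop date (the post only specifies the last week of March):

```python
from datetime import date, timedelta

mail_drop    = date(2010, 3, 25)                  # assumed drop date
scoring_date = mail_drop - timedelta(weeks=4)     # list sent to the data aggregator
target_start = mail_drop
target_end   = mail_drop + timedelta(weeks=6)     # responses counted through here

print("Predictive/scoring data as of:", scoring_date)
print("Target (response) window:", target_start, "to", target_end)
```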

Can a predictive variable be used in implementation?


The last thing a modeler wants to do is deliver a very predictive model that can’t be implemented. There are several issues that might eliminate a variable from use:
  • Unavailability – A variable might be very predictive, but it may not be available in the implementation environment. This is often an issue when the scoring model is executed in real time, as in instant approvals for credit or insurance. Sometimes, this is a situation where the data exists, but the means to deliver it do not exist or are limited. An example of this might be a specialized credit bureau variable (such as an indicator of transactor or revolver of credit cards) that is available in batch mode, but not in real time feeds.
  • Expense – Some data can be predictive, but its predictive value may not justify the cost. Modelers should always know not only whether a variable is available for implementation, but also whether there is an additional cost involved.
  • Legal – Many variables are expressly prohibited in certain types of models that fall under either state or federal governance, for example any variable that is indicative of gender or ethnicity. Also, many variables could, depending on their coefficients in the model, effectively discriminate against other protected classes.
  • Customer Service – Many models are legally required to generate “reason codes.” These codes are intended to help the consumer understand why they were either rejected or not given the best offer. These codes are supposed to be related to the variables most responsible for the decision. If these codes are confusing or counterintuitive then customer service may have a difficult time explaining them to consumers who call in asking for clarification.

    For example, if a model coefficient for “Time on Books” indicates that higher values of this variable increase risk, then the consumer reason code could state something like

    too much credit history with this company.

    This could well generate a lot of consumer calls asking for an explanation, and it may even encourage customers to cancel their accounts, since it implies that doing so is one way to improve their credit rating.

Previewing predictor variables for acceptability is always best practice. Certainly eliminating variables that are obviously problematic is good practice, but identifying potential issues before presenting a final model is prudent.

What are the predictive (independent) variables?


Predictive variables, also known as independent variables, are variables whose values are used in an equation to calculate a value for the target. For example, suppose we wanted to predict a person’s weight in one week based on their current age, gender, height, recent weight change, and waist measurements. In this example, the target variable is weight upon rising before breakfast and the predictive variables are:
  • Current age,
  • Gender,
  • Current Height (in inches),
  • Weight change over the past week in pounds upon rising before breakfast,
  • Current Waist measurement (maximum in inches).

In a modeling project, there are usually a large number of potential predictive variables and only a few of them will end up being implemented in the predictive equation. In most cases, predictive equations are predicting some future value or event, so the predictive variables are either current or historical values. In the above example, the time frame is relatively short, so the change in the predictive values over that time may not be important unless the model is based on a sample of individuals who are on a diet.

The exact definition of the predictive variables is also important. In this example, both the weight change and the target are measured at a given time of day under specific conditions, because daily fluctuations in weight can influence the results.
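A minimal sketch of this weight example, using synthetic data and illustrative column names:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "age": rng.integers(20, 70, n),
    "gender_male": rng.integers(0, 2, n),
    "height_in": rng.normal(67, 4, n),
    "weight_change_lb": rng.normal(0, 2, n),   # past week's gain/loss
    "waist_in": rng.normal(34, 4, n),
})
# Synthetic target: next week's morning weight, driven mostly by waist and recent change.
df["weight_next_week"] = (100 + 2.1 * df["waist_in"]
                          + 0.8 * df["weight_change_lb"]
                          + rng.normal(0, 5, n))

X = sm.add_constant(df[["age", "gender_male", "height_in", "weight_change_lb", "waist_in"]])
fit = sm.OLS(df["weight_next_week"], X).fit()  # continuous target -> linear regression
print(fit.params.round(2))
```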

Why is it important to get the 0 and 1 right in a logistic regression?


(WARNING: This post gets into some advanced mathematics and it is not expected that the average reader will understand everything in this post. :-)

It sometimes matters which category gets assigned to 0 and which to 1, not for modeling purposes but for interpretation of the final score. This is particularly true if logistic regression is being used as the modeling tool. As experienced modelers know, the only consequence of swapping the 0 and 1 labels is that the signs of the coefficients in the logit equation are reversed.

The usual objective in modeling a binary target is to estimate the probability that a 1 occurs, and this is normally done using logistic regression. Logistic regression estimates the probability of a 1 through the logit (log-odds) equation, which is linear in the predictors; the logit is converted to an exponential form to produce the actual probability. Because the score is linear in the log of the odds, the logit score is easily converted to a predetermined log odds/score relationship.

This relationship is usually defined by a specific odds value at some base score and Points to Double the Odds (PDO) factor. Because of the linear relationship, this results in a constant additive factor and a constant multiplicative factor for the original score.

Mathematically:

  • Definition: P = probability of a 1. Example: P = 0.9, i.e. the probability of a 1 (Good account) is 90% and of a 0 (Bad account) is 10%.
  • Definition: Odds = P/(1-P). Example: Odds = 0.9/0.1 = 9, i.e. 9 Good accounts for every Bad account.
  • Definition: Logit = K + Σ(bi*xi). Example: Logit = 0.2 + 0.8*x1 - 5.5*x2, a two-variable equation where x1 is the number of trade lines and x2 is the number of delinquent accounts.
  • Definition: P = e^Logit/(1 + e^Logit). Example: with x1 = 5 and x2 = 1, Logit = -1.3, so P = e^-1.3/(1 + e^-1.3) = 0.214 and Odds = 0.214/(1 - 0.214) = 0.272. An account with 5 trade lines and 1 delinquent trade has a 0.214 (21.4%) probability of being Good, i.e. odds of being Good of 0.272 to 1 (about 27 Goods for every 100 Bads).

In the above equation, if the number of trade lines changes from 5 to 6, the P value goes from .214 to .377 and the odds more than double, going from .272 to .607. So an increase of 1 in trade lines slightly more than doubles the odds. Because the relationship is linear in the log odds, going from 6 to 7 trade lines will slightly more than double the odds again.

Mathematically, it is clear that a simple adjustment of the x1 coefficient would make the equation produce an exact doubling of the odds per additional trade line. Likewise, adding a constant to the equation can produce a score that has a value of 100 at odds of 10:1.
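To make the scaling arithmetic concrete, here is a small sketch assuming a base score of 100 at 10:1 odds and an illustrative PDO of 20 (no PDO value is specified above):

```python
import math

def scale_score(logit, base_score=100.0, base_odds=10.0, pdo=20.0):
    """Linearly rescale a logit (log-odds of Good) to a business score:
    the score equals base_score at base_odds, and rises by pdo points
    every time the odds double."""
    factor = pdo / math.log(2)                           # constant multiplicative factor
    offset = base_score - factor * math.log(base_odds)   # constant additive factor
    return offset + factor * logit

# The table's example account: logit = -1.3, odds of Good = 0.272
print(round(scale_score(-1.3), 1))   # well below 100, i.e. far worse than 10:1 odds
```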

The reason this adjustment may be important is best explained with an example. For business purposes, let us assume that it takes 10 good accounts to generate enough profit to pay for 1 bad account. Then a score of 100 (10 to 1 odds) is the business breakeven point and any account with a score less than 100 is not profitable. A business person who knows the PDO and base odds can easily look at a score and understand not only the business implications of that score but also how the business is impacted by changes in the score.

The point of this lengthy discussion is that having the correct 0/1 definition at the beginning of the model development makes this score adjustment more straightforward at implementation time.

                                                       Return to FAMQ

Why is an indeterminate set important with a binary target?


The main objective of using a binary target in modeling is to understand what distinguishes the two groups. In many cases there is a clear separation between the two groups; in other cases there may be clear definitions for part of the population but also borderline cases where the differences are not so clear. An indeterminate set is used when a dichotomous target has a “fuzzy” definitional area. This gives a cleaner separation between the two groups and leads to a better model.

This is best explained by a few examples:
  • Good/Bad Credit targets – In these very common situations, the definition of a Good is usually straightforward: an account that has never gone delinquent. The Bad definition, in an absolute sense, is whether the account charged off (C/O) or went into default or bankruptcy; however, that definition is often too harsh for several reasons:
    • For a small portfolio or one with few charge off accounts there may not be enough bads to build a model.
    • The cost of collecting on severely delinquent accounts (90+ DPD) is extensive and even if the account cures there is a high probability that it will C/O in the future.
    • Most risk managers will agree that there is some level of severe delinquency before C/O that they would prefer to avoid.

Whatever the Bad definition, it is clear that there should be a gap between the Good and Bad definitions that, at a minimum, keeps the two groups cleanly separated. If the Bad = C/O definition is used, then how are severe delinquencies classified? They look a lot more like Bad accounts than Good accounts. Typically, only an account that has been 30 DPD at most once might still be considered Good; any account that is “multiple times 30 DPD” or worse, but not in the Bad definition, is put in the indeterminate set.
  • High/Low targets – When the target is defined as above or below some threshold, the indeterminate set is easier to define. Philosophically, observations well above the threshold are clearly in the High group and those well below are in the Low group, but what about those near the threshold? Those are the cases that naturally belong in the indeterminate set. The size of this buffer can be defined by the modeling customer or, absent guidance, by the 10-15% rule.
  • Attrition Targets – One of the typical objectives of attrition models is to identify those accounts that are likely to cancel or let their membership lapse. This score can be used to take proactive measures to retain these customers. Sometimes there will be unprofitable customers that the business is not interested in retaining; the indeterminate set could be defined to include these customers, so the definition might be unprofitable customers that either attrite or stay. This definition usually can include inactive customers.

                                                            Return to FAMQ

How to specify a Dichotomous or Binary Target?


Binary targets take on two values, usually specified as a “0” or “1.” Which category is assigned “0” and which “1” is not crucial to defining a binary target (though, as discussed above, it does matter for interpreting the final score). In many situations, there is a “performance” time frame involved before a target can be accurately assigned to the appropriate category.

As listed above, there are a number of cases where a binary target is appropriate. In most of these cases it is also best practice to define an “Indeterminate” set (10-15% of the sample population); see below for an explanation of this set. The following examples give some of the considerations when defining the target:
  • Good/Bad – Good and Bad are usually defined over a specific time frame. This time frame (see performance window below) has to be long enough to identify Bads and give the Goods time to be profitable.
    • Good is usually defined as paid on time as promised over some minimum time frame and is usually easy to define. Common definitions are:
      • “Never delinquent,”
      • “Never more than 30 Days Past Due (DPD)”, which allows for people who are sometimes late (AKA “sloppy payers”) to be defined as good.
    • The Bad definition is more difficult and is usually defined in consultation with the business customer for whom the model is being developed. A common question to ask the customer is “If you knew at the time of application that a prospect would reach some level of delinquency (30, 60, 90 DPD) over the performance window, would you have accepted that application?” Indeterminate sets are usually defined as “ever 30 DPD” or “multiple times 30 DPD but never worse.” Of course the Good, Indeterminate, and Bad definitions are mutually exclusive and collectively exhaustive; therefore, they are not independent.
  • Response/Non-response – Did they respond to the mailing or other solicitation? This is an easy target to define. For direct mail there needs to be some time frame over which responses are accepted. A well-defined indeterminate set is difficult in this environment as there really is no in-between. A possible definition is that a response after some time has elapsed, say 30 days, is indeterminate, as it is unclear whether the response is due to the specific solicitation.
  • Purchase/Not Purchase – Did they buy after responding? Again, how much time until they become a non-purchaser? This is usually an easy definition. The indeterminate set may include those people that bought something, but not enough to be profitable.
  • Attrition/Retention – This has many similarities to the Good/Bad definition. The time frame here is important. As with most of these definitions, they must be made with the modeling customer. The following considerations are pertinent:
    • How long does it take before a customer can be considered retained?
    • How long before the customer is profitable?
    • What if the customer becomes inactive but does not close the account?
  • Fraud/Non-Fraud – Is a particular transaction legitimate or fraudulent? This may often be difficult to determine.
    • In credit card transactions usually the true customer will identify fraud on a stolen or lost card or account take over. Here, the challenge is identifying the beginning of the string of fraudulent transactions.
    • In insurance claims, a true fraud may never be identified if actual criminal prosecution is not pursued.
    • In identity theft it may take years before the victim identifies the fraud.
    • Fraud is usually a rare event and having a large enough sample is often a significant challenge in building a development sample.
    • Hierarchical modeling can sometimes be used to identify highly probable non-fraud transactions and suspect transactions that can be used for indeterminate sets.
  • Above or Below a threshold – Sometimes an exact prediction on a continuous target is not needed, just whether or not a customer will be above or below some threshold on a continuous target, such as revenue, recovery, or # of transactions. The indeterminate set here is usually easy to define: it is a small cushion around the threshold. For example, if customer profitability is the target and the profitability threshold is $500 revenue over 5 years, then (see the small sketch after this list):
    • Good = $500+ revenue
    • Indeterminate = $400 to $500
    • Bad = Less than $400 revenue.
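A tiny sketch of this threshold definition, with the cutoffs taken from the example above (how indeterminates are handled downstream is up to the modeler):

```python
def profitability_target(revenue_5yr, bad_cutoff=400.0, good_cutoff=500.0):
    """Map 5-year revenue to a binary target with an indeterminate buffer."""
    if revenue_5yr >= good_cutoff:
        return 1       # Good
    if revenue_5yr < bad_cutoff:
        return 0       # Bad
    return None        # Indeterminate: typically excluded from the development sample

print([profitability_target(r) for r in (650, 450, 120)])   # [1, None, 0]
```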

                                                        Return to FAMQ

Monday, January 7, 2013

Target Examples:


Dichotomous or Binary: This is a target that takes on two primary states or values:
  • Good/Bad – This is the typical binary target for credit risk. Good is usually defined as paid on time as promised. Details of the specific definition are given in the section below.
  • Response/Non-response – This is a common target for direct mail solicitations or telemarketing. Usually it is defined as someone who responded positively to an offer.
  • Purchase/Not Purchase – This is also a common target for direct mail or telemarketing and is restricted to the responders, classifying them as responding and purchasing or refusing to purchase after a response.
  • Attrition/Retention – In studies of existing customers (credit card, insurance, phone service, cable, magazine subscribers, …), it is often important to be able to predict who will attrite (cancel or lapse/non-renew) vs. retain the service.
  • Fraud/Non-fraud – Is a particular transaction legitimate or fraudulent? A transaction could be an insurance claim, a credit card purchase, an application for a service, identity theft, …
  • Above or Below a threshold – Sometimes, an exact prediction is not needed, just whether or not a customer will be above or below some threshold on a continuous target, such as revenue, recovery, # transactions, … .
Continuous: This is a target that can take on a wide range of values that have meaningful numeric values (not just numbered classifications):
  • Revenue – How much revenue can a customer be expected to generate over time?
  • Purchase Amount – How much will an individual purchase in one transaction?
  • Losses – How large an insurance loss is a consumer expected to incur over time?
  • Recovery – What percent of a credit loss is likely to be recovered?
  • Miles driven – What are the actual miles an auto insured drives in a year?

                                                            Return to FAMQ

What is the modeling target (dependent variable(s))?


The modeling target, classically referred to as the dependent variable, is the value that is being estimated in the modeling process. Generally, a modeling process results in some mathematical or logical algorithm that has the target value as its outcome. Precise definition of the modeling target is crucial to the success of the project. It also is used in defining the samples that are collected for model development and validation. If this definition is in error then many steps will have to be redone.

In most projects there will be only one target value and that value will be either continuous or dichotomous (binary or zero/one). The type of target often determines the type of analysis that will be done to derive the final scoring algorithm.

On rare occasions, there may be more than one target. There are specialized tools that can deal with multiple targets, but they are the exception and evaluations of these projects must be done on a case by case basis.

                                                        Return to FAMQ

What kind of problem do you have?


The primary focus here is to look at the process of modeling human behavior. Specifically, the focus is on the modeling or categorization of individual observations of human behavior. These observations generally involve some type of prediction about the future. Here are just a few examples of the types of predictions that could be made:
  • Topic of incoming phone calls.
  • Response to outbound telesales.
  • Response to Direct Mail.
  • Purchase given response.
  • Credit Risk of customer making full payment on loans or credit cards.
  • Risk behavior and insurance claim behavior of vehicle operators, renters, and homeowners.
  • Health risks of individuals – likelihood of dying or injury or getting a specific disease.
  • Individual membership in certain lifestyle categories, such as high income, low education, married without children.
  • Income potential of students.
  • Attrition risk of customers or group members.
  • Voting patterns of individual voters.
  • Risk of recidivism for criminals.
  • Graduation success of students.
Classifying your problem into a specific type is just the beginning. The next step is:
  • Defining your problem – There is a wide range of problems that can be solved by statistical modeling. The following is not intended to be comprehensive, but merely a sampling:
    • Prioritization – Prioritization problems encompass a wide range of modeling solutions. The primary objective is to rank order the observations by some objective. For example:
      • Rank order a portfolio of credit card account applications as to the likelihood of identity theft and then target the top 0.1% for further investigation (see the sketch after this list).
      • Rank order loan accounts that are late as to the likelihood of going on to be 30 days past due and then target various accounts for tailored contacts (reminder e-mails for the least likely, reminder phone calls for mid range, and multiple phone calls for the most likely)
      • Rank order incoming phone calls to a catalog company as to the likelihood of a potential sale.
      • Rank order patients for online medical advice as to the potential severity of their symptoms
      • Rank order students as to their likelihood of scoring poorly on standardized tests and then give those at risk extra work and attention.
      • Rank order applicants for auto or homeowners insurance by risk levels and give those with lower risk better rates and coverage, those with higher risk may be directed to special companies or other treatment, such as more frequent review and adjustment of their pricing.
    • Estimates of specific values – Estimation problems require a model that can predict specific values; however, most of these problems can be solved with rank-ordering solutions. For any group within a rank-ordered solution (i.e. the top 1%), an average value for the target can be calculated and used for predicting a specific value. Usually, these kinds of problems need more accuracy than a rank-ordered solution provides. For example:
      • Predicting the exact weight (±5%) of an individual based on diet, age, height, bone density and body fat.
      • Predicting the MPG of a vehicle based on the design specifications under given driving conditions
      • Predicting the amount of money that can be recovered from a loan that has been charged off (alternatively called LGD or Loss Given Default).
      • Predicting the amount of loss that can be expected for a specific group of insurance policies which helps price these policies.
      • Predicting the arrival time of an airplane flight based on weather, speed, and type of aircraft.
    • Segmentation – Categorization of observations into homogeneous groups. This covers various forms of statistical clustering and is often not classified as a modeling problem, but it has many similarities to traditional modeling: it involves individual observations, sampling is always an issue, and it can be used for a variety of predictive purposes. It also involves a lot of subjective judgement, such as which variables are used for clustering, how a good cluster is defined, and how the categories will be used. Examples of segmentation:
      • Lifestyle segmentation – Classifying individuals or households into specific categories that may be described by lifestyle characteristics. This is usually based on census data (age, income, family size, education, ethnicity, housing, and other demographic information derived from the census bureau), survey information, subscription information, and so forth. The primary idea here is to define homogeneous groups for purposes of marketing or surveys.
      • Performance segmentation – Classifying individual accounts or customers into various homogeneous segments based on performance: what they buy, how often they buy, where they buy, when they buy. These classifications can then be used for marketing purposes, retention, potentially for risk evaluation, items to stock or sell, and so forth. Examples of markets where this type of segmentation can be used:
        • Credit Card – Credit card utilization can be used to group users as Revolvers, Transactors, inactive, or balance-transfer users. Other information, such as merchant SIC codes, can further group users into finer sub-segments.
        • Frequent buyer cards – Information from frequent buyer programs at:
          • retail stores – grocery, hardware, pharmacies, …
          • airlines,
          • online purchasers – Auctions, Books, gadgets, collectibles, …
        • Insurance – Types of insurance, levels of risk, amount of coverage, and so forth; for example, high-risk vehicle drivers with multiple policies, hurricane-prone properties, or exposure to other weather perils such as hail, tornadoes, and blizzards.
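Picking up the prioritization examples above, a minimal sketch of rank ordering by a model score and flagging the top 0.1% for investigation; the scores here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
scores = rng.random(100_000)               # pretend identity-theft model scores

k = max(1, int(0.001 * scores.size))       # top 0.1% of applications
flagged = np.argsort(scores)[::-1][:k]     # indices of the highest-scoring cases
print(f"{flagged.size} applications flagged for further investigation")
```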
These are just a few of the issues that must be addressed before a modeling project is initiated.
There are several steps in this development, which will be addressed in separate sections:
  • What is the modeling Target (dependent variable(s))?
  • What are the predictive (independent) variables?
  • What are the time windows for both the target and predictive variables?
  • Will there be any performance inference?
  • How to deal with interactions between variables?
  • How will the model be implemented and monitored?
Once the initial design is completed, the next step is to define the samples that will be used during development. There should be two samples used for a rigorous model development process, a development sample and an Out of Time Sample (OTS). Both will be discussed in the sections below.

                                                        Return to FAMQ

How do you approach first steps if you are a client?


As a modeling client it is important to understand the basics behind modeling. First, we need to define what kinds of problems are being addressed in this paper. There are a wide variety of modeling problems.
  • Physical models – These encompass efforts to mathematically model the physical world. These include biological or bio-medical models, wherein an attempt is made to model the underlying biological or chemical interactions of drugs and/or chemicals. They also include models of the physical and/or mechanical world, such as aerodynamics, the weather, or geological processes, as well as queuing models such as the work flow on a factory floor, transactions at a service window, or teleprocessing of customer service calls.
  • Financial and economic models – These models typically try to predict things like how much money will be spent or earned by an entity. These are accounting models. They look at and try to predict interest rates, exchange rates, and economic variables such as employment rates, GDP, and so forth.
  • Human behavior models – These encompass efforts to mathematically model the much more complex behavior of humans. These include things like: risk, response, buying, selling, mating, and so forth behavior.
The physical and bio-stats models require intimate theoretical knowledge of the process being modeled and, in the bio-medical case, familiarity with the medical field. This paper does not address the unique aspects of these kinds of challenges, although many of the issues addressed herein can be applied to these other kinds of problems.

Why are First Steps important? – Defining the Problem


One of the first steps in any modeling project is to define the problem and establish the details of the business objective. Many modelers want to grab some data and just start building a model. Many Clients just want a model because everyone else has one. The problem with this is that if the parameters of the problem are not thoroughly discussed with the business partners then the resultant model may end up with serious flaws. Things to look for:
  • Accuracy - The final model may not have sufficient accuracy to be useful.
    • A model can have some performance accuracy but not enough to justify the expense of implementation and maintenance. It is important to understand what level of improvement is needed by the business customer.
    • The target and/or performance window was not accurately defined.
    • Estimate the potential accuracy of the model by researching similar models in the company or industry.
  • Implementation - The model may not be implementable.
    • Variables are used in the model that are very predictive but are not available for implementation.
    • Variables are used in the model that are very predictive but are too expensive for implementation.
    • Systems need to be revised or updated for real-time implementation.
    • Evaluate the potential variables in the final model and check with the implementation team for known and unknown issues.
  • Monitoring - The final model may not be adequately monitored and thus cannot be subject to live validation.
    • Monitoring is not just best practice; it is a requirement in many industries.
    • If a model cannot be monitored then it may not be known if and when it is no longer effective or even hazardous to the business.

Basic Stats - Why does it matter?

In a previous blog I described some very basic concepts in descriptive stats. Why does this matter to the typical consumer?

One of the most important things a consumer needs is the knowledge that enables them to evaluate advertisements, news articles, political speeches, business portfolios, and other information.

Often, these sources of information will use some form of statistics to describe some policy, test result, financial statement or other numerical summary or analytical result. Knowing basic stats will help the consumer to better understand these results and when to question the results.

Examples

The examples presented here cover most of the situations that the ordinary consumer will encounter; however, more advanced analysis requires additional insight, which may be answered in my Frequently Asked Modeling Questions (FAMQ).

                                                        Return to FAMQ

  • Central Tendency - Question whether a measure of central tendency is using the median or the mean. Often a result is presented as the average, but is that the average of all numbers (mean) or the value for the average individual (median)? This is particularly true for monetary data (income, salary, net worth, price of items, ...), which is often positively (right) skewed. The mean makes the "average" seem higher and the median makes it look lower. (See the small example after this list.)
  • Dispersion - When looking at dispersion it is important to question what it is when it is omitted from a report. Sometimes two populations will be compared by reporting only the mean or median. These values may differ greatly, but in order to truly compare the two populations you need to know how the values vary. For example, one could be comparing two realty companies based on how long it takes them to sell a property. One could be higher than the other but have a much wider dispersion.
  • Samples - Sampling is often biased and not randomly selected. The example above on realtors could be biased by comparing high-priced properties for one company and low-priced properties for the other firm.
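As a small illustration of the mean/median point above, synthetic right-skewed "income" data (lognormal values, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
income = rng.lognormal(mean=10.8, sigma=0.7, size=10_000)   # right-skewed dollars

print(f"mean:   ${np.mean(income):,.0f}")    # pulled upward by a few large values
print(f"median: ${np.median(income):,.0f}")  # the "typical" person
```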