Tuesday, January 8, 2013

How does the modeler know the engineering works?

                                                        Return to FAMQ

Without some form of confirmed performance, the modeler needs to rely on a SME (Subject Matter Expert) to judgmentally validate the inferred performance. The primary way to do this with acquisition models (underwriting decisions for credit) is to compare the Accept/Reject decision using the model to the historical A/R patterns. If the model decisions are an improvement (judgmentally) over the traditional A/R pattern then the inference passes this test. Of course, the inferred model should still perform well on the known accounts, but not necessarily better than some known metric.

What is model “engineering”?

Engineering takes the form of forcing the model to include variables (and patterns within variables) that the data does not support. With traditional statistical techniques (various forms of regression) this is very difficult without ignoring the data altogether and using a judgmental model; but with MB (Model Builder from FICO) there are several options designed for just this use.

The modeler can force variables not only to be in the model but to have a much larger effect than the data dictates, while optimizing around these constraints. The modeler can assign weights (score values) to specific bins.

Given all this a priori knowledge, what can be done to reflect this in a model based only on the KGB data?

As alluded to in the previous section, we can “engineer” the KGB model to, at a minimum, generate scores that account directionally for this a priori knowledge. This engineering will necessarily degrade the performance of the KGB model in the “known” space, but it will ensure that when the model is used in the “reject” space it will generate Accept/Reject (A/R) patterns that reflect our a priori knowledge.

Why is reject inference better than a KGB (Known Good/Bad) model?

Many modelers will argue that the best model is the one that fits the known data best, but in the case of acquisition models, this is often not the case. The graph below shows a model built on the data indicated by the blue diamond symbols for data points at 650, 700, 750, 800 and 850. This data indicates a situation where the model extrapolates directly and accurately into the unknown area (values below 600).


Unfortunately, the red square symbols are more like the real world. In this case, the known applicants that scored in the 600 to 700 range were “cherry picked” (chosen using additional information that is not reflected in the model data or that cannot be used in model development). The result of this “cherry picking” is that the applicants in the lower range performed slightly better than the full population would have performed had there been no “cherry picking” (blue diamonds). Also, the highest scores were biased by a bit of negative selection due to pricing (resulting in them performing slightly worse than expected).

If the model that is fit on the red squares is used, then its extrapolation into the unknown “reject” space will overestimate the risk of those accounts (by over 30%), which can obviously result in accepting accounts that are riskier than indicated by the model and have a negative effect on the ability of the model to rank order effectively.

Also, the model fit on the red squares does a much better job of fitting the known (red square) data, but the model fit on the blue diamond data does a better job of fitting both the unknown data and the known data.

Obviously, this is a made up (but realistic) example to prove a point. That point is that extrapolating into the reject space has a significant danger of overestimating the risk and reducing the rank order effectiveness in the risky areas. In most cases we do not have the information against which to validate the performance in the reject space; however, there is often substantial a priori historical information in the riskier space (see bullet points below) and that information can be used to help “engineer” the KGB (Known Good Bad) model so that the modified model reflects this a priori knowledge. Here is a summary of this a priori knowledge:
  • DQ – Delinquency, borrowers with a history of multiple DQ and more severe DQ are riskier than those with little or no DQ history.
  • Time – Borrowers with longer history of credit are less risky than those with short histories.
  • Breadth – People with a wide breadth of credit (multiple trade lines of different types) are less risky than those with only a few trade lines. (this has a little caveat as large numbers of trade lines often indicate potential over extension of credit).
  • Utilization – borrowers who over utilize their credit tend to be more risky.
  • Search for new credit – Borrowers who are actively doing a lot of credit searching (multiple inquiries, newly opened trade lines, …) tend to be riskier.

How do we construct the A/R table?

This is best demonstrated by an example.

For the historical pattern, let’s take a search-for-new-credit variable, for example # of Inquiries in the last 24 months, with 5 bins: 0 Inq, 1 Inq, 2-3, 4-5, and 6 or more. The final table is shown below:

# IQ 24 M            History                        Proposed
Bin      Counts      Accepts   Reject   Accept %    Accepts   Reject   Accept %
0         2,000        1,700      300        85%      1,800      200        90%
1         4,000        3,500      500        88%      3,500      500        88%
2-3       4,000        3,000    1,000        75%      3,200      800        80%
4-5       3,000        2,000    1,000        67%      1,900    1,100        63%
6+        2,000        1,000    1,000        50%        800    1,200        40%
Total    15,000       11,200    3,800        75%     11,200    3,800        75%

Here’s the explanation for this table. During the development time frame, we had 15,000 applicants; 11,200 were accepted and 3,800 were rejected, for a 75% accept rate (it doesn’t matter how many were booked). 2,000 applicants had no inquiries and of those 300 were rejected; 4,000 had 1 inquiry and 500 of those were rejected; and so on. Basically, you need a cross tab of each variable against the Accept/Reject flag. For the Proposed A/R, rank order the population by the new score and then assume that any account in the bottom 25% (the historical reject rate) is a reject according to the proposed model and anything in the top 75% is an accept.
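The cross tab and the rank-ordering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_ar_table, the tuple layout, and the eight toy applicants are hypothetical, and a higher score is assumed to mean lower risk. The proposed reject count is set equal to the historical reject count so that the totals match, as in the table.

```python
def build_ar_table(applicants):
    """Cross-tabulate one binned variable against historical and proposed decisions.

    applicants: list of (bin_label, historically_accepted, new_score) tuples.
    Returns {bin_label: {"count", "hist_accept", "prop_accept"}}.
    """
    n_hist_rejects = sum(1 for _, accepted, _ in applicants if not accepted)
    # Rank by the new score, lowest (assumed riskiest) first. The bottom slice,
    # sized to the historical reject count, is treated as the proposed rejects.
    order = sorted(range(len(applicants)), key=lambda i: applicants[i][2])
    proposed_rejects = set(order[:n_hist_rejects])

    table = {}
    for i, (bin_label, accepted, _) in enumerate(applicants):
        row = table.setdefault(
            bin_label, {"count": 0, "hist_accept": 0, "prop_accept": 0}
        )
        row["count"] += 1
        row["hist_accept"] += int(accepted)
        row["prop_accept"] += int(i not in proposed_rejects)
    return table


# Hypothetical toy data: (inquiry bin, accepted historically?, new model score)
applicants = [
    ("0 Inq", True, 720), ("0 Inq", True, 700),
    ("0 Inq", False, 690), ("0 Inq", True, 640),
    ("6+ Inq", True, 600), ("6+ Inq", False, 650),
    ("6+ Inq", False, 580), ("6+ Inq", True, 610),
]

table = build_ar_table(applicants)
for bin_label, row in table.items():
    hist_pct = 100 * row["hist_accept"] / row["count"]
    prop_pct = 100 * row["prop_accept"] / row["count"]
    print(f"{bin_label:8} n={row['count']}  history {hist_pct:.0f}%  proposed {prop_pct:.0f}%")
```

In practice the same function would be run once per variable of interest, using the full TTD sample rather than a toy list.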

What is the Accept/Reject (A/R) pattern?

The A/R pattern is used in Reject inference (as opposed to all performance inference) to understand how any new decision process (usually a new acquisition model) compares to the historical process. The idea is to look at a number of salient dimensions, known to be important in predicting risk, and compare the new process to the old process on these dimensions.

For example, if the historical process accepted 80% of the applicants with no severe delinquencies and accepted only 20% of the applicants with one severe DQ then we would hope that the new process would accept more than 80% with no DQ and fewer than 20% with one or more.

The following examples may help.
# Trade Lines        History                       Proposed
#          Count     Accept   Reject   Accept %    Accept   Reject   Accept %
0-2 TL     2,000      1,000    1,000        50%       500    1,500        25%
2-4 TL     4,000      2,500    1,500        63%     2,200    1,800        55%
5-8 TL     4,000      3,000    1,000        75%     3,600      400        90%
8+         3,000      2,500      500        83%     2,700      300        90%
Total     13,000      9,000    4,000        69%     9,000    4,000        69%

In the above example we can see that in the historical pattern there was a higher acceptance rate with more trade lines (breadth of credit experience). In the proposed pattern, the difference is even more dramatic.
# 30 Day DQ          History                       Proposed
#          Counts    Accept   Reject   Accept %    Accept   Reject   Accept %
0           7,000     6,000    1,000        86%     6,100      900        87%
1 DQ        4,000     2,700    1,300        68%     2,660    1,340        67%
2 DQ        1,000       200      800        20%       150      850        15%
3+ DQ       1,000       100      900        10%        90      910         9%
Total      13,000     9,000    4,000        69%     9,000    4,000        69%

In the above example, we see that historically, the more DQ’s the fewer accepts. In the proposed process this pattern is a bit more exaggerated.

This pattern comparison should be done for at least two variables in each of the major risk dimensions.

The A/R pattern is based on the full TTD population sampled during the development window. It includes the booked loans, the rejects, and those that were accepted (approved) but walked away (turned down the loan). The A/R history pattern looks at those that were accepted and booked AND those that were accepted and walked away vs. the rejects. This process does not look at the Out of Time sample or any performance on the known (booked or active) accounts.
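As a rough sketch of how such a pattern comparison might be automated, the Python below recomputes the accept rates from the 30 Day DQ table above and checks that the proposed process still accepts fewer applicants as delinquency worsens. The function names are hypothetical, and the direction check is only one possible way to formalize the judgment; it does not replace the SME review.

```python
# (bin, count, historical accepts, proposed accepts), copied from the 30 Day DQ table
dq_table = [
    ("0", 7000, 6000, 6100),
    ("1 DQ", 4000, 2700, 2660),
    ("2 DQ", 1000, 200, 150),
    ("3+ DQ", 1000, 100, 90),
]


def accept_rate_shift(rows):
    """Return (bin, historical accept %, proposed accept %, change in points)."""
    result = []
    for label, count, hist_accepts, prop_accepts in rows:
        hist_pct = 100.0 * hist_accepts / count
        prop_pct = 100.0 * prop_accepts / count
        result.append((label, hist_pct, prop_pct, prop_pct - hist_pct))
    return result


def is_non_increasing(values):
    """True when each value is no larger than the one before it."""
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


shifts = accept_rate_shift(dq_table)
for label, hist_pct, prop_pct, change in shifts:
    print(f"{label:6} history {hist_pct:5.1f}%  proposed {prop_pct:5.1f}%  change {change:+.1f}")

# More delinquency should never mean a higher accept rate under the proposed process.
proposed_ok = is_non_increasing([prop for _, _, prop, _ in shifts])
```

For this table the cleanest bin gains accepts and every delinquent bin loses them, which is the "more exaggerated" pattern described above.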

When is performance inference required?

Performance inference is almost always required in credit origination models. The guidelines from the OCC (Office of the Comptroller of the Currency) indicate that these models consider the full TTD (Through The Door) population in model development.

In other modeling situations, inference may not be required, or even feasible; however, when possible and time permitting, it is best practice.

How is performance inference done?

There are a number of methods for performance inference:
  • Testing – One of the most reliable methods is to “test” into the unknown market by accepting applications that would normally be rejected or not included in the “known” sample, and then weighting these accounts up to represent the full TTD sample.
    • Advantages
      • Control the number and type of accounts that are accepted.
      • Management of these accounts is the same as they would be if the company actually expanded into this part of the market.
      • Surprises – The results of testing may show that this group is not as risky as expected and may represent a niche market in which the company can excel.
    • Disadvantages
      • Cost, since accounts are being accepted that are riskier than the price justifies. For example, a credit company might accept 10% of the normally rejected accounts at the same price structure as the normally accepted accounts and incur higher losses than the price justifies.
      • Time – It takes a year or two to get a good read on the actual performance in some situations.
  • Surrogate Accounts – Try to look at the unknown accounts and find a record of how those accounts may have behaved at a competitor over the same time frame or look at a generic data set and try to match up the unknown accounts based on other variables such as FICO, demographics, income, …
    • Advantages
      • No uncertain costs – The account performance is with another company so there is no cost other than the fixed cost of buying the information
      • Time – Results are immediately available
    • Disadvantages
      • The account performance was not under the same management conditions so the performance inference may be off.
      • Matching logic may be inexact and difficult to confirm.
  • Analytical Inference – This is a technique using expert knowledge of the domain in which the known model is developed and then “engineering” the model based on the known population to account for any model anomalies or inconsistencies in the inferred population. Analytical Performance Inference is a methodology that helps protect against the “truncation” phenomenon.
    • Advantages
      • No additional data needed
      • Results are immediately available.
      • Current account management strategies are accounted for in the analysis
    • Disadvantages
      • Accuracy is dependent on the quality of the analyst and the tools available.
      • It is a lengthy process requiring multiple iterations with the developer and an experienced reviewer. Even then the results are not guaranteed.
At the end of the performance inference stage, there are a number of items that should be reviewed, depending upon the inference technique being used. The primary goal here is to ensure that the inference results indicate that any “truncation” or other problems with the original data have been ameliorated. The results will include:
  • Specification
  • Validation
  • Known vs. Inferred Bad rates
  • Accept/Reject reports
  • Low-side over-rides
  • Score segment coverage
  • Unmatched rate
  • Bias in match
  • Sample adjustment process using inference
  • Special analysis depending on Inference Methodology used.

Why is performance inference important?

Performance inference helps prevent the effects of “truncation.” If there is a serious chance that “truncation” is occurring, then performance inference should be included in the analysis.

In some cases, performance inference is less important than in others. Its importance depends primarily on how the models are going to be used and how likely the inferred population is to be included in the scoring process. It also depends on how important it is, for business knowledge, to understand the inferred population. For example:
  • Credit Risk – If there is a sub group of applicants that will never be considered for credit (recent bankrupts, foreclosures, very low FICO scores …) then they might not need to be scored; however, if there is a reasonable chance that these credit policies might change then knowing how these policy rejects will perform is important for scoring purposes. It is very likely that the current rejects will perform worse than the “known” population.
  • Insurance Risk – There may be an insurance policy that rejects any applicants who have had 5 accidents in the last year. As in credit, this policy might change and if it does it would be important to understand the insurance risk of this group to properly price them for insurance.
  • Competition – In both the risk examples above there are groups of applicants that walk away from an offer. This is often because they found a better offer elsewhere. In that case, it helps to understand the performance of these “walk aways” in order to make them offers that might be more competitive.
  • Marketing – If a particular set of products is marketed to a very specific group and there is interest in expanding this market then it would be of interest to better understand the expanded population without going to the expense of actually marketing to that population.
The primary purpose of performance inference is to understand something about populations that the business is currently not serving. In the case of risk there is a lot of potential loss in “testing” into that population. In marketing or being more competitive, the risk is limited to the expense of marketing into a new arena or lowering prices to meet the competition. In both cases, performance inference will help in understanding the population with less expense and time.

What is “Truncation”?

“Truncation” is what happens when a sub group is not sampled randomly. Potentially, something in this non-random sample will bias certain variables, or “truncate” the selection, resulting in skewed performance.

Truncation is when a specific portion of the population has incomplete predictive information due to limited selection of that population, such as rejection of a risky population or being turned down by the consumer due to a non-competitive product.

For example, a particular lender may ‘target’ a population with recent bankruptcies. Because of the lender’s policies, this ‘targeted’ bankrupt population may perform better than the general bankrupt population, resulting in a bankruptcy variable, such as time since last bankruptcy, showing a positive relationship with risk. This is because most of the bad risk bankrupt prospects or applicants were rejected. Thus a model based on this data would score bankruptcies more favorably, and if the previous policies that were in place to ‘target’ this population were changed, it could have disastrous results.

“Truncation” can be a serious problem when there are subgroups that are normally excluded from the “known” population. Often, selected observations will be included that pass some additional criteria.

For example, in a credit situation, sometimes a credit policy will reject applicants with a short term history of severe delinquency; however, selected overrides may be allowed in cases where the applicant has sufficient collateral. This could potentially result in a situation where people with severe DQ have better performance than those that don’t (due to the collateral requirement). Including this pattern in a model, especially if the credit policy is changed, might result in high acceptance rates for people with severe DQ.
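The collateral override example can be made concrete with a small, entirely hypothetical set of counts. In the sketch below, the "would-be" bad counts for rejected applicants are invented for illustration (in real data they are exactly what is not observed), and the policy function is an assumption that mirrors the rule described above.

```python
# (severe_dq, has_collateral, applicants, would_be_bads) -- hypothetical counts
segments = [
    (False, False, 10000, 500),  # no severe DQ: 5% bad
    (True, True, 200, 6),        # severe DQ, override with collateral: 3% bad
    (True, False, 1800, 540),    # severe DQ, no collateral: 30% bad, but rejected
]


def is_booked(severe_dq, has_collateral):
    """Assumed policy: reject severe DQ unless the applicant has collateral."""
    return (not severe_dq) or has_collateral


def bad_rate(rows, severe_dq, known_only):
    """Bad rate for one DQ group, on the known (booked) sample or the full population."""
    n = bads = 0
    for dq, collateral, count, bad in rows:
        if dq != severe_dq:
            continue
        if known_only and not is_booked(dq, collateral):
            continue
        n += count
        bads += bad
    return bads / n


known_dq = bad_rate(segments, severe_dq=True, known_only=True)
known_clean = bad_rate(segments, severe_dq=False, known_only=True)
full_dq = bad_rate(segments, severe_dq=True, known_only=False)
print(f"known sample: severe DQ {known_dq:.1%} vs no DQ {known_clean:.1%}")
print(f"full population: severe DQ {full_dq:.1%}")
```

On the known sample, severe DQ looks safer than no DQ; on the full population it is far riskier. A model built only on the known sample would learn the first pattern.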

What is Performance Inference?


In many modeling situations there is a subgroup of the target population on which the performance or target is not observed. In most cases, it is important that we understand how these subgroups would have performed had we been able to observe their performance. This process is called performance inference. For example:
  • There is a subgroup of Credit applicants who are either turned down (rejected) or walk away (accepted but refuse the offer). What would their credit performance have been had they taken the credit?
  • Prospects who are mailed a sales offer, but don’t respond. What would they have bought had they responded?
  • Insurance applicants who are offered an insurance policy but walk away from the offer. What losses would they have incurred had they taken the offer?
These are all examples where performance inference could be important. Essentially, performance inference is a “What if?” exercise. It is typically used to understand the full population under consideration in the business situation of concern:
  • In credit or insurance – The full population is anyone who might apply for a loan or insurance. This is known as the Through The Door (TTD) population.
  • In marketing – The full population is the people for whom the products are intended. This could be limited by:
    • Geography (just people who live in Chicago, IL),
    • Interest (just people who attend the Opera)
    • Behavior (people who fly internationally)

What are the time windows for both the target and predictive variables?


When a model is developed we need to answer the following questions:
  • When will the model score be calculated and used?
  • What historical data is being used at time of score calculation?
  • How far into the future is the model predicting and what time frames are used for model development?
Several times above we have mentioned temporal windows. There are two primary windows in time that need to be considered when defining a development sample: the target window and the predictive window. The target window is the length of time, after model scoring, that it takes for performance to be resolved AND the time frame over which this data is collected. The predictive window is the length of history that the predictive data covers AND the time frame over which that data is collected.


The following image helps demonstrate this concept:


This is best clarified by several examples:
  • Weight example (described in a previous post) where weight gain/loss is predicted over a week:
    • Predictive Time Window – In this case, the only historical data that is being collected is the previous week’s weight gain. All other variables are collected at time of scoring. So, in this case, the past predictive window is 7 days in the past.
    • Scoring – There are a number of variables that are being collected at scoring, so the present scoring window is the time of weigh in.
    • Target Time Window – The individual’s change in weight is being estimated 7 days from the present day, so the future target window is 7 days after the predictive data is collected. Again, the time frame is 7 days after each observation.
  • Good/Bad Credit – In the credit example we are scoring applicants using information from the application and historical credit information collected at time of scoring. The target is a 0/1 target. A 1 is a Good, which is defined as never 30 days delinquent over the next 18 months after an application. A 0 (Bad) is defined as an account that has gone 90 days delinquent or worse at least once in the next 18 months. Indeterminates include all other accounts (ever 30 or 60 days DQ but never 90 days DQ).
    • Predictive Time Window – In this case, the historical data that is being collected is the credit bureau data which includes information on up to 8 years negative information and a lifetime of positive information (length of credit history). So, in this case, the past predictive window is virtually unlimited. The predictive data sample is randomly selected applications in the calendar year 2007.
    • Scoring – There are a number of variables that are being collected at scoring, so the present scoring window is the time of application.
    • Target Time Window – The 18 month time frame starts from the day of application for credit, so the future target window is 18 months after the predictive data is collected, which makes the target window 2008-2009 (applications from Jan 2007 will have performance measured as of July 2008; apps from Dec 2007 will have performance measured in June of 2009).
  • Response/Non-response – In a marketing direct mail solicitation example, response is defined as receiving an application for membership within 6 weeks after the mail has dropped. The mailing took place in the last week of March 2010.
    • Predictive Time Window – In this case, the historical data that is being collected is demographic household data at the time the mail list was sent to the data aggregator, which was 4 weeks prior to the mail drop. So, in this case, the past predictive window is limited to the data available at the time of mailing. The predictive data sample is randomly selected applications from that mailing.
    • Scoring – There are no additional variables being collected at scoring, so the present scoring window is the time the list was determined, or 4 weeks prior to the mail drop.
    • Target Time Window – The 6 week time frame starting from the day of the mail drop is the future target window. This makes the target window from the last week of March 2010 to the 2nd week of May 2010.
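The window arithmetic in the credit and direct mail examples can be checked with a short Python sketch. The helper add_months is a hypothetical convenience that works at month granularity only, and the exact mail drop date (March 29, 2010) is an assumption, since the example only says "the last week of March 2010".

```python
from datetime import date, timedelta


def add_months(d, months):
    """Return the first day of the month that falls `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


# Credit example: 18-month target window after each application month in 2007.
TARGET_MONTHS = 18
first_read = add_months(date(2007, 1, 1), TARGET_MONTHS)   # Jan 2007 applications
last_read = add_months(date(2007, 12, 1), TARGET_MONTHS)   # Dec 2007 applications
print("performance read from", first_read.strftime("%b %Y"), "to", last_read.strftime("%b %Y"))

# Direct mail example: the mail drop date is assumed for illustration.
mail_drop = date(2010, 3, 29)
list_frozen = mail_drop - timedelta(weeks=4)       # predictive data as of list creation
response_cutoff = mail_drop + timedelta(weeks=6)   # end of the target window
print("list frozen", list_frozen, "| responses counted through", response_cutoff)
```

Writing the windows down this explicitly makes it easy to confirm that no predictive data is drawn from inside the target window.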

Can a predictive variable be used in implementation?


The last thing a modeler wants to do is deliver a very predictive model that can’t be implemented. There are several issues that might eliminate a variable from use:
  • Unavailability – A variable might be very predictive, but it may not be available in the implementation environment. This is often an issue when the scoring model is executed in real time, as in instant approvals for credit or insurance. Sometimes, this is a situation where the data exists, but the means to deliver it do not exist or are limited. An example of this might be a specialized credit bureau variable (such as an indicator of transactor or revolver of credit cards) that is available in batch mode, but not in real time feeds.
  • Expense – Some data can be predictive, but its predictive value may not be able to justify the cost. Modelers should always know not only that a variable is available for implementation, but also if there is an additional cost involved.
  • Legal – Many variables are expressly prohibited in certain types of models that fall under either state or federal governance. For example, any variable that is indicative of gender or ethnicity. Also, many variables could be used to discriminate other protected classes based on the coefficient of the variable in the model.
  • Customer Service – Many models are legally required to generate “reason codes.” These codes are intended to help the consumer understand why they were either rejected or not given the best offer. These codes are supposed to be related to the variables most responsible for the decision. If these codes are confusing or counterintuitive then customer service may have a difficult time explaining them to consumers who call in asking for clarification.

    For example, if a model coefficient for “Time on Books” indicates that higher values of this variable increase risk, then the consumer reason code could state something like

    too much credit history with this company.

    This could well generate a lot of consumer calls asking for an explanation, and it could also encourage consumers to cancel their accounts, since it implies that canceling is one way to improve their credit rating.

Previewing predictor variables for acceptability is always best practice. Certainly eliminating variables that are obviously problematic is good practice, but identifying potential issues before presenting a final model is prudent.

What are the predictive (independent) variables?


Predictive variables, also known as independent variables, are variables whose values are used in an equation to calculate a value for the target. For example, suppose we wanted to predict a person’s weight in one week based on their current age, gender, height, recent weight change, and waist measurements. In this example, the target variable is weight upon rising before breakfast and the predictive variables are:
  • Current age,
  • Gender,
  • Current Height (in inches),
  • Weight change over the past week in pounds upon rising before breakfast,
  • Current Waist measurement (maximum in inches).

In a modeling project, there are usually a large number of potential predictive variables and only a few of them will end up being implemented in the predictive equation. In most cases, predictive equations are predicting some future value or event, so that the predictive variables are either current or historical values. In the above example, the time frame is relatively short so the difference in the predictive values over that time may not be important unless the model is based on a sample of individuals that are on a diet.

The exact definition of the predictive variables is also important. In this example, both the weight change and the target value are measured at a given time of day under specific conditions (upon rising, before breakfast). This is because daily fluctuations in weight can influence the results.

Why is it important to get the 0 and 1 right in a logistic regression?


(WARNING: This post gets into some advanced mathematics and it is not expected that the average reader will understand everything in this post. :-)

It is sometimes important which category gets assigned to 0 and which to 1, not for modeling purposes but for interpretation of the final score. This is particularly true if logistic regression is being used as the modeling tool. As experienced modelers know, the only effect of swapping the 0 and 1 classification is that the signs of the logit equation will be reversed.

The usual objective in modeling a binary target is to estimate the probability that a 1 occurs. This is normally done using logistic regression. The logistic regression equation estimates the probability of a 1 using the log odds equation. This logit equation is converted to an exponential form to produce the actual probability; however, the basic underlying equation is a linear form. This form has a linear relationship between the equation and the log of the odds. This results in an easy conversion of the logit score to a predetermined log odds/score relationship.

This relationship is usually defined by a specific odds value at some base score and Points to Double the Odds (PDO) factor. Because of the linear relationship, this results in a constant additive factor and a constant multiplicative factor for the original score.

Mathematically:

Mathematical Definition | Example | Description
$P$ = Probability of a 1 | $P = 0.9$ | Probability of a 1 (Good account) is 90%; probability of a 0 (Bad account) is 10%.
$\mathrm{Odds} = P/(1-P)$ | $\mathrm{Odds} = 0.9/0.1 = 9$ | 9 Good accounts for every Bad account.
$\mathrm{Logit} = K + \sum_i b_i x_i$ | $\mathrm{Logit} = 0.2 + 0.8 x_1 - 5.5 x_2$ | Two variable equation; $x_1$ is the number of Trade Lines and $x_2$ is the number of delinquent accounts.
$P = \dfrac{e^{K + \sum_i b_i x_i}}{1 + e^{K + \sum_i b_i x_i}}$ | $x_1 = 5$, $x_2 = 1$; $\mathrm{Logit} = -1.3$; $P = e^{-1.3}/(1 + e^{-1.3}) = 0.214$; $\mathrm{Odds} = 0.214/(1 - 0.214) = 0.272$ | An account with 5 Trades and 1 delinquent trade has a probability of .214 of being Good (21% of these accounts are Good), and the resulting odds of being Good are .272 to 1, or about 27 Good accounts for every 100 Bad accounts.

In the above equation, if the number of Trade Lines changes from 5 to 6, the P value goes from .214 to .377 and the odds more than double, going from .272 to .607. So an increase of 1 in the Trades slightly more than doubles the odds. Because the logit is linear in the variables, each additional Trade Line multiplies the odds by the same factor, which implies that going from 6 to 7 Trade Lines will slightly more than double the odds again.
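The arithmetic in this example can be reproduced with a few lines of Python. The coefficients are the ones from the table; the function names are just illustrative.

```python
import math

K, B1, B2 = 0.2, 0.8, -5.5  # constant, Trade Lines coefficient, delinquent accounts coefficient


def logit(trade_lines, delinquent):
    """Linear log odds equation from the table."""
    return K + B1 * trade_lines + B2 * delinquent


def probability(z):
    """Convert a logit to the probability of a 1 (Good)."""
    return math.exp(z) / (1 + math.exp(z))


def odds(p):
    """Odds of a 1 (Good) for a given probability."""
    return p / (1 - p)


p5 = probability(logit(5, 1))   # roughly 0.21
p6 = probability(logit(6, 1))   # roughly 0.38
odds5, odds6 = odds(p5), odds(p6)
ratio = odds6 / odds5           # equals e**0.8, a little more than 2

print(f"5 trades: P={p5:.3f}, odds={odds5:.3f}")
print(f"6 trades: P={p6:.3f}, odds={odds6:.3f}")
print(f"odds multiplier per extra trade line: {ratio:.3f}")
```

The multiplier per extra Trade Line depends only on the coefficient (it is e raised to 0.8), which is why the same factor applies when going from 6 to 7 Trade Lines.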

Mathematically, it is clear that if we just adjust the X1 coefficient (to ln 2, about 0.693) then the equation will produce an exact doubling of the odds for each additional Trade Line. Likewise, adding a constant to the equation can produce a score that has a value of 100 for odds of 10:1.

The reason this adjustment may be important is best explained with an example. For business purposes, let us assume that it takes 10 good accounts to generate enough profit to pay for 1 bad account. Thus, a score of 100 (10 to 1 odds) is the business breakeven point and any account that has a score less than 100 is not profitable. Thus a business person, knowing the PDO and base odds can easily look at a score and understand not only the business implications of that score but also how the business is impacted by changes in the score.
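A common way to express this adjustment is score = offset + factor × ln(odds), with the factor and offset chosen from the PDO and the base score/odds pair. The sketch below uses the base point from the text (a score of 100 at 10:1 odds); the PDO of 20 is an assumed value for illustration only, since the text does not specify one.

```python
import math


def scaling_constants(base_score, base_odds, pdo):
    """Multiplicative factor and additive offset for a log odds to score scale."""
    factor = pdo / math.log(2)                           # points per unit of ln(odds)
    offset = base_score - factor * math.log(base_odds)   # pins base_odds to base_score
    return factor, offset


def scaled_score(odds, factor, offset):
    """Score for a given Good:Bad odds value."""
    return offset + factor * math.log(odds)


PDO = 20  # assumed Points to Double the Odds, not taken from the text
factor, offset = scaling_constants(base_score=100, base_odds=10, pdo=PDO)

for example_odds in (5, 10, 20, 40):
    print(f"odds {example_odds:>2}:1 -> score {scaled_score(example_odds, factor, offset):.0f}")
```

Under these assumptions, every 20 points doubles the odds, so a business reader can see at a glance that a score of 80 corresponds to 5:1 odds, which is below the 10:1 breakeven.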

The point of this lengthy discussion is that having the correct 0, 1 definition at the beginning of model development makes this score adjustment more straightforward at implementation time.

Why is an indeterminate set important with a binary target?


The main objective of using a binary target in modeling is to understand what distinguishes the two groups. In many cases there is a clear separation between the two groups, but in other cases there may be clear definitions for part of the population, but there can also be borderline cases where the differences are not so clear. An indeterminate set is used when a dichotomous target has a “fuzzy” definitional area. This gives a cleaner separation between the two groups and leads to a better model.

This is best explained by a few examples:
  • Good/Bad Credit targets – In these very common situations, the definition of a Good is usually straightforward: an account that has never gone delinquent. The Bad definition, in an absolute sense, is whether the account charged off (C/O) or went into default or bankruptcy; however, that definition is often too harsh for several reasons:
    • For a small portfolio or one with few charge off accounts there may not be enough bads to build a model.
    • The cost of collecting on severely delinquent accounts (90+ DPD) is extensive and even if the account cures there is a high probability that it will C/O in the future.
    • Most risk managers will agree that there is some level of severe delinquency before C/O that they would prefer to avoid.

Whatever the Bad definition, it is clear that there is a gap between the Good and Bad definitions that, at a very minimum, helps clarify the definitions. If the Bad = C/O definition is used, then how are severe delinquencies defined? They look a lot more like Bad accounts than Good accounts. Typically, accounts that have been 30 DPD only once might still be considered Good. Usually, any account that is “multiple 30DPD+” or worse, but not in the Bad definition, is put in the indeterminate set.
  • High/Low targets – When the target is defined as above or below some threshold then the indeterminate set is easier to define. Philosophically, those observations that are way above the threshold are clearly in the High group and those that are way below are in the Low group, but what about those that are near the threshold? Those are the cases that naturally belong in the indeterminate set. The size of this buffer can be defined by the modeling customer or, without guidance, by using the 10-15% rule.
  • Attrition Targets – One of the typical objectives of attrition models is to identify those accounts that are likely to cancel or let their membership lapse. This score can be used to take proactive measures to retain these customers. Sometimes there will be unprofitable customers that the business might not be interested in retaining. The indeterminate set could be defined to include these customers, so the definition might be unprofitable customers that either attrite or stay. This definition can usually include inactive customers.
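
As a rough illustration of the Good/Bad credit example above, the sketch below assigns each account to Good, Indeterminate, or Bad. The field names, the 90 DPD cutoff for Bad, and the coding of Bad as 1 are assumptions chosen for the example; the actual definitions should be agreed with the business customer.

```python
def classify_account(max_dpd, times_30_dpd, charged_off, bad_dpd=90):
    """Return 'good', 'indeterminate' or 'bad' for one account.

    max_dpd      -- worst days past due reached in the performance window
    times_30_dpd -- number of times the account reached 30 DPD
    charged_off  -- True if the account charged off
    bad_dpd      -- ASSUMED delinquency level that counts as Bad
    """
    if charged_off or max_dpd >= bad_dpd:
        return "bad"
    # Never delinquent, or 30 DPD only once, is still treated as Good.
    if max_dpd < 30 or (max_dpd < 60 and times_30_dpd <= 1):
        return "good"
    # Multiple 30 DPD or worse, but not Bad, falls in the gap.
    return "indeterminate"


def binary_target(label):
    """Map a label to a 0/1 target; indeterminates get no target (None)."""
    # ASSUMPTION: Bad is coded 1 and Good is coded 0.
    return {"bad": 1, "good": 0}.get(label)


if __name__ == "__main__":
    accounts = [(0, 0, False), (30, 1, False), (30, 3, False), (120, 2, True)]
    for acct in accounts:
        label = classify_account(*acct)
        print(acct, label, binary_target(label))
```

Accounts that come back with no target are the indeterminate set and would be left out when the two groups are compared.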

How to specify a Dichotomous or Binary Target?


Binary targets take on two values, usually specified as a “0” or “1.” The exact definition of “0” and “1” is not crucial to defining a binary target. In many situations, there is a “performance” time frame involved before a target can be accurately defined in the appropriate category.

As listed above there are a number of cases where a binary target is appropriate. In most of these cases it is also best practice to define an “Indeterminate” set (10-15% of sample population). See below for an explanation of this set. The following examples give some of the considerations when defining the target:
  • Good/Bad – Good and Bad are usually defined over a specific time frame. This time frame (see performance window below) has to be long enough to identify Bads and give the Goods time to be profitable.
    • Good is usually defined as paid on time as promised over some minimum time frame and is usually easy to define. Common definitions are:
      • “Never delinquent,”
      • “Never more than 30 Days Past Due (DPD)”, which allows for people who are sometimes late (AKA “sloppy payers”) to be defined as good.
    • Bad definition is more difficult and is usually defined in consultation with the business customer for whom the model is being developed. A common question to ask the customer is “If you knew at the time of application that a prospect would reach some level of delinquency (30, 60, 90 DPD) over the performance window, would you have accepted that application?” Indeterminate sets are usually defined as “ever 30 DPD” or “multiple times 30 DPD but never worse.” Of course, the Good, Indeterminate, and Bad definitions are mutually exclusive and collectively exhaustive; therefore, they are not independent.
  • Response/Non-response – Did they respond to the mailing or other solicitation? This is an easy target to define. For direct mail there needs to be some time frame over which responses are accepted. A well-defined indeterminate set is difficult in this environment as there really is no in-between. One possible definition is to treat a response that arrives after some time has elapsed, say 30 days, as indeterminate, since it is unclear whether the response is due to the specific solicitation.
  • Purchase/Not Purchase – Did they buy after responding? Again, how much time until they become a non-purchaser? This is usually an easy definition. The indeterminate set may include those people that bought something, but not enough to be profitable.
  • Attrition/Retention – This has many similarities to the Good/Bad definition. The time frame here is important. As with most of these definitions, they must be made with the modeling customer. The following considerations are pertinent:
    • How long does it take before a customer can be considered retained?
    • How long before the customer is profitable?
    • What if the customer becomes inactive but does not close the account?
  • Fraud/Non-Fraud – Is a particular transaction legitimate or fraudulent? This may often be difficult to determine.
    • In credit card transactions the true customer will usually identify fraud on a stolen or lost card or an account takeover. Here, the challenge is identifying the beginning of the string of fraudulent transactions.
    • In insurance claims, a true fraud may never be identified if actual criminal prosecution is not pursued.
    • In identity theft it may take years before the victim identifies the fraud.
    • Fraud is usually a rare event and having a large enough sample is often a significant challenge in building a development sample.
    • Hierarchical modeling can sometimes be used to identify highly probable non-fraud transactions and suspect transactions that can be used for indeterminate sets.
  • Above or Below a threshold – Sometimes, an exact prediction on a continuous target is not needed, just whether or not a customer will be above or below some threshold on that target, such as revenue, recovery, # transactions, … The indeterminate set here is usually easy to define in that it is a small cushion around the threshold. For example, if customer profitability is the target and the profitability threshold is $500 revenue over 5 years then:
    • Good = $500+ revenue
    • Indeterminate = $400 to $500
    • Bad = Less than $400 revenue.
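
The threshold example can be sketched as follows, along with a quick check of how much of a sample lands in the indeterminate set. The $500 threshold and $100 cushion come from the example above; the sample revenues and function names are made up for illustration.

```python
from collections import Counter


def threshold_target(revenue, threshold=500.0, cushion=100.0):
    """Label one customer by 5-year revenue, using the cutoffs in the example."""
    if revenue >= threshold:
        return "good"
    if revenue >= threshold - cushion:
        return "indeterminate"
    return "bad"


def label_shares(revenues, threshold=500.0, cushion=100.0):
    """Share of the sample in each group, to compare with the 10-15% guideline."""
    counts = Counter(threshold_target(r, threshold, cushion) for r in revenues)
    n = len(revenues)
    return {label: counts[label] / n for label in ("good", "indeterminate", "bad")}


if __name__ == "__main__":
    # Hypothetical revenues, for illustration only.
    sample = [600, 520, 450, 410, 300, 200, 100, 50, 700, 350]
    print(label_shares(sample))
```

If the indeterminate share comes out well above or below the 10-15% guideline, the cushion can be narrowed or widened and the shares recomputed.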

Monday, January 7, 2013

Target Examples:


Dichotomous or Binary: This is a target that takes on two primary states or values:
  • Good/Bad – This is the typical binary target for credit risk. Good is usually defined as paid on time as promised. Details of the specific definition are given in the section below.
  • Response/Non-response – This is a common target for direct mail solicitations or telemarketing. Usually it is defined as someone who responded positively to an offer.
  • Purchase/Not Purchase – This is also a common target for direct mail or telemarketing and is restricted to the responders, classifying them as responding and purchasing or refusing to purchase after a response.
  • Attrition/Retention – In studies of existing customers (credit card, insurance, phone service, cable, magazine subscribers, …), it is often important to be able to predict who will attrite or cancel or lapse/non-renew vs. retain the service.
  • Fraud/Non-fraud – Is a particular transaction legitimate or fraudulent? A transaction could be an insurance claim, a credit card purchase, an application for a service, identity theft, …
  • Above or Below a threshold – Sometimes, an exact prediction is not needed, just whether or not a customer will be above or below some threshold on a continuous target, such as revenue, recovery, # transactions, …
Continuous: This is a target that can take on a wide range of values that have meaningful numeric values (not just numbered classifications):
  • Revenue – How much revenue can a customer be expected to generate over time?
  • Purchase Amount – How much will an individual purchase in one transaction?
  • Losses – How large an insurance loss is a consumer expected to incur over time?
  • Recovery – What percent of a credit loss is likely to be recovered?
  • Miles driven – What are the actual miles an auto insured drives in a year?

What is the modeling target (dependent variable(s))?


The modeling target, classically referred to as the dependent variable, is the value that is being estimated in the modeling process. Generally, a modeling process results in some mathematical or logical algorithm that has the target value as its outcome. Precise definition of the modeling target is crucial to the success of the project. It also is used in defining the samples that are collected for model development and validation. If this definition is in error then many steps will have to be redone.

In most projects there will be only one target value and that value will be either continuous or dichotomous (binary or zero/one). The type of target often determines the type of analysis that will be done to derive the final scoring algorithm.

On rare occasions, there may be more than one target. There are specialized tools that can deal with multiple targets, but they are the exception and evaluations of these projects must be done on a case by case basis.

What kind of problem do you have?


The primary focus here is to look at the process of modeling human behavior. Specifically, the focus is on the modeling or categorization of individual observations of human behavior. These observations generally involve some type of prediction about the future. Here are just a few examples of the types of predictions that could be made:
  • Topic of incoming phone calls.
  • Response to outbound telesales.
  • Response to Direct Mail.
  • Purchase given response.
  • Credit Risk of customer making full payment on loans or credit cards.
  • Risk behavior and insurance claim behavior of vehicle operators, renters, and homeowners.
  • Health risks of individuals – likelihood of dying or injury or getting a specific disease.
  • Individual membership in certain lifestyle categories, such as high income, low education, married without children.
  • Income potential of students.
  • Attrition risk of customers or group members.
  • Voting patterns of individual voters.
  • Risk of recidivism for criminals.
  • Graduation success of students.
Classifying your problem into a specific type is just the beginning. The next step is:
  • Defining your problem – There are a wide number of problems that can be solved by statistical modeling. The following is not intended to be comprehensive, but merely a sampling:
    • Prioritization – Prioritization problems encompass a wide range of modeling solutions. The primary objective is to rank order the observations by some objective. For example:
      • Rank order a portfolio of credit card account applications as to the likelihood of identity theft and then target the top 0.1% for further investigation.
      • Rank order loan accounts that are late as to the likelihood of going on to be 30 days past due and then target various accounts for tailored contacts (reminder e-mails for the least likely, reminder phone calls for mid range, and multiple phone calls for the most likely).
      • Rank order incoming phone calls to a catalog company as to the likelihood of a potential sale.
      • Rank order patients for online medical advice as to the potential severity of their symptoms.
      • Rank order students as to their likelihood of scoring poorly on standardized tests and then give those at risk extra work and attention.
      • Rank order applicants for auto or homeowners insurance by risk levels and give those with lower risk better rates and coverage, those with higher risk may be directed to special companies or other treatment, such as more frequent review and adjustment of their pricing.
    • Estimates of specific values – Estimation problems require a model that can predict specific values; however, most of these problems can be solved with rank ordering solutions. For any group within a rank ordered solution (e.g., the top 1%), an average value for the target can be calculated and used for predicting a specific value. Usually, for these kinds of problems more accuracy is needed than in a rank ordered solution. For example:
      • Predicting the exact weight (±5%) of an individual based on diet, age, height, bone density and body fat.
      • Predicting the MPG of a vehicle based on the design specifications under given driving conditions.
      • Predicting the amount of money that can be recovered from a loan that has been charged off (alternatively called LGD or Loss Given Default).
      • Predicting the amount of loss that can be expected for a specific group of insurance policies which helps price these policies.
      • Predicting the arrival time of an airplane flight based on weather, speed, and type of aircraft.
    • Segmentation – Categorization of observations into homogeneous groups. This is various forms of statistical clustering and is often not classified as a modeling problem; but, it has many similarities to traditional modeling: it involves individual observations; sampling is always an issue; and it can be used for a variety of predictive purposes. It also involves a lot of subjective judgement, such as what variables are used for clustering; how is a good cluster defined; how will the categories be used. Examples of segmentation:
      • Lifestyle segmentation – Classifying individuals or households into specific categories that may be described by lifestyle characteristics. This is usually based on census data (age, income, family size, education, ethnicity, housing, and other demographic information derived from the census bureau), survey information, subscription information, and so forth. The primary idea here is to define homogeneous groups for purposes of marketing or surveys.
      • Performance segmentation – Classifying individual accounts or customers into various homogeneous segments based on performance: what they buy, how often they buy, where they buy, when they buy. These classifications can then be used for marketing purposes, retention, potentially for risk evaluation, items to stock or sell, and so forth. Examples of markets where this type of segmentation can be used:
        • Credit Card – Credit Card utilization can be used to group users as Revolvers, Transactors, inactive, balance transfers. Other information, such as merchant SIC can further group users into
        • Frequent buyer cards – Information from frequent buyer
          • retail stores – grocery, hardware, pharmacies, …
          • airlines,
          • online purchasers – Auctions, Books, gadgets, collectibles, …
        • Insurance – Types of insurance, levels of risk, amount of coverage, … Examples include high risk vehicle drivers with multiple policies, hurricane prone properties, and other weather exposure, such as hail, tornadoes, blizzards, …
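
The point made above, that a rank ordered solution can also supply estimates of specific values, can be sketched as follows: sort the observations by score, cut them into equal-sized groups, and use each group's average target as its estimate. The function name, the number of groups, and the sample data are assumptions made for illustration.

```python
def average_by_rank_group(scores, targets, n_groups=10):
    """Average target value within each rank-ordered group.

    Observations are sorted from highest to lowest score and split into
    n_groups groups of (nearly) equal size; group 0 holds the top scores.
    """
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    averages = []
    for g in range(n_groups):
        start = g * n // n_groups
        end = (g + 1) * n // n_groups
        group = [target for _, target in ranked[start:end]]
        # An empty group (more groups than observations) has no estimate.
        averages.append(sum(group) / len(group) if group else None)
    return averages


if __name__ == "__main__":
    # Hypothetical scores and observed recovery amounts.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    recovered = [100, 80, 60, 40, 20, 0]
    print(average_by_rank_group(scores, recovered, n_groups=3))
```

A new observation would then be assigned the average of whichever group its score falls into, which is coarser than a dedicated estimation model but often adequate.
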
These are just a few of the issues that must be addressed before a modeling project is initiated.
There are several steps in this development, which will be addressed in separate sections:
  • What is the modeling Target (dependent variable(s))?
  • What are the predictive (independent) variables?
  • What are the time windows for both the target and predictive variables?
  • Will there be any performance inference?
  • How to deal with interactions between variables?
  • How will the model be implemented and monitored?
Once the initial design is completed, the next step is to define the samples that will be used during development. There should be two samples used for a rigorous model development process, a development sample and an Out of Time Sample (OTS). Both will be discussed in the sections below.
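
As a minimal sketch of the two samples, and assuming that an Out of Time Sample simply means observations drawn from a period outside the development window, the split can be made on the observation date. The record layout, field name, and cutoff date below are hypothetical.

```python
from datetime import date


def split_by_time(records, cutoff):
    """Split records into a development sample and an out-of-time sample.

    ASSUMPTION: each record is a dict with an 'obs_date' (datetime.date);
    records observed before the cutoff are used for development and the
    rest are held back as the out-of-time sample.
    """
    development = [r for r in records if r["obs_date"] < cutoff]
    out_of_time = [r for r in records if r["obs_date"] >= cutoff]
    return development, out_of_time


if __name__ == "__main__":
    records = [
        {"id": 1, "obs_date": date(2011, 3, 15)},
        {"id": 2, "obs_date": date(2011, 11, 2)},
        {"id": 3, "obs_date": date(2012, 8, 20)},
    ]
    dev, ots = split_by_time(records, cutoff=date(2012, 1, 1))
    print(len(dev), len(ots))
```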

How do you approach first steps if you are a client?


As a modeling client it is important to understand the basics behind modeling. First, we need to define what kinds of problems are being addressed in this paper. There are a wide variety of modeling problems.
  • Physical models – These encompass efforts to mathematically model the physical world. These include biological or bio-medical models wherein an attempt is made to model the underlying biological or chemical interactions of drugs and/or chemicals. These also include modeling the physical and/or mechanical world, such as aerodynamics or the weather or geological aspects. These also include queuing models such as the work flow on a factory floor or transactions at a service window or teleprocessing of customer service calls.
  • Financial and economic Models – These models typically try to predict things like how much money will be spent or earned by an entity. These are accounting models. They look at and try to predict interest rates, exchange rates and economic variables such as employment rates, GDP, and so forth.
  • Human behavior models – These encompass efforts to mathematically model the much more complex behavior of humans. These include behaviors such as risk, response, buying, selling, mating, and so forth.
The physical and bio-stats models require intimate theoretical knowledge of the process being modeled and familiarity with the medical field. This paper does not address the unique aspects of these kinds of challenges, although many of the issues addressed herein can be applied to these other kinds of problems.

Why are First Steps important? – Defining the Problem


One of the first steps in any modeling project is to define the problem and establish the details of the business objective. Many modelers want to grab some data and just start building a model. Many clients just want a model because everyone else has one. The problem with this is that if the parameters of the problem are not thoroughly discussed with the business partners then the resultant model may end up with serious flaws. Things to look for:
  • Accuracy - The final model may not have sufficient accuracy to be useful.
    • A model can have some performance accuracy but not enough to justify the expense of implementation and maintenance. It is important to understand what level of improvement is needed by the business customer.
    • The target and/or performance window was not accurately defined.
    • Estimate the potential accuracy of the model by researching similar models in the company or industry.
  • Implementation - The model may not be implementable.
    • Variables are used in the model that are very predictive but are not available for implementation.
    • Variables are used in the model that are very predictive but are too expensive for implementation.
    • Systems need to be revised or updated for real-time implementation.
    • Evaluate the potential variables in the final model and check with the implementation team for known and unknown issues.
  • Monitoring - The final model may not be adequately monitored and thus not be able to be subject to live validation.
    • Monitoring is not just best practice; it is a requirement in many industries.
    • If a model cannot be monitored then it may not be known if and when it is no longer effective, or even hazardous, to the business.

Basic Stats - Why does it matter?

In a previous blog I described some very basic concepts in descriptive stats. Why does this matter to the typical consumer?

One of the most important things a consumer needs is the knowledge that enables them to evaluate advertisements, news articles, political speeches, business portfolios, and other information.

Often, these sources of information will use some form of statistics to describe some policy, test result, financial statement or other numerical summary or analytical result. Knowing basic stats will help the consumer to better understand these results and when to question the results.

Examples

The examples presented here cover most of the situations that the ordinary consumer will encounter; however, more advanced analysis requires additional insight which may be answered in my Frequently Asked Modeling Questions (FAMQ).

  • Central Tendency - Question whether a measure of central tendency is using median or mean. Often a result is presented as the average, but is that the average of all numbers (mean) or the value for the average individual (median)? This is particularly important for monetary data (income, salary, net worth, price of items, ...), which are often positively skewed: a few very large values pull the mean up. In that case the mean makes the "average" seem higher and the median makes it look lower.
  • Dispersion - When dispersion is omitted from a report, it is important to ask what it is. Sometimes two populations will be compared by reporting only the mean or median. These values may differ greatly, but in order to truly compare the two populations you need to know how the values vary. For example, one could be comparing two realty companies based on how long it takes them to sell a property. One could be higher than the other but have a much wider dispersion.
  • Samples - Sampling is often biased and not randomly selected. The example above on realtors could have bias by comparing high priced properties for one company and low priced properties for the other firm.
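
The first two points can be checked with a few lines of Python using the standard statistics module. All of the numbers below are made up for illustration: one very large income pulls the mean far above the median, and two realty companies with similar average selling times can differ widely in dispersion.

```python
import statistics

# Hypothetical incomes with one very large value (positively skewed).
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 1_000_000]
mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Hypothetical days-to-sell for two realty companies.
company_a = [28, 30, 32, 30, 30]
company_b = [5, 10, 35, 60, 65]
mean_a, mean_b = statistics.mean(company_a), statistics.mean(company_b)
stdev_a, stdev_b = statistics.stdev(company_a), statistics.stdev(company_b)

if __name__ == "__main__":
    print("income mean:", mean_income, "median:", median_income)
    print("company A mean:", mean_a, "stdev:", round(stdev_a, 1))
    print("company B mean:", mean_b, "stdev:", round(stdev_b, 1))
```

Here the mean income is 200,000 while the median is 42,500, so the two "averages" tell very different stories. Company B takes only a few days longer on average than company A, but its selling times are spread over a much wider range.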