The ability to communicate over a distance is such an important freedom that the founding fathers saw fit to include it in the original Constitution. To wit:
Article I, Section 8, Clause 7 of the United States Constitution, known as the Postal Clause or the Postal Power, empowers Congress "To establish Post Offices and post Roads".
The Postal Clause was added to the Constitution primarily to facilitate interstate communication, as well as to
create a source of revenue for the early United States.
There is no way the founding fathers could have foreseen the technological advances that produced the telephone, the telegraph, and the Internet; however, I contend that had they foreseen this progress, they may well have felt that these forms of communication should be established by the government.
In fact, when the telephone first came on the scene, AT&T was granted a nationwide monopoly in return for regulation by the federal government, much as many local governments did with local telephone and other utilities. In return for these monopolies and regulation, the industries promised to supply everyone with electricity, telephone service, and other utilities.
Now we have the Internet, which has become so ubiquitous that high-speed access is almost a necessity for functioning in today's society. Unfortunately, there are no regulations that require the suppliers of broadband access to serve everyone. This has left a small minority shut out of modern-day communication.
Basic Stats, Advanced Stats, and other stuff
Blogs on Basic Statistics, Advanced Statistics and opinions by a professional statistician and philosopher.
Tuesday, August 16, 2016
Friday, February 15, 2013
FAMQ - Frequently Asked Modeling Questions
This post is a list of Modeling Questions, with links to other posts that give detailed answers to these questions.
During the Model Development and Review process, a number of issues may arise that are best addressed proactively. This document is intended to address some of the more common issues that tend to come to the forefront during the model review process. Many of these items may be subject to the personal opinions of the individual reviewer; however, they are all valid issues, and a process should be in place to resolve any conflicts.
In addition, these questions can serve as a reference for new modelers who may just have some basic questions on the modeling process. How to get started? How to plan a project?
FAMQ:
Why are First Steps important? – Defining the Problem
How do you approach first steps if you are a client?
What kind of problem do you have?
What is the modeling target (dependent variable(s))?
What are predictive (independent) variables?
Can a predictive variable be used in implementation?
What are the time windows for both the target and predictive variables?
What is Performance Inference?
- What is “Truncation”?
- Why is performance inference important?
- How is performance inference done?
- When is performance inference required?
- What is the Accept/Reject (A/R) pattern?
- Why is reject inference better than a KGB (Known Good/Bad) model?
- Given all this a priori knowledge, what can be done to reflect this in a model based only on the KGB data?
Why does parceling work?
Return to FAMQ
Parceling works because of the nature of logistic regression. Logistic regression essentially models the linear relationship between the independent variables and the natural log of the odds of 1 to 0. If we look at the process of duplicating observations and giving them weights proportional to their "1-ness" or "0-ness", then the 1/0 odds become the Loss Ratio, % recovery, or Profit/Cost ratio.

One advantage of modeling this continuous target using logistic regression instead of linear regression is that it simplifies the modeling assumptions on the distribution of the target ratio, since the kinds of ratios discussed here are often quite skewed due to a high percentage of zero or small values and a few very large values. This technique also gives added weight to the "goodness" or "badness" of the true 0's and 1's that may be ignored in a linear regression of a ratio.
For example:
- Loan Collections – In a charged-off loan with no recoveries, the recovery % is zero regardless of how much was owed. In the parceling technique, this "Bad" account is extra bad if the amount owed was high, but not nearly as bad if the amount owed was small.
- Insurance Risk – When an insurance policy has no claims it is Good, but its Loss Ratio is 0 regardless of how much premium is paid. In the parceling technique, that account is extra "Good" if the premium paid is large and not so good when the premium paid is small. Likewise, when a large claim is paid, that account is extra "Bad."
- Profitability – If an account in any sort of business has no revenue, then the revenue/cost ratio is 0 regardless of how much that account has cost the company. With parceling, a zero-revenue account is extra bad if its associated costs are high and not so bad if its costs are low.
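The weighting scheme described above can be sketched in a few lines of Python. This is a minimal sketch using the loan-collections example; the loan amounts and field names are invented for illustration:

```python
# Hypothetical charged-off loans: dollars owed and dollars recovered.
loans = [
    {"owed": 10_000, "recovered": 0},      # nothing recovered: all Bad
    {"owed": 2_000,  "recovered": 2_000},  # fully recovered: all Good
    {"owed": 5_000,  "recovered": 1_500},  # partial recovery: split
]

def parcel(loan):
    """Duplicate one loan into a Good (y=1) row weighted by dollars
    recovered and a Bad (y=0) row weighted by dollars lost."""
    good = {**loan, "y": 1, "weight": loan["recovered"]}
    bad = {**loan, "y": 0, "weight": loan["owed"] - loan["recovered"]}
    # Drop zero-weight halves so they do not clutter the training set.
    return [row for row in (good, bad) if row["weight"] > 0]

parceled = [row for loan in loans for row in parcel(loan)]
```

The parceled rows would then feed a weighted logistic regression (for example via a `sample_weight` argument), so that the modeled Good/Bad odds for a loan reproduce its recovered-to-lost dollar ratio.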
Return to FAMQ
What is Parceling?
Return to FAMQ
Parceling is a technique often used in analytical performance inference to help account for a "fuzzy" or probabilistic 0/1 outcome for a given observation. It is also used in model development, as an alternative to linear regression, to convert a ratio variable into a 0/1 (binary) target.

The problem in inference is to determine how a specific "unknown" observation would have performed had it been in the known population. For example:
- Lending or credit
- How would a rejected applicant have performed had it been accepted?
- How would an "unbooked" applicant (accepted but walked away) have performed had they taken the loan?
- Direct Mail
- Would a potential customer have responded had they been mailed an offer?
- If they had responded, would they have purchased something?
- Fraud
- Would a credit application have been identified as fraudulent had it been investigated?
- Would an insurance claim have been identified as fraudulent had it been investigated?
In this situation, the analysis of the known population can be extrapolated (very carefully) into the unknown population to derive a probability of the target performance (for example 1=Good or 0=Bad). This probability is then used to divide an unknown observation into two separate observations, a Good observation and a Bad observation. The Good observation is given a weight equivalent to the probability of a 1 and the Bad observation is given a weight equivalent to the probability of 0.
This technique is only applicable in some very specific conditions. In general, if the target can have different degrees of "Badness" or "Goodness" then parceling can be used. For example:
- Loan Collections – When a loan has been charged off, some or all of the money can be recovered. If none is recovered it is Bad; if all is recovered it is Good; but if a portion is recovered then it is partially good and partially bad. The ratio here is % recovered.
- Insurance Risk – When an insurance policy has no claims it is Good; when it has claims it is somewhat Bad, depending on how large the claims are. The ratio here is Loss Ratio, or Loss/(Premiums Paid).
- Profitability – An account in any sort of business could be classified as Good or Bad depending on the revenue generated from the account compared to the costs associated with the account. Those accounts with no costs are Good; those accounts with no revenue are Bad. The ratio here is Revenue/Cost.
Parceling is used in these ratio examples in a similar way to the inference solution. Each partial observation is duplicated, making a "Good" observation and a "Bad" observation. The Good observations are given weights proportional to their "Goodness" ($ recovered on the charged-off loan, insurance premiums paid, revenue generated by the account) and the Bad observations are given weights proportional to their "Badness" ($ owed on the charged-off loan, losses on the insurance due to claim(s), costs associated with the account).
For inference situations, parceling allows a single observation with unknown performance to be split into a "good" observation with a weight proportional to the estimated probability that the observation would have been good, and a "bad" observation with a weight proportional to the estimated probability that the observation would have been "bad." These parceled values are then added to the known population to build a final model based on the full TTD (through-the-door) population.
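The inference split described above can be sketched as follows. The applicant fields and the 0.7 probability are invented for illustration; in practice `p_good` would come from a known-population model extrapolated (very carefully) into the unknown population:

```python
# Known (accepted) accounts with observed 0/1 performance, weight 1.0 each.
known = [
    {"income": 60_000, "y": 1, "weight": 1.0},
    {"income": 25_000, "y": 0, "weight": 1.0},
]

def parcel_unknown(obs, p_good):
    """Split an unknown-performance observation into a Good copy weighted
    by p_good and a Bad copy weighted by 1 - p_good."""
    return [
        {**obs, "y": 1, "weight": p_good},
        {**obs, "y": 0, "weight": 1.0 - p_good},
    ]

# One reject, with an inferred 70% chance it would have been Good.
reject = {"income": 40_000}
ttd = known + parcel_unknown(reject, p_good=0.7)  # through-the-door data
```

Because each unknown contributes total weight 1.0 split across its two copies, the TTD data set keeps the same effective size as the original applicant population.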
Return to FAMQ
Tuesday, January 8, 2013
How does the modeler know the engineering works?
Return to FAMQ
Without some form of confirmed performance, the modeler needs to rely on a SME (Subject Matter Expert) to judgmentally validate the inferred performance. The primary way to do this with acquisition models (underwriting decisions for credit) is to compare the Accept/Reject decision using the model to the historical A/R patterns. If the model decisions are an improvement (judgmentally) over the traditional A/R pattern then the inference passes this test. Of course, the inferred model should still perform well on the known accounts, but not necessarily better than some known metric.
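One simple way to line the two decision patterns up is to hold the historical accept rate fixed, accept by model score instead, and look at the applicants the model would swap. The swap-set framing is standard in credit scoring, but the applicants and scores below are invented for illustration:

```python
# Hypothetical applicants: historical Accept/Reject decision plus a new
# model score (higher = better).
apps = [
    {"score": 0.91, "hist": "A"},
    {"score": 0.35, "hist": "A"},
    {"score": 0.78, "hist": "R"},
    {"score": 0.12, "hist": "R"},
]

# Accept the same number of applicants the historical process accepted,
# but pick them by model score instead.
n_accepts = sum(a["hist"] == "A" for a in apps)
ranked = sorted(apps, key=lambda a: a["score"], reverse=True)
for rank, a in enumerate(ranked):
    a["model"] = "A" if rank < n_accepts else "R"

# Swap set: applicants where the two decisions disagree.
swap_in = sum(a["model"] == "A" and a["hist"] == "R" for a in apps)
swap_out = sum(a["model"] == "R" and a["hist"] == "A" for a in apps)
```

At a fixed accept rate the swap-ins and swap-outs balance exactly, so the SME's judgmental check reduces to asking whether the swapped-in applicants look better than the swapped-out ones.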
Return to FAMQ
What is model “engineering”?
Engineering takes the form of forcing the model to include variables (and patterns within variables) that the data does not support. With traditional statistical techniques (various forms of regression) this is very difficult without ignoring the data altogether and using a judgmental model; but in MB (Model Builder from FICO) there are several capabilities for just this use.
The modeler can force variables not only to be in the model but to have a much larger effect than the data dictates, while optimizing around these constraints. The modeler can also assign weights (score values) to specific bins.
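Outside of MB, one way to mimic this kind of engineering is to pin the forced variable's coefficient by moving it into a fixed offset and fitting the remaining parameters around that constraint. This is a numpy sketch on invented data, not FICO's actual mechanism; the pinned value of 2.0 is deliberately larger than the data supports:

```python
import numpy as np

# Invented data: one variable we want to force ("engineer") into the
# model and one variable whose weight the data is allowed to set.
rng = np.random.default_rng(0)
n = 500
x_forced = rng.normal(size=n)
x_free = rng.normal(size=n)
true_logit = 0.3 * x_forced + 1.0 * x_free
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Pin the forced variable's coefficient at 2.0 -- far larger than the
# data-supported 0.3 -- by moving it into a fixed offset, then optimize
# the remaining parameters around that constraint.
FORCED_BETA = 2.0
offset = FORCED_BETA * x_forced

beta, intercept = 0.0, 0.0
lr = 0.1
for _ in range(5000):  # plain gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(offset + intercept + beta * x_free)))
    grad_b = float(np.mean((p - y) * x_free))
    grad_i = float(np.mean(p - y))
    beta -= lr * grad_b
    intercept -= lr * grad_i
```

The fit for `beta` and `intercept` is optimal only given the constraint, which is exactly the trade described below: the engineered model scores worse on the known data than an unconstrained fit would.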
Return to FAMQ
Given all this a priori knowledge, what can be done to reflect this in a model based only on the KGB data?
Return to FAMQ
As alluded to in the previous section, we can "engineer" the KGB model to, at a minimum, generate scores that account directionally for this a priori knowledge. This engineering will necessarily degrade the performance of the KGB model in the "known" space, but it will ensure that when the model is used in the "reject" space it will generate Accept/Reject (A/R) patterns that appropriately reflect our a priori knowledge.
Return to FAMQ