The ability to communicate over a distance is such an important freedom that the founding fathers saw fit to include it in the original Constitution. To wit:
Article I, Section 8, Clause 7 of the United States Constitution, known as the Postal Clause or the Postal Power, empowers Congress "To establish Post Offices and post Roads".
The Postal Clause was added to the Constitution primarily to facilitate interstate communication, as well as to
create a source of revenue for the early United States.
There is no way the founding fathers could have foreseen the technological advances that produced the telephone, the telegraph, and the Internet; however, I contend that had they foreseen this progress, they may well have felt that these forms of communication should be established by the government.
In fact, when the telephone first came on the scene, AT&T was granted a nationwide monopoly in return for regulation by the federal government, much as many local governments did with local telephone and other utilities. In return for these monopolies and regulation, the industries promised to supply everyone with electricity, telephone service, and other utilities.
Now we have the Internet, which has become so ubiquitous that high-speed access is almost a necessity for functioning in today's society. Unfortunately, there are no regulations that require the suppliers of broadband access to serve everyone. This has left a small minority shut out of modern-day communication.
Basic Stats, Advanced Stats, and other stuff
Blogs on Basic Statistics, Advanced Statistics and opinions by a professional statistician and philosopher.
Tuesday, August 16, 2016
Friday, February 15, 2013
FAMQ - Frequently Asked Modeling Questions
This post is a list of Modeling Questions, with links to other posts that give detailed answers to these questions.
During the Model Development and Review process, a number of issues may arise that are best addressed proactively. This document is intended to address some of the more common issues that tend to come to the forefront during the model review process. Many of these items may be subject to the personal opinions of the individual reviewer; however, they are all valid issues, and a process should be in place to resolve any conflicts.
In addition, these questions can serve as a reference for new modelers who may just have some basic questions on the modeling process. How to get started? How to plan a project?
FAMQ:
Why are First Steps important? – Defining the Problem
How do you approach first steps if you are a client?
What kind of problem do you have?
What is the modeling target (dependent variable(s))?
What are predictive (independent) variables?
Can a predictive variable be used in implementation?
What are the time windows for both the target and predictive variables?
What is Performance Inference?
- What is “Truncation”?
- Why is performance inference important?
- How is performance inference done?
- When is performance inference required?
- What is the Accept/Reject (A/R) pattern?
- Why is reject inference better than a KGB (Known Good/Bad) model?
- Given all this a priori knowledge, what can be done to reflect this in a model based only on the KGB data?
Why does parceling work?
Return to FAMQ
Parceling works because of the nature of logistic regression. Logistic regression essentially models the linear relationship between the independent variables and the natural log of the odds of 1 to 0. If we look at the process of duplicating observations and giving them weights proportional to their "1-ness" or "0-ness", then the 1/0 odds become the Loss Ratio, % recovery, or Profit/Cost ratio.

One advantage of modeling this continuous target using logistic regression instead of linear regression is that it simplifies the modeling assumptions on the distribution of the target ratio, since the kinds of ratios discussed here are often quite skewed due to a high percentage of zero or small values and a few very large values. This technique also gives added weight to the "goodness" or "badness" of the true 0's and 1's that may be ignored in a linear regression of a ratio.
For example:
- Loan Collections – In a charged-off loan with no recoveries, the recovery % is zero regardless of how much was owed. In the parceling technique, this "Bad" account is extra bad if the amount owed was high, but not nearly as bad if the amount owed was small.
- Insurance Risk – When an insurance policy has no claims it is Good, but its Loss Ratio is 0 regardless of how much premium is paid. In the parceling technique, that account is extra "Good" if the premium paid is large and not so good when the premium paid is small. Likewise, when a large claim is paid, that account is extra "Bad."
- Profitability – If an account in any sort of business has no revenue, then the revenue/cost ratio is 0 regardless of how much that account has cost the company. With parceling, a zero-revenue account is extra bad if its associated costs are high and not so bad if its costs are low.
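The weighting scheme described above can be sketched in a few lines of Python. This is a minimal sketch using the loan-collections example; the loan amounts and field names are invented for illustration:

```python
# Hypothetical charged-off loans: dollars owed and dollars recovered.
loans = [
    {"owed": 10_000, "recovered": 0},      # nothing recovered: all Bad
    {"owed": 2_000,  "recovered": 2_000},  # fully recovered: all Good
    {"owed": 5_000,  "recovered": 1_500},  # partial recovery: split
]

def parcel(loan):
    """Duplicate one loan into a Good (y=1) row weighted by dollars
    recovered and a Bad (y=0) row weighted by dollars lost."""
    good = {**loan, "y": 1, "weight": loan["recovered"]}
    bad = {**loan, "y": 0, "weight": loan["owed"] - loan["recovered"]}
    # Drop zero-weight halves so they do not clutter the training set.
    return [row for row in (good, bad) if row["weight"] > 0]

parceled = [row for loan in loans for row in parcel(loan)]
```

The parceled rows would then feed a weighted logistic regression (for example via a `sample_weight` argument), so that the modeled Good/Bad odds for a loan reproduce its recovered-to-lost dollar ratio.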
Return to FAMQ
What is Parceling?
Return to FAMQ
Parceling is a technique often used in analytical performance inference to help account for a "fuzzy" or probabilistic 0/1 outcome for a given observation. It is also used in model development, as an alternative to linear regression, to convert a ratio variable into a 0/1 (binary) target.

The problem in inference is to determine how a specific "unknown" observation would have performed had it been in the known population. For example:
- Lending or credit
- How would a rejected applicant have performed had it been accepted?
- How would an "unbooked" applicant (accepted but walked away) have performed had they taken the loan?
- Direct Mail
- Would a potential customer have responded had they been mailed an offer?
- If they had responded, would they have purchased something?
- Fraud
- Would a credit application have been identified as fraudulent had it been investigated?
- Would an insurance claim have been identified as fraudulent had it been investigated?
In this situation, the analysis of the known population can be extrapolated (very carefully) into the unknown population to derive a probability of the target performance (for example 1=Good or 0=Bad). This probability is then used to divide an unknown observation into two separate observations, a Good observation and a Bad observation. The Good observation is given a weight equivalent to the probability of a 1 and the Bad observation is given a weight equivalent to the probability of 0.
This technique is only applicable in some very specific conditions. In general, if the target can have different degrees of "Badness" or "Goodness" then parceling can be used. For example:
- Loan Collections – When a loan has been charged off, some or all of the money can be recovered. If none is recovered it is Bad; if all is recovered it is Good; but if a portion is recovered then it is partially good and partially bad. The ratio here is % recovered.
- Insurance Risk – When an insurance policy has no claims it is Good; when it has claims it is somewhat Bad, depending on how large the claims are. The ratio here is Loss Ratio, or Loss/(Premiums Paid).
- Profitability – An account in any sort of business could be classified as Good or Bad depending on the revenue generated from the account compared to the costs associated with the account. Those accounts with no costs are Good; those accounts with no revenue are Bad. The ratio here is Revenue/Cost.
Parceling is used in these ratio examples in a similar way to the inference solution. Each partial observation is duplicated, making a "Good" observation and a "Bad" observation. The Good observations are given weights proportional to their "Goodness" ($ recovered on the charged-off loan, insurance premiums paid, revenue generated by the account) and the Bad observations are given weights proportional to their "Badness" ($ owed on the charged-off loan, losses on the insurance due to claim(s), costs associated with the account).
For inference situations, parceling allows a single observation with unknown performance to be split into a "good" observation with a weight proportional to the estimated probability that the observation would have been good, and a "bad" observation with a weight proportional to the estimated probability that the observation would have been "bad." These parceled values are then added to the known population to build a final model based on the full TTD (through-the-door) population.
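The inference split described above can be sketched as follows. The applicant fields and the 0.7 probability are invented for illustration; in practice `p_good` would come from a known-population model extrapolated (very carefully) into the unknown population:

```python
# Known (accepted) accounts with observed 0/1 performance, weight 1.0 each.
known = [
    {"income": 60_000, "y": 1, "weight": 1.0},
    {"income": 25_000, "y": 0, "weight": 1.0},
]

def parcel_unknown(obs, p_good):
    """Split an unknown-performance observation into a Good copy weighted
    by p_good and a Bad copy weighted by 1 - p_good."""
    return [
        {**obs, "y": 1, "weight": p_good},
        {**obs, "y": 0, "weight": 1.0 - p_good},
    ]

# One reject, with an inferred 70% chance it would have been Good.
reject = {"income": 40_000}
ttd = known + parcel_unknown(reject, p_good=0.7)  # through-the-door data
```

Because each unknown contributes total weight 1.0 split across its two copies, the TTD data set keeps the same effective size as the original applicant population.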
Return to FAMQ
Tuesday, January 8, 2013
How does the modeler know the engineering works?
Return to FAMQ
Without some form of confirmed performance, the modeler needs to rely on a SME (Subject Matter Expert) to judgmentally validate the inferred performance. The primary way to do this with acquisition models (underwriting decisions for credit) is to compare the Accept/Reject decision using the model to the historical A/R patterns. If the model decisions are an improvement (judgmentally) over the traditional A/R pattern then the inference passes this test. Of course, the inferred model should still perform well on the known accounts, but not necessarily better than some known metric.
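One simple way to line the two decision patterns up is to hold the historical accept rate fixed, accept by model score instead, and look at the applicants the model would swap. The swap-set framing is standard in credit scoring, but the applicants and scores below are invented for illustration:

```python
# Hypothetical applicants: historical Accept/Reject decision plus a new
# model score (higher = better).
apps = [
    {"score": 0.91, "hist": "A"},
    {"score": 0.35, "hist": "A"},
    {"score": 0.78, "hist": "R"},
    {"score": 0.12, "hist": "R"},
]

# Accept the same number of applicants the historical process accepted,
# but pick them by model score instead.
n_accepts = sum(a["hist"] == "A" for a in apps)
ranked = sorted(apps, key=lambda a: a["score"], reverse=True)
for rank, a in enumerate(ranked):
    a["model"] = "A" if rank < n_accepts else "R"

# Swap set: applicants where the two decisions disagree.
swap_in = sum(a["model"] == "A" and a["hist"] == "R" for a in apps)
swap_out = sum(a["model"] == "R" and a["hist"] == "A" for a in apps)
```

At a fixed accept rate the swap-ins and swap-outs balance exactly, so the SME's judgmental check reduces to asking whether the swapped-in applicants look better than the swapped-out ones.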
Return to FAMQ
What is model “engineering”?
Engineering takes the form of forcing the model to include variables (and patterns within variables) that the data does not support. With traditional statistical techniques (various forms of regression) this is very difficult without ignoring the data altogether and using a judgmental model; but in MB (Model Builder from FICO) there are several capabilities for just this use.
The modeler can force variables not only to be in the model but to have a much larger effect than the data dictates, while optimizing around these constraints. The modeler can also assign weights (score values) to specific bins.
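Outside of MB, one way to mimic this kind of engineering is to pin the forced variable's coefficient by moving it into a fixed offset and fitting the remaining parameters around that constraint. This is a numpy sketch on invented data, not FICO's actual mechanism; the pinned value of 2.0 is deliberately larger than the data supports:

```python
import numpy as np

# Invented data: one variable we want to force ("engineer") into the
# model and one variable whose weight the data is allowed to set.
rng = np.random.default_rng(0)
n = 500
x_forced = rng.normal(size=n)
x_free = rng.normal(size=n)
true_logit = 0.3 * x_forced + 1.0 * x_free
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Pin the forced variable's coefficient at 2.0 -- far larger than the
# data-supported 0.3 -- by moving it into a fixed offset, then optimize
# the remaining parameters around that constraint.
FORCED_BETA = 2.0
offset = FORCED_BETA * x_forced

beta, intercept = 0.0, 0.0
lr = 0.1
for _ in range(5000):  # plain gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(offset + intercept + beta * x_free)))
    grad_b = float(np.mean((p - y) * x_free))
    grad_i = float(np.mean(p - y))
    beta -= lr * grad_b
    intercept -= lr * grad_i
```

The fit for `beta` and `intercept` is optimal only given the constraint, which is exactly the trade described below: the engineered model scores worse on the known data than an unconstrained fit would.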
Return to FAMQ
Given all this a priori knowledge, what can be done to reflect this in a model based only on the KGB data?
Return to FAMQ
As alluded to in the previous section, we can "engineer" the KGB model to, at a minimum, generate scores that account directionally for this a priori knowledge. This engineering will necessarily degrade the performance of the KGB model in the "known" space, but it will ensure that when the model is used in the "reject" space it will generate Accept/Reject (A/R) patterns that appropriately reflect our a priori knowledge.
Return to FAMQ