(WARNING: This post gets into some advanced mathematics, and the average reader is not expected to understand everything in it. :-)
It sometimes matters which category is assigned to 0 and which to 1. Not for modeling purposes, but for interpretation of the final score. This is particularly true if logistic regression is being used as the modeling tool. As experienced modelers know, the only effect of swapping the 0 and 1 coding is that the signs in the logit equation are reversed.
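The sign reversal can be checked numerically. This is a minimal sketch that assumes only the definition of the logit as the log of P/(1-P); the function name `logit` and the 0.9 probability are illustrative choices, not part of any particular modeling tool.

```python
import math

def logit(p):
    """Log odds of an event that occurs with probability p."""
    return math.log(p / (1 - p))

p_good = 0.9          # probability of a Good account when Good is coded as 1
p_bad = 1 - p_good    # probability of the same account when Bad is coded as 1

# Swapping which category is coded as 1 flips the sign of the log odds,
# so every term of a fitted logit equation flips sign with it.
print(round(logit(p_good), 4))
print(round(logit(p_bad), 4))
```

The two printed values have the same magnitude and opposite signs.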
The usual objective in modeling a
binary target is to estimate the probability that a 1 occurs. This is
normally done using logistic regression. Logistic regression models
the log of the odds of a 1 (the logit) as a linear equation in the
predictors. The logit is converted through an exponential form to
produce the actual probability; however, the underlying equation
remains linear in the log of the odds. Because of this, the logit
score is easy to convert to any predetermined relationship between
score and log odds.
This relationship is usually defined by
a specific odds value at some base score and a Points to Double the
Odds (PDO) factor, the number of score points that doubles the odds.
Because the relationship is linear, the conversion amounts to a
constant additive factor and a constant multiplicative factor applied
to the original score.
Mathematically:
Mathematical Definition | Example | Description |
P = Probability of a 1 | P = .9 | Probability of a 1 (Good account) is 90%, 0 (Bad account) is 10%. |
Odds = P/(1-P) | Odds = .9/.1 =9 | 9 Good accounts for every Bad account |
Logit = K + ∑(bi*xi) | Logit = 0.2+.8x1-5.5x2 | Two variable equation, x1 is Number of Trade Lines and x2 is number of delinquent accounts |
P = e^[K + ∑(bi*xi)] / (1 + e^[K + ∑(bi*xi)]) | x1 = 5, x2 = 1; Logit = -1.3; P = e^-1.3/(1 + e^-1.3) = 0.214; Odds = 0.214/(1-0.214) = 0.272 | An account with 5 Trades and 1 delinquent trade has a probability of .214 of being Good, so about 21% of these accounts are Good. The resulting odds of being Good are .272 to 1, or about 27 Good accounts for every 100 Bad accounts |
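The last table row can be reproduced in a few lines. This is a sketch using the coefficients from the example equation; the function and constant names are hypothetical and chosen only for readability.

```python
import math

# Coefficients from the example row: Logit = 0.2 + 0.8*x1 - 5.5*x2
K = 0.2
B_TRADES = 0.8         # x1: number of trade lines
B_DELINQUENT = -5.5    # x2: number of delinquent accounts

def example_logit(trades, delinquent):
    """Linear log odds equation from the table."""
    return K + B_TRADES * trades + B_DELINQUENT * delinquent

def logit_to_probability(z):
    """Exponential form that turns a logit into a probability."""
    return math.exp(z) / (1 + math.exp(z))

def probability_to_odds(p):
    return p / (1 - p)

z = example_logit(trades=5, delinquent=1)
p = logit_to_probability(z)
odds = probability_to_odds(p)
print(round(z, 3), round(p, 3), round(odds, 3))
```

The unrounded odds are about 0.2725; the 0.272 in the table comes from dividing the already rounded probability 0.214 by 0.786.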
In the above equation, if the number of
Trade Lines changes from 5 to 6, the P value goes from .214 to .377
and the odds more than double, going from .272 to .607. So an
increase of 1 in the Trades slightly more than doubles the odds.
Because the logit is linear in the number of Trade Lines, going from
6 to 7 Trade Lines will multiply the odds by the same factor again.
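The constant multiplier follows from Odds = e^Logit: adding 0.8 to the logit multiplies the odds by e^0.8, whatever the starting point. A short sketch using the example coefficients (the variable names are illustrative):

```python
import math

B_TRADES = 0.8  # x1 coefficient from the example equation

# Odds = exp(logit), so one more trade line multiplies the odds by exp(B_TRADES).
multiplier = math.exp(B_TRADES)
print(round(multiplier, 3))

# Start from the example account: 5 trade lines, 1 delinquent account.
odds = math.exp(0.2 + B_TRADES * 5 - 5.5 * 1)
for trades in (5, 6, 7):
    print(trades, round(odds, 3))
    odds *= multiplier
```

The multiplier is about 2.23, which is why each step slightly more than doubles the odds.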
Mathematically, it is clear that if we
multiply the equation by a suitable constant, which rescales the X1
coefficient along with the others, then a fixed number of points will
produce an exact doubling of the odds. Likewise, adding a constant to
the equation can produce a score that has a value of 100 for odds of
10:1.
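One common way to write this adjustment is Score = Offset + Factor * Logit, with Factor = PDO / ln(2) and Offset = BaseScore - Factor * ln(BaseOdds). The sketch below uses the base score of 100 at 10:1 odds from the text; the PDO of 20 is an assumption made only for illustration, since the post does not give one, and the names are hypothetical.

```python
import math

BASE_SCORE = 100   # from the text: a score of 100 ...
BASE_ODDS = 10     # ... corresponds to odds of 10:1
PDO = 20           # assumption: the post does not specify a PDO value

FACTOR = PDO / math.log(2)                           # constant multiplicative factor
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)   # constant additive factor

def scaled_score(logit):
    """Map a logit (log odds of Good) onto the business score scale."""
    return OFFSET + FACTOR * logit

print(round(scaled_score(math.log(10)), 1))  # odds of 10:1
print(round(scaled_score(math.log(20)), 1))  # odds of 20:1
```

Under these assumptions, odds of 10:1 land on 100 and odds of 20:1 land on 120, one PDO higher.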
The reason this adjustment may be
important is best explained with an example. For business purposes,
let us assume that it takes 10 good accounts to generate enough
profit to pay for 1 bad account. A score of 100 (10 to 1 odds) is
then the business breakeven point, and any account that has a score
less than 100 is not profitable. A business person who knows the PDO
and base odds can therefore look at a score and understand not only
the business implications of that score but also how the business is
impacted by changes in the score.
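Reading a score this way is simple arithmetic: the odds double for every PDO points above the base score and halve for every PDO points below it. A sketch using the breakeven example from the text, again with an assumed PDO of 20 (not given in the post) and hypothetical names:

```python
BASE_SCORE, BASE_ODDS = 100, 10  # from the text: score 100 means 10:1 odds
PDO = 20                         # assumption: not specified in the post
BREAKEVEN_ODDS = 10              # from the text: 10 good accounts pay for 1 bad

def score_to_odds(score):
    """Odds of Good implied by a score; they double every PDO points."""
    return BASE_ODDS * 2 ** ((score - BASE_SCORE) / PDO)

for score in (80, 100, 120):
    odds = score_to_odds(score)
    status = "at or above breakeven" if odds >= BREAKEVEN_ODDS else "below breakeven"
    print(score, odds, status)
```

With these values, a score of 80 implies 5:1 odds (below breakeven) and a score of 120 implies 20:1 odds.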
The point of this lengthy discussion is
that having the correct 0, 1 definition at the beginning of model
development makes this score adjustment more straightforward at
implementation time.