When a model is developed, we need to answer the following questions:
- When will the model score be calculated and used?
- What historical data is used at the time of score calculation?
- How far into the future is the model predicting, and what time frames are used for model development?
Several times above we have mentioned temporal windows. Two primary windows in time need to be considered when defining a development sample: the target window and the predictive window. The target window is both the length of time after scoring that it takes for performance to be resolved and the calendar period over which that performance data is collected. The predictive window is both the span of history from which predictive data is drawn and the calendar period over which that data is collected.
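For a single observation, the two windows can be expressed as date ranges around one scoring date. The following is a minimal sketch, assuming both windows are measured in whole days; the class `ModelWindows` and its fields are hypothetical names, and the scoring date in the usage lines is made up. It covers the windows for one observation only; the calendar period over which the sample itself is drawn is a separate choice.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ModelWindows:
    """Hypothetical container for the time windows around one scoring event."""
    scoring_date: date   # the "present": when the score is calculated
    lookback_days: int   # length of the predictive (history) window
    horizon_days: int    # length of the target (performance) window

    def predictive_window(self) -> tuple[date, date]:
        # History used to build predictors ends at the scoring date.
        start = self.scoring_date - timedelta(days=self.lookback_days)
        return (start, self.scoring_date)

    def target_window(self) -> tuple[date, date]:
        # Performance is observed from scoring until the outcome is resolved.
        end = self.scoring_date + timedelta(days=self.horizon_days)
        return (self.scoring_date, end)


# One week of history and a one-week horizon, with a made-up scoring date.
weigh_in = ModelWindows(scoring_date=date(2010, 3, 1), lookback_days=7, horizon_days=7)
print(weigh_in.predictive_window())  # (datetime.date(2010, 2, 22), datetime.date(2010, 3, 1))
print(weigh_in.target_window())      # (datetime.date(2010, 3, 1), datetime.date(2010, 3, 8))
```

Keeping the scoring date as the single anchor makes the split explicit: anything dated before it may be used as a predictor, and anything dated after it belongs to the target.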
The following examples make these windows concrete:
- Weight example (described in a previous post) where weight gain/loss is predicted over a week:
- Predictive Time Window – In this case, the only historical data collected is the previous week's weight gain; all other variables are collected at the time of scoring. So the past predictive window extends 7 days into the past.
- Scoring – A number of variables are collected at scoring, so the present scoring window is the time of weigh-in.
- Target Time Window – The individual's change in weight is estimated 7 days out from the day of scoring, so the future target window is the 7 days after the predictive data is collected. Each observation therefore has its own 7-day target window.
- Good/Bad Credit – In the credit example, we score applicants using information from the application and historical credit information collected at the time of scoring. The target is binary (0/1). A 1 (Good) is defined as an account that is never 30 days delinquent over the 18 months after application. A 0 (Bad) is defined as an account that goes 90 days delinquent or worse at least once in those 18 months. Indeterminates include all other accounts (ever 30 or 60 days delinquent but never 90 days delinquent).
- Predictive Time Window – In this case, the historical data collected is the credit bureau data, which includes up to 8 years of negative information and a lifetime of positive information (length of credit history). So the past predictive window is virtually unlimited. The predictive data sample consists of applications randomly selected from calendar year 2007.
- Scoring – A number of variables are collected at scoring, so the present scoring window is the time of application.
- Target Time Window – The 18-month period starting on the day of application is the future target window, so for the 2007 sample, outcomes are resolved between July 2008 and June 2009 (applications from January 2007 have performance measured as of July 2008; applications from December 2007 have performance measured as of June 2009).
- Response/Non-response – In a direct mail marketing solicitation, response is defined as receiving an application for membership within 6 weeks after the mail drop. The mailing took place in the last week of March 2010.
- Predictive Time Window – In this case, the historical data collected is household demographic data as of the time the mail list was sent to the data aggregator, 4 weeks prior to the mail drop. So the past predictive window is limited to the data available when the list was pulled. The predictive data sample consists of records randomly selected from that mailing.
- Scoring – No additional variables are collected at scoring, so the present scoring window is the time the list was determined, 4 weeks prior to the mail drop.
- Target Time Window – The 6-week period starting on the day of the mail drop is the future target window, which runs from the last week of March 2010 to the second week of May 2010.
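The date arithmetic behind the credit and direct-mail examples can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The helper `add_months` is hypothetical, because `datetime` has no built-in calendar-month offset. The mail-drop date of 29 March 2010 is an assumed day within the last week of March.

```python
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Hypothetical helper: shift a date by whole calendar months,
    clamping the day to the length of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    # Days in the target month: first day of the following month minus one day.
    next_first = date(year + month // 12, month % 12 + 1, 1)
    last_day = (next_first - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))


# Credit example: applications sampled across calendar 2007, 18-month outcome.
first_outcome = add_months(date(2007, 1, 1), 18)
last_outcome = add_months(date(2007, 12, 31), 18)
print(first_outcome, last_outcome)  # 2008-07-01 2009-06-30

# Direct-mail example: list pulled 4 weeks before the drop, 6-week response window.
mail_drop = date(2010, 3, 29)  # assumed day in the last week of March 2010
list_pull = mail_drop - timedelta(weeks=4)        # scoring point
response_cutoff = mail_drop + timedelta(weeks=6)  # end of target window
print(list_pull, response_cutoff)  # 2010-03-01 2010-05-10
```

Clamping the day keeps month-end applications (for example, 31 December 2007) on a valid date. Other conventions, such as rolling over into the next month, would shift some outcome dates by a day or two.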
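The Good/Bad/Indeterminate definition from the credit example can be written as a labeling rule. This is a minimal sketch. It assumes each account's worst delinquency during the 18-month window is available as a number of days past due. The function `credit_target` and its input `worst_dpd` are hypothetical names, and 0 means never delinquent. The thresholds follow the definitions above. Indeterminates are returned as `None` so they stay out of the 0/1 target.

```python
from typing import Optional


def credit_target(worst_dpd: int) -> Optional[int]:
    """Hypothetical rule: map worst days-past-due in the 18-month window
    to the target. Returns 1 (Good), 0 (Bad), or None (Indeterminate)."""
    if worst_dpd < 30:
        return 1     # never reached 30 days delinquent: Good
    if worst_dpd >= 90:
        return 0     # at least once 90+ days delinquent: Bad
    return None      # ever 30 or 60 days delinquent but never 90: Indeterminate


# Made-up accounts keyed by ID, with worst days past due in the window.
accounts = {"A": 0, "B": 45, "C": 120}
labels = {acct: credit_target(dpd) for acct, dpd in accounts.items()}
print(labels)  # {'A': 1, 'B': None, 'C': 0}
```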