Basic Stats, Advanced Stats, and other stuff: What kind of problem do you have?


The primary focus here is the process of modeling human behavior, specifically the modeling or categorization of individual observations of that behavior. This modeling generally involves some type of prediction about the future. Here are just a few examples of the types of predictions that could be made:

Topic of incoming phone calls.
Response to outbound telesales.
Response to Direct Mail.
Purchase given response.
Credit risk: likelihood of a customer making full payment on loans or credit cards.
Risk behavior and insurance claim behavior of vehicle operators, renters, and homeowners.
Health risks of individuals – likelihood of death, injury, or contracting a specific disease.
Individual membership in certain lifestyle categories, such as high income, low education, married without children.
Income potential of students.
Attrition risk of customers or group members.
Voting patterns of individual voters.
Risk of recidivism for criminals.
Graduation success of students.

Classifying your problem into a specific type is just the beginning. The next step is:

Defining your problem – A wide range of problems can be solved by statistical modeling. The following is not intended to be comprehensive, merely a sampling:
- Prioritization – Prioritization problems encompass a wide range of modeling solutions. The primary objective is to rank order the observations by some criterion. For example:
  - Rank order a portfolio of credit card account applications as to the likelihood of identity theft and then target the top 0.1% for further investigation.
  - Rank order late loan accounts as to the likelihood of going on to be 30 days past due and then target accounts for tailored contacts (reminder e-mails for the least likely, reminder phone calls for the mid range, and multiple phone calls for the most likely).
  - Rank order incoming phone calls to a catalog company as to the likelihood of a potential sale.
  - Rank order patients for online medical advice as to the potential severity of their symptoms.
  - Rank order students as to their likelihood of scoring poorly on standardized tests and then give those at risk extra work and attention.
  - Rank order applicants for auto or homeowners insurance by risk level and give those with lower risk better rates and coverage. Those with higher risk may be directed to special companies or given other treatment, such as more frequent review and adjustment of their pricing.
- Estimates of specific values – Estimation problems require a model that can predict specific values; however, most of these problems can be solved with rank ordering solutions. For any group within a rank ordered solution (e.g. the top 1%), an average value for the target can be calculated and used to predict a specific value. Usually, these kinds of problems need more accuracy than a rank ordered solution requires. For example:
  - Predicting the exact weight (±5%) of an individual based on diet, age, height, bone density and body fat.
  - Predicting the MPG of a vehicle based on the design specifications under given driving conditions.
  - Predicting the amount of money that can be recovered from a loan that has been charged off (closely related to LGD, or Loss Given Default, which measures the portion that is not recovered).
  - Predicting the amount of loss that can be expected for a specific group of insurance policies, which helps price these policies.
  - Predicting the arrival time of an airplane flight based on weather, speed, and type of aircraft.
- Segmentation – Categorization of observations into homogeneous groups. This covers various forms of statistical clustering and is often not classified as a modeling problem, but it has many similarities to traditional modeling: it involves individual observations, sampling is always an issue, and it can be used for a variety of predictive purposes. It also involves a lot of subjective judgement, such as which variables are used for clustering, how a good cluster is defined, and how the categories will be used. Examples of segmentation:
  - Lifestyle segmentation – Classifying individuals or households into specific categories that may be described by lifestyle characteristics. This is usually based on census data (age, income, family size, education, ethnicity, housing, and other demographic information derived from the census bureau), survey information, subscription information, and so forth. The primary idea here is to define homogeneous groups for purposes of marketing or surveys.
  - Performance segmentation – Classifying individual accounts or customers into homogeneous segments based on performance: what they buy, how often they buy, where they buy, and when they buy. These classifications can then be used for marketing, retention, potentially risk evaluation, deciding which items to stock or sell, and so forth. Examples of markets where this type of segmentation can be used:
    - Credit Card – Credit card utilization can be used to group users as revolvers, transactors, inactive users, or balance-transfer users. Other information, such as merchant SIC, can further group users into
    - Frequent buyer cards – Information from frequent buyer
      - retail stores – grocery, hardware, pharmacies, …
      - airlines
      - online purchasers – Auctions, Books, gadgets, collectibles, …
    - Insurance – Types of insurance, levels of risk, amount of coverage, … Examples include high-risk vehicle drivers with multiple policies, hurricane-prone properties, and properties exposed to other weather, such as hail, tornadoes, blizzards, …
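
The rank ordering described under prioritization, and the group averages described under estimates of specific values, can be sketched in a few lines. This is only a minimal illustration in Python, assuming every observation already carries a model score and a known target value. The function name `rank_order_groups`, the number of groups, and the sample data are all hypothetical and made up for the example.

```python
from statistics import mean

def rank_order_groups(records, n_groups):
    """Sort (score, target) pairs by score, highest first, split them into
    n_groups groups of near-equal size, and average the target per group."""
    if n_groups < 1 or n_groups > len(records):
        raise ValueError("n_groups must be between 1 and the number of records")
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    size, extra = divmod(len(ranked), n_groups)
    groups = []
    start = 0
    for g in range(n_groups):
        # The first `extra` groups take one additional record each.
        end = start + size + (1 if g < extra else 0)
        chunk = ranked[start:end]
        groups.append({
            "group": g + 1,                  # 1 = highest scores
            "count": len(chunk),
            "min_score": chunk[-1][0],       # lowest score in the group
            "avg_target": mean(t for _, t in chunk),
        })
        start = end
    return groups

# Hypothetical scores and observed outcomes (1 = event occurred, 0 = it did not).
sample = [(0.91, 1), (0.15, 0), (0.78, 1), (0.33, 0),
          (0.62, 1), (0.08, 0), (0.55, 0), (0.97, 1)]
summary = rank_order_groups(sample, n_groups=4)
for row in summary:
    print(row)
```

The top group can then be targeted for action (prioritization), or its average target can be used as the predicted value for any new observation whose score falls in that group (estimation).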

These are just a few of the issues that must be addressed before a modeling project is initiated.

There are several steps in this development, which will be addressed in separate sections:

What is the modeling Target (dependent variable(s))?
What are the predictive (independent) variables?
What are the time windows for both the target and predictive variables?
Will there be any performance inference?
How will interactions between variables be handled?
How will the model be implemented and monitored?

Once the initial design is completed, the next step is to define the samples that will be used during development. A rigorous model development process should use two samples: a development sample and an Out of Time Sample (OTS). Both will be discussed in the sections below.
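
As a rough sketch of the two-sample idea, the split can be made on the observation date. This example assumes that the OTS is simply drawn from a later time period than the development sample. The function name `split_development_and_ots`, the `obs_date` field, and the cutoff date are hypothetical choices made for illustration only.

```python
from datetime import date

def split_development_and_ots(observations, cutoff):
    """Put observations dated before `cutoff` in the development sample and
    those dated on or after `cutoff` in the out-of-time sample (assumed rule)."""
    development = [o for o in observations if o["obs_date"] < cutoff]
    out_of_time = [o for o in observations if o["obs_date"] >= cutoff]
    return development, out_of_time

# Hypothetical observations; a real project would have many more fields.
observations = [
    {"id": 1, "obs_date": date(2012, 3, 15)},
    {"id": 2, "obs_date": date(2012, 6, 30)},
    {"id": 3, "obs_date": date(2012, 10, 1)},
    {"id": 4, "obs_date": date(2012, 12, 20)},
]
dev, ots = split_development_and_ots(observations, cutoff=date(2012, 10, 1))
print(len(dev), "development observations;", len(ots), "out-of-time observations")
```

Keeping the OTS entirely outside the development period means the model is checked on observations it could not have seen while it was being built.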



Monday, January 7, 2013
