Seven Steps to a Super Model
Krishna Mehta, Inductis Inc.

Introduction

The use of high-end analytics is on the rise in the marketplace today. In the financial sector, for example, statistical models are increasingly used to drive decisions at every stage of the customer life cycle. Given how competitive the market is, firms are always looking to extract better performance from these models. Model performance is usually measured with popular metrics such as model lift. Models are also evaluated on their shelf life. In this article we describe seven steps that, when followed during model development, can lead to superior performance and a longer shelf life for the model.

The Seven Steps

Step 1: Consider the number of variables when sampling

It is important to consider the number of variables available when selecting a modeling sample. Our research has shown that the optimal sample size for model development, measured in terms of model performance, is influenced by the number of variables available for model development. A population with more variables will need a larger modeling sample.

Step 2: Perform Advanced Data Prep

It is common practice in most modeling exercises to perform outlier treatment and missing-value imputation at the beginning of the study. Usually the modeling approach has not been determined at this stage, so data prep is performed independently of the technique to be used later. This is a suboptimal approach. Consider outliers first. Our research has shown that some outlier techniques are more appropriate for certain modeling problems. For example, outlier treatment using Mahalanobis distance improves model performance in linear regressions, while more traditional capping and flooring techniques do better for logistic regressions. Treating outliers independently of the modeling approach is therefore not optimal.

Similarly, the most common way to treat missing data is to impute it with a population characteristic such as the mean or median. However, such techniques artificially reduce the variance in the modeling dataset and consequently bias the coefficients obtained from the model. At Inductis, we have found that advanced techniques like multiple imputation can correct for this bias. Multiple imputation does, however, lead to inconsistencies when used together with the software-driven automatic variable selection methods commonly deployed during modeling. A solution is to re-estimate the missing data with multiple imputation after a preliminary model has been developed using more standard techniques, when automatic variable selection is no longer required.
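The two outlier treatments named above can be sketched in a few lines. This is a minimal illustration rather than the Inductis implementation: the function names, the 1st and 99th percentile cut-offs, the two-variable restriction, and the simulated data are all assumptions made for the example.

```python
import math
import random
import statistics


def cap_floor(values, lower_pct=0.01, upper_pct=0.99):
    """Cap and floor values at the given percentiles (assumed cut-offs)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[round(lower_pct * (n - 1))]
    hi = ordered[round(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dists = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        dists.append(math.sqrt(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy))
    return dists


# Simulated data: two positively related variables.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.8 * x + rng.gauss(0, 0.3) for x in xs]
# Hypothetical bad record whose main anomaly is that it breaks the
# positive relationship between the two variables.
xs.append(2.5)
ys.append(-2.5)

dists = mahalanobis_2d(xs, ys)
flagged = max(range(len(dists)), key=dists.__getitem__)  # most distant record
capped_x = cap_floor(xs)  # univariate treatment, one variable at a time
print("most distant record:", flagged, "distance:", round(dists[flagged], 2))
```

Capping and flooring looks at one variable at a time, while Mahalanobis distance accounts for the correlation between variables, which is one reason the two treatments can suit different modeling techniques.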

Step 3: Create Smart Variables

Interactions among key variables are often very important in explaining the event of interest. More often than not, however, the relationship is complex and non-linear, and its functional form cannot be detected by simple profiling techniques. Advanced techniques like Classification Trees, Regression Splines and Genetic Algorithms can be used to detect these interactions automatically. New variables representing these interactions can then be created and used to improve model performance.
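A small simulation shows why an interaction variable can matter. It does not perform the automatic detection described above (no tree, spline, or genetic algorithm is used); it assumes the interaction is already known and only illustrates the payoff of creating the new variable. The data, the event definition, and all names are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(42)
x1 = [rng.uniform(-1, 1) for _ in range(2000)]
x2 = [rng.uniform(-1, 1) for _ in range(2000)]
# Hypothetical event: occurs only when x1 and x2 share the same sign.
event = [1 if a * b > 0 else 0 for a, b in zip(x1, x2)]
# "Smart variable": the product captures the interaction directly.
interaction = [a * b for a, b in zip(x1, x2)]

corr_x1 = pearson(x1, event)
corr_x2 = pearson(x2, event)
corr_interaction = pearson(interaction, event)
print(f"x1: {corr_x1:.2f}  x2: {corr_x2:.2f}  x1*x2: {corr_interaction:.2f}")
```

In this setup neither variable is informative on its own, so simple one-variable profiling would miss the pattern, while the product variable is strongly related to the event.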

Step 4: Reduce the Dimension Properly

It is fairly common to have thousands of variables available at the beginning of an analysis. It is hard to perform any meaningful analysis with so many variables, so the first step is to reduce the dimension of the problem. This is tricky, because it is very easy to remove important variables by mistake and leave the subsequent model development suboptimal. A technique that combines variable clustering and classification trees can be very effective in dimension reduction problems.

Step 5: Correct for interdependence among your explanatory variables

If the explanatory variables in a model are highly correlated, the coefficients obtained from the model become arbitrary and the test statistics become unreliable. The problem is magnified because most automatic variable selection procedures are based on these test statistics and so become unreliable as well. Care should therefore be taken to handle collinearity among the explanatory variables up front. This can be done by deleting one variable from each highly correlated pair. The relationship with the dependent variable and the business context can be used to decide which explanatory variable to delete. This approach can be extended to clusters of highly correlated variables.
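The pairwise deletion rule can be sketched as a greedy filter. This is one possible reading of the step, under stated assumptions: the 0.9 correlation threshold, the function and variable names, and the simulated data are illustrative, and business context, which the text also recommends using, is not modeled here.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def drop_correlated(predictors, target, threshold=0.9):
    """Keep one variable from each highly correlated pair.

    Variables are visited in order of absolute correlation with the
    dependent variable, so the stronger member of a pair survives.
    """
    ranked = sorted(
        predictors,
        key=lambda name: abs(pearson(predictors[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(predictors[name], predictors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Simulated data with hypothetical variable names.
rng = random.Random(1)
income = [rng.gauss(0, 1) for _ in range(1000)]
income_proxy = [v + 0.3 * rng.gauss(0, 1) for v in income]  # near-duplicate of income
age = [rng.gauss(0, 1) for _ in range(1000)]
target = [2 * i + a + 0.5 * rng.gauss(0, 1) for i, a in zip(income, age)]

predictors = {"income": income, "income_proxy": income_proxy, "age": age}
kept = drop_correlated(predictors, target)
print("variables kept:", kept)
```

Because each candidate is compared with every variable already kept, the same loop also handles clusters of more than two correlated variables: only one representative per cluster survives.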

Step 6: Stabilize the results

A stable model has a longer shelf life. Once the model has been developed, special effort should therefore be made to stabilize it, i.e. to ensure it gives consistent results across different samples and over time. Out-of-sample and out-of-time validation help make models stable. In addition, advanced techniques like bootstrapping, coefficient blasting and sensitivity analysis can help a great deal. Bootstrapping involves re-estimating the model over numerous randomly drawn sub-samples. Coefficient Blasting follows from the bootstrap method and involves looking at the distribution of the coefficients of key variables across the sub-samples. Sensitivity Analysis involves checking the stability of the model score under small changes in predictor values.
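Bootstrapping and the coefficient distribution it produces can be sketched for a one-predictor linear model. Several choices here are assumptions, not taken from the text: each sub-sample is drawn with replacement at the full sample size (the conventional bootstrap), 500 rounds are used, and the data are simulated with a true slope of 1.5.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Ordinary least squares slope for a one-predictor linear model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_rounds=500, seed=0):
    """Re-estimate the slope on n_rounds resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return slopes


# Simulated modeling sample (hypothetical).
rng = random.Random(7)
xs = [rng.gauss(0, 1) for _ in range(300)]
ys = [1.5 * x + rng.gauss(0, 1) for x in xs]

slopes = sorted(bootstrap_slopes(xs, ys))
mean_slope = statistics.fmean(slopes)
spread = statistics.stdev(slopes)
low, high = slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes))]
print(f"slope mean {mean_slope:.3f}, sd {spread:.3f}, 95% range [{low:.3f}, {high:.3f}]")
```

A narrow distribution that stays well away from zero suggests the coefficient is stable across samples; a wide distribution, or one that changes sign, flags a variable worth revisiting.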

Step 7: Meta Model to the Super Model

Often during the modeling exercise, competing models are developed and the best performing model is selected. It is, however, a good idea to study the models to see whether some do better for certain segments of the population. In such cases, it is possible to combine the models and improve the results obtained. We have found that it is fairly common for the super models derived using the steps described above to increase model performance by an additional 400 to 500 basis points.
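One simple way to combine competing models by segment is to route each segment to the model that performs best on it in a validation sample. The sketch below is only one possible form of a meta model and is not necessarily the approach used by Inductis; the segment names, model names, scores, and the use of mean squared error as the criterion are all hypothetical.

```python
from collections import defaultdict


def best_model_per_segment(records):
    """Pick, for each segment, the model with the lowest mean squared error.

    records: iterable of (segment, actual, {model_name: predicted}).
    """
    errors = defaultdict(lambda: defaultdict(list))
    for segment, actual, preds in records:
        for model, predicted in preds.items():
            errors[segment][model].append((predicted - actual) ** 2)
    return {
        segment: min(by_model, key=lambda m: sum(by_model[m]) / len(by_model[m]))
        for segment, by_model in errors.items()
    }


def meta_score(segment, preds, routing):
    """Score a record with the model chosen for its segment."""
    return preds[routing[segment]]


# Hypothetical validation records for two competing models.
validation = [
    ("new", 1, {"model_a": 0.9, "model_b": 0.6}),
    ("new", 0, {"model_a": 0.2, "model_b": 0.5}),
    ("tenured", 1, {"model_a": 0.5, "model_b": 0.8}),
    ("tenured", 0, {"model_a": 0.6, "model_b": 0.1}),
]

routing = best_model_per_segment(validation)
score = meta_score("tenured", {"model_a": 0.4, "model_b": 0.7}, routing)
print(routing, score)
```

Since the routing is itself fitted to data, the combined model should be checked on a sample that was not used to choose the routing.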

Conclusions

The use of sophisticated modeling can add great value to the business decision process. However, model performance and stability are always a cause for concern. We have laid out seven steps which, when properly incorporated, can greatly improve the quality of the model. Practitioners should adopt these steps to convert their models into super models. These measures are an integral part of the MicroAnalytix™ process developed and deployed by Inductis.

Copyright © 2002 - 2008 Inductis Inc.