Predictive Modeling Tutorial - Interpreting the Results

Created by Steve Hoover, Modified on Fri, Aug 16 at 4:55 PM by Steve Hoover

Interpreting the Predictive Model Results

The following results come from the OfficeStar Tutorial data set (which loads automatically when you select the Tutorial link in the Enginius Dashboard), analyzed with the parameters given in the Running a Predictive Modeling Analysis article.


Confusion matrix

The confusion matrix section of the report assesses model performance. It presents the same data in two matrices: one with numerical counts and one with percentages.



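The diagonal elements of both matrices show where the actual (observed) and predicted outcomes agree. High values on the diagonal indicate close agreement between observed and predicted behavior; the share of all observations that fall on the diagonal is called the hit rate. You can also use the confusion matrix to compute two additional metrics for evaluating a model's performance: “Recall” (of the individuals who actually gave a response, the share the model predicted correctly) and “Precision” (of the individuals predicted to give a response, the share who actually gave it).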
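The hit rate, Recall, and Precision can all be computed directly from the count matrix. The sketch below is a minimal Python illustration, assuming a binary target with a “Yes”/“No” outcome. The counts are made up for illustration (they are not the OfficeStar results), and the function names are hypothetical, not part of Enginius.

```python
# Invented counts for a binary target (NOT the OfficeStar results).
# Keys are (actual, predicted) pairs.
confusion = {
    ("Yes", "Yes"): 40, ("Yes", "No"): 10,
    ("No", "Yes"): 15, ("No", "No"): 135,
}

def hit_rate(cm):
    """Share of all observations on the diagonal (actual == predicted)."""
    total = sum(cm.values())
    correct = sum(n for (actual, predicted), n in cm.items() if actual == predicted)
    return correct / total

def recall(cm, positive):
    """Of the observations that actually are `positive`, the share predicted as `positive`."""
    actual_positive = sum(n for (actual, _), n in cm.items() if actual == positive)
    return cm[(positive, positive)] / actual_positive

def precision(cm, positive):
    """Of the observations predicted as `positive`, the share that actually are `positive`."""
    predicted_positive = sum(n for (_, predicted), n in cm.items() if predicted == positive)
    return cm[(positive, positive)] / predicted_positive

print(f"Hit rate:  {hit_rate(confusion):.3f}")          # (40 + 135) / 200 = 0.875
print(f"Recall:    {recall(confusion, 'Yes'):.3f}")     # 40 / 50 = 0.800
print(f"Precision: {precision(confusion, 'Yes'):.3f}")  # 40 / 55 = 0.727
```

Recall and Precision are computed for one chosen response (here “Yes”); computing them for “No” instead would use the other diagonal cell.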


Model predictions

The model predictions table compares the model's predicted outcomes with the actual (observed) outcomes.



Gain chart and lift

A gain chart shows how well a predictive model identifies the favorable responses (e.g., “Yes” in the above table).



The x-axis represents the population, sorted in decreasing order of the predicted likelihood of the favorable response, and the y-axis represents the proportion of all favorable responses captured so far.

The diagonal on the gain chart represents a model that predicts each individual's choice randomly, and the red line represents predictions based on the actual data (i.e., Truth). The other two lines (green) show the performance of the choice model. The choice model performs better the further the green lines depart from the random model and the closer they approach the Truth line. At 100% of the ordered list, every model recovers the total number of favorable responses.
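The gain curve described above can be sketched as follows. This is a minimal Python illustration, not Enginius's implementation: `gain_curve` and `truth_curve` are hypothetical helper names, and the scores and outcomes are invented. The random model is simply the diagonal (y = x), so it needs no separate computation.

```python
def gain_curve(scores, outcomes):
    """Points (share of population, share of favorable responses captured),
    moving down the population sorted by decreasing predicted likelihood.

    scores   -- predicted likelihood of the favorable response, one per individual
    outcomes -- 1 if the individual actually responded favorably, else 0
    """
    n = len(scores)
    total_favorable = sum(outcomes)
    # Order individuals by decreasing predicted likelihood (the x-axis order).
    ordered = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    points, captured = [(0.0, 0.0)], 0
    for k, (_, y) in enumerate(ordered, start=1):
        captured += y
        points.append((k / n, captured / total_favorable))
    return points

def truth_curve(outcomes):
    """Truth line: a perfect ordering that ranks every favorable response first."""
    return gain_curve(outcomes, outcomes)

# Invented data: 10 individuals, 4 favorable responses.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

model = gain_curve(scores, outcomes)
truth = truth_curve(outcomes)
for (x, y_model), (_, y_truth) in zip(model, truth):
    # The random model is the diagonal: at share x it captures x of the favorable responses.
    print(f"top {x:4.0%}: random {x:.2f}  model {y_model:.2f}  truth {y_truth:.2f}")
```

In this invented list, the top 20% contains half of the favorable responses under both the model and the truth line, against 20% for the random diagonal; by 40%, the truth line has reached 100% while the model is at 75%.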


The dashed green line represents the gain chart obtained on the entire calibration data without cross-validation, whereas the green area represents the same chart obtained by cross-validation. Cross-validated results are sometimes worse, but they are more realistic because each individual is scored by a model that was not estimated on that individual's data; this also reduces the influence of outliers in the calibration data.
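The difference between the dashed line and the cross-validated area comes from how each individual is scored. The sketch below assumes plain k-fold cross-validation (Enginius's exact scheme is not described in this article) and uses a deliberately simple hypothetical model, `fit_rate_by_group`, which predicts the training share of favorable responses among individuals with the same predictor value. In-sample scores come from a model fitted on everyone; out-of-fold scores come from a model fitted without the individual's own fold.

```python
import random

def fit_rate_by_group(xs, ys):
    """Hypothetical stand-in model: for each predictor value, the training share
    of favorable responses (ys are 1 = favorable, 0 = not)."""
    counts = {}
    for x, y in zip(xs, ys):
        n, pos = counts.get(x, (0, 0))
        counts[x] = (n + 1, pos + y)
    rates = {x: pos / n for x, (n, pos) in counts.items()}
    overall = sum(ys) / len(ys)
    # Predictor values not seen in training fall back to the overall share.
    return lambda x: rates.get(x, overall)

def out_of_fold_scores(xs, ys, k=5, seed=0):
    """Score every individual with a model fitted WITHOUT that individual's fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_rate_by_group([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            scores[i] = model(xs[i])
    return scores

# Invented data: one categorical predictor, binary outcome.
xs = ["A"] * 6 + ["B"] * 6
ys = [1, 1, 1, 1, 0, 0] + [0, 0, 0, 0, 0, 1]

full_model = fit_rate_by_group(xs, ys)
in_sample = [full_model(x) for x in xs]                  # scored on the full calibration data
cross_validated = out_of_fold_scores(xs, ys, k=len(xs))  # leave-one-out folds
print(in_sample[0], cross_validated[0])
```

With leave-one-out folds, the first individual in group A (a favorable responder) gets a score of 3/5 instead of the in-sample 4/6, because their own response is excluded from training. Either set of scores could then be ranked to build a gain curve; the out-of-fold version is the more conservative one.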


Lift measures how much better the model performs than random selection at a given percentile of the ordered list of the population (the horizontal axis). For example, if selecting the top 10% of the ordered list reaches 18.7% of the individuals who make the appropriate choice (i.e., respond favorably), the focal model performs 18.7 / 10 = 1.87 times better than a model that makes random assignments. In that case, the lift at the 10th percentile is 1.87.
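Lift at a given percentile is the share of favorable responses captured divided by the share of the population selected. The sketch below reproduces the 1.87 example from the text and adds a hypothetical helper, `lift_at`, that computes lift from ranked predictions; the function names and the ranked data are illustrative assumptions, not Enginius functions or results.

```python
def lift(share_selected, share_captured):
    """Lift = share of favorable responses captured / share of population selected.
    A random ordering captures favorable responses in proportion to the share
    selected, so its lift is 1 at every percentile."""
    return share_captured / share_selected

def lift_at(scores, outcomes, fraction):
    """Lift when selecting the top `fraction` of individuals ranked by score.
    outcomes: 1 = favorable response, 0 = not."""
    n = len(scores)
    k = max(1, round(fraction * n))  # number of individuals selected
    ordered = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    captured = sum(y for _, y in ordered[:k]) / sum(outcomes)
    return lift(k / n, captured)

# The example from the text: the top 10% of the list reaches 18.7% of favorable responders.
print(round(lift(0.10, 0.187), 2))  # 1.87

# Invented data: 10 individuals, 4 favorable responses.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(lift_at(scores, outcomes, 0.20))  # top 2 capture 2 of 4: 0.5 / 0.2 = 2.5
```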


The 'truth' is the actual number of favorable responses in the ordered list. Improvement measures how much of the truth the model recovers; an improvement of 100% means the model recovered all of the favorable responses perfectly.



Elasticities

Elasticity measures how responsive a target variable is to a change in the value of a predictor. Specifically, elasticity is the ratio of the percentage change in the target variable (Y) to a specified percentage change in a predictor (X): Elasticity = (% change in Y) / (% change in X).


To compute the elasticities, Enginius follows these steps:

  1. Predict the target variable Y for each individual at that individual's current value of X. Average the predictions across respondents to obtain Y0.
  2. Increase each observation's value of X by 1% and predict Y at these new values. Average the predictions across respondents to obtain Y1.
  3. Compute the elasticity as (Y1 – Y0) / Y0, expressed as a percentage. Because X changed by 1%, this percentage change in Y equals the elasticity.
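The three steps can be sketched in Python. The binary logit model, its coefficients `B0` and `B1`, and the ages are hypothetical stand-ins for whatever model Enginius has actually estimated. The sketch divides the relative change in Y by the 1% change in X, so the result follows the ratio definition of elasticity; numerically, this equals (Y1 – Y0) / Y0 expressed in percent.

```python
import math

# Hypothetical binary logit: P(choose "1") = 1 / (1 + exp(-(B0 + B1 * age))).
# Coefficients and ages are illustrative only, not Enginius output.
B0, B1 = -2.0, 0.05

def predict(age):
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * age)))

def elasticity(xs, predict, pct_change=0.01):
    # Step 1: predict at each individual's current X and average -> Y0.
    y0 = sum(predict(x) for x in xs) / len(xs)
    # Step 2: increase every X by pct_change (1% by default), predict, average -> Y1.
    y1 = sum(predict(x * (1 + pct_change)) for x in xs) / len(xs)
    # Step 3: relative change in Y divided by the relative change in X.
    return ((y1 - y0) / y0) / pct_change

ages = [25, 34, 41, 52, 60]
print(round(elasticity(ages, predict), 3))
```

As a sanity check, for a single individual aged 40 this logit predicts a probability of 0.5, and a very small change in age gives an elasticity close to B1 × 40 × (1 − 0.5) = 1.0, the textbook point elasticity of a logit probability.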

Keep in mind that, when X is discrete, an increase of 1% in X is meaningless. For instance, if X = 1 means that the color is red, X = 1.01 has no useful interpretation and the elasticity computations do not lead to interpretable results.


Here is an example of elasticities computed by Enginius:


The elasticity of Gender should not be interpreted, because Gender is a discrete variable. For Age, these results suggest that if everyone's age increases by 1% from its current value (i.e., everyone becomes slightly older), the overall probability of choosing alternative “1” will increase by 0.34%, and the overall probability of choosing alternative “0” will decrease by 0.29%.
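Although the two Age elasticities differ in size, they are consistent with each other. The short calculation below is an illustration under stated assumptions, not an Enginius output: it assumes the target has only the alternatives “1” and “0” and that both reported values are relative changes in the average predicted probability. Under those assumptions, the two numbers imply the average baseline probability of choosing “1”.

```python
# Elasticities reported in the example for a 1% increase in Age, written as
# relative changes in the average predicted probability (+0.34% and -0.29%).
e1, e0 = 0.0034, -0.0029

# Assumption: only two alternatives, so the average probabilities satisfy
# P1 + P0 = 1 and their absolute changes must cancel:
#   e1 * P1 + e0 * (1 - P1) = 0   =>   P1 = -e0 / (e1 - e0)
p1 = -e0 / (e1 - e0)
print(round(p1, 2))  # 0.46
```

In other words, the same absolute change in probability is measured against different baselines (roughly 46% for “1” versus 54% for “0”), which is why the percentage changes differ; because the reported figures are rounded, this implied share is approximate.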
