Box-Cox transform the predictors
In predictive models, the distribution of some variables may be highly skewed. The number of past transactions or purchases per customer is a typical example: many customers have made just 1 purchase, others have made around 10, and a handful have made 100 or more. The same problem often arises with purchase amounts, income, etc.
Since many predictive models (for example, linear and logistic regressions) work best when predictors and target variables follow a more Normal-like distribution, the Box-Cox transformation rescales skewed variables so that their distribution becomes more symmetric.
A Box-Cox transformation automatically transforms a variable X into a new variable Y. Even though there is an assignment from every X to Y (i.e., X -> Y), the same may not be true for Y -> X. For this reason, while a Box-Cox transform can be applied to predictors, it cannot be applied to the target variable. For target variables, only log-transforms are available.
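To illustrate how a Box-Cox transform reduces skew, here is a minimal sketch in plain Python. It assumes the standard Box-Cox formula, Y = (X^λ - 1) / λ (or log X when λ = 0), with λ picked from a small grid by maximizing the Normal profile log-likelihood. The function names, the grid, and the sample data are hypothetical, and this is not necessarily how Enginius selects λ internally.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_likelihood(values, lam):
    """Profile log-likelihood of lambda, assuming the transformed data is Normal."""
    n = len(values)
    y = box_cox(values, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, grid=None):
    """Pick the lambda on a coarse grid (assumed: -2.0 to 2.0) with the highest likelihood."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: log_likelihood(values, lam))

# Hypothetical purchase counts: mostly small, with a few very large values.
purchases = [1, 1, 1, 1, 2, 2, 3, 5, 8, 10, 12, 40, 100, 150]
lam = best_lambda(purchases)
transformed = box_cox(purchases, lam)

print("lambda:", lam)
print("skewness before:", round(skewness(purchases), 2))
print("skewness after:", round(skewness(transformed), 2))
```

The transformed variable should show a skewness much closer to zero than the raw counts.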
Log transform the target variable
When the target variable is Continuous or Discrete-continuous, the log transformation can be used to make a highly skewed distribution less skewed. This can make patterns in the data easier to interpret.
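As a small illustration, the sketch below log-transforms a skewed target and maps values back to the original scale with the exponential. It assumes a strictly positive target and the natural logarithm; whether Enginius applies log(x) or a shifted variant such as log(1 + x) is not stated here, so treat that choice, the function names, and the sample amounts as assumptions.

```python
import math

def log_transform(target):
    """Natural log of a strictly positive target variable."""
    if any(v <= 0 for v in target):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in target]

def back_transform(log_values):
    """Map values on the log scale back to the original scale."""
    return [math.exp(v) for v in log_values]

# Hypothetical purchase amounts with a long right tail.
amounts = [12.0, 15.0, 20.0, 35.0, 60.0, 250.0, 1200.0]
log_amounts = log_transform(amounts)
restored = back_transform(log_amounts)

print([round(v, 2) for v in log_amounts])
```

Because the log has an exact inverse, predictions made on the log scale can be converted back to the original units.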
Cross-validation
Cross-validation is a technique for evaluating predictive models by partitioning the original sample into a training set used to estimate the model and a test set used to evaluate it. In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. In a 10-fold cross-validation, for instance, the model is estimated on 90% of the data set and tested on the remaining 10%. The operation is repeated 10 times, with a different test set each time.
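The partitioning step can be sketched as follows. This is an illustrative plain-Python version that only produces the train/test row indices for each fold; the function name, the seed, and the sample size of 100 are hypothetical, and model estimation itself is left out.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random partition of the sample
    folds = [indices[i::k] for i in range(k)]   # k subsamples of (near) equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 10-fold cross-validation on 100 observations: 90 for training, 10 for testing.
splits = list(k_fold_indices(100, 10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each observation appears in exactly one test set, so every row is used for evaluation once and for estimation in the other nine rounds.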
Continuous elasticities of the Conditional Logit Model
(Menu option: “Choice between multiple alternatives, one line per alternative (0/0/1)”).
Elasticities are computed analytically using the formula ηijk = β̂jXijk(1 - Pik), where i represents a choice set (or customer), j represents a variable (for example, price), and k represents a choice alternative (for example, a brand). ηijk is the elasticity denoting the % change in choice probability for alternative k in choice set i for a 1% increase in variable j. β̂j is the estimated coefficient corresponding to variable j, Xijk is the value of variable j for alternative k in choice set i, and Pik is the estimated choice probability for alternative k in choice set i. Enginius averages the elasticities across the choice sets and reports the averages, which appear on the diagonals of the elasticity matrices, with one matrix for each variable j.
Cross elasticities are computed using the formula ηijkh = -β̂jXijhPih, where k and h denote two different choice alternatives. Here, ηijkh is the cross elasticity denoting the % change in the choice probability of alternative k when the value Xijh of variable j for alternative h in choice set i changes by 1%. Note that ηijkh is the same for all k when a variable corresponding to another alternative h changes. This is a manifestation of the IIA (Independence of Irrelevant Alternatives) property of the multinomial logit model. Enginius averages the cross elasticities across the choice sets and reports the averages, which are the off-diagonal elements of the elasticity matrices, with one matrix for each variable j.
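The sketch below shows how such an elasticity matrix could be assembled for one variable j. It uses the standard logit own-elasticity β̂jXijk(1 - Pik) on the diagonal and the cross-elasticity formula above off the diagonal, then averages across choice sets. The coefficient, the prices, the single-variable utility, and the layout (rows = alternative k whose probability responds, columns = alternative h whose variable changes) are all assumptions made for illustration, not a description of Enginius internals.

```python
import math

def choice_probabilities(utilities):
    """Logit choice probabilities for one choice set."""
    m = max(utilities)                       # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def elasticity_matrix(beta_j, x_j, probs):
    """Row k, column h: % change in P_k for a 1% change in variable j of alternative h."""
    n = len(x_j)
    return [
        [beta_j * x_j[h] * ((1.0 if k == h else 0.0) - probs[h]) for h in range(n)]
        for k in range(n)
    ]

def average_matrices(matrices):
    """Element-wise average of the per-choice-set matrices."""
    n = len(matrices[0])
    count = len(matrices)
    return [[sum(m[k][h] for m in matrices) / count for h in range(n)] for k in range(n)]

# Hypothetical inputs: a price coefficient and prices of 3 brands in 2 choice sets.
beta_price = -0.5
prices_by_choice_set = [[2.0, 3.0, 4.0], [2.5, 2.5, 5.0]]

matrices = []
for prices in prices_by_choice_set:
    utilities = [beta_price * p for p in prices]   # price is the only variable here
    probs = choice_probabilities(utilities)
    matrices.append(elasticity_matrix(beta_price, prices, probs))

average = average_matrices(matrices)
for row in average:
    print([round(value, 3) for value in row])
```

Within each column, all off-diagonal entries are identical, which is the IIA pattern described above.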