By Dr Gwinyai Nyakuengama
(28 July 2018)
Customer Churn; RapidMiner Auto Model; Stata; Machine Learning Models; Naive Bayes; Generalized Linear Model (GLM); Logistic Regression; Deep Learning; Random Forest; Gradient Boosted Trees (XGBoost); Model performance; Receiver Operating Characteristic (ROC) Curve; Confusion Matrix; Accuracy; Specificity; Sensitivity.
Previously, we studied customer churn using logistic regression and survival analysis techniques in Stata (see Nyakuengama 2018 a, b).
The RapidMiner Auto ML is a state-of-the-art tool with machine learning (ML) capabilities that:
- are easy to use from pull-down, point-and-click menus;
- allow the user to fit several ML models simultaneously; and
- allow immediate optimization and operationalization of the best ML models.
In this blog we explore the business merits of the RapidMiner Auto Model as a fast and reliable tool of choice for predicting customer churn.
In this study we:
- launched the RapidMiner Auto Model Studio (version 8.2);
- loaded the same customer churn data used in our previous blog on logistic regression (see Nyakuengama 2018 b);
- selected churn as the target variable and the same explanatory variables, namely SEX, SENIORCITIZEN, PARTNERED, DEPENDENT, MULTIPLELINES, CONTRACT, PAPERLESS and TENURE_GROUPS;
- ran several machine learning models: Naive Bayes (NB); Generalized Linear Model (GLM); Logistic Regression (LR); Deep Learning (DL); Decision Tree (DT); Random Forest (RF) and Gradient Boosted Trees (XGBoost); and
- evaluated how well the ML models predicted customer churn.
MODEL PERFORMANCE PARAMETERS
Terms and definitions in the table below from Saito and Rehmsmeier (2015) are used in assessing machine learning performance.
Model run-time represents the time taken to build, test and validate a model and to issue its performance parameters.
The above run-times from our RapidMiner Auto Model experiments suggest that the XGBoost model took far longer than the other models. The NB, GLM, LR and DT models had comparable and much shorter run-times. Those for the DL and RF models were slightly longer than for these last four models.
The Receiver Operating Characteristic (ROC) curve is a plot with the false positive rate (FPR), or 1 - Specificity, on the X-axis and the true positive rate (TPR), or Sensitivity, on the Y-axis.
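As a rough illustration of how the points on a ROC plot arise, the sketch below sweeps a decision threshold over a handful of made-up scores and computes the (FPR, TPR) pair at each threshold. The labels, the scores and the helper name `roc_points` are hypothetical; none of them come from our RapidMiner experiments.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score threshold.

    labels: 1 = positive class, 0 = negative class (illustrative data only).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Threshold above every score: nothing is flagged positive.
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Joining these points from (0, 0) to (1, 1) traces the ROC curve; a curve that rises quickly towards the top-left corner indicates a better classifier.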
When comparing models in machine learning, the model with the largest area under the Receiver Operating Characteristic curve (AUROC) is the preferred one. The AUROC is defined as the probability that a randomly selected positive sample will have a higher prediction value than a randomly selected negative sample (Lan, 2017).
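The probabilistic definition of the AUROC can be checked directly on a small example. The sketch below counts, over all (positive, negative) pairs, how often the positive sample receives the higher score. The data and the function name `auroc_pairwise` are made up for illustration, and counting ties as half a win is a common convention that we assume here.

```python
from itertools import product


def auroc_pairwise(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (an assumed convention).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auroc_pairwise(labels, scores)
print(round(auc, 3))
```

This brute-force version compares every pair, so it is only practical for small samples; it is meant to make the definition concrete rather than to replace the AUROC reported by a modelling tool.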
The ROC plot and the average AUROC values above, from our current RapidMiner Auto Model experiments, show that the NB, GLM, LR, DL and XGBoost models had comparable AUROCs. The DT and RF models performed the worst, judging by this criterion.
The current GLM and LR models had the highest accuracy rates (78%). The NB and XGBoost models were lower, but within two percentage points. The accuracy rates of the DT and RF models were the lowest (73%).
The GLM and LR models had the lowest classification error rates (22%), while the DT and RF models had the highest (27%). Classification error is the complement of accuracy (error = 1 - accuracy), so this ranking mirrors the one seen for model accuracy above.
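To make the link between accuracy and classification error concrete, here is a minimal sketch that derives both from the four cells of a confusion matrix. The counts are hypothetical and are not the confusion matrices from our experiments; `accuracy_and_error` is an illustrative helper name.

```python
def accuracy_and_error(tp, fn, fp, tn):
    """Return (accuracy, classification error) from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total  # share of correct predictions
    error = (fp + fn) / total     # share of wrong predictions
    return accuracy, error


# Hypothetical confusion-matrix counts, not taken from our experiments.
accuracy, error = accuracy_and_error(tp=50, fn=10, fp=15, tn=25)
print(f"accuracy={accuracy:.2f}  error={error:.2f}  sum={accuracy + error:.2f}")
```

Because every prediction is either right or wrong, the two rates always sum to one, which is why a ranking of models by error is the reverse of their ranking by accuracy.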
The Naive Bayes model had the highest precision rate, while the DT and RF models had the lowest.
Note that recall is the same as sensitivity. Recall (sensitivity) was 100% for both the DT and RF models. It is well known that tree-based models tend to overfit. Apart from these two, the DL and XGBoost models achieved much higher recall rates than the rest of the models.
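Notice again that specificity was 0% for both the DT and RF models (the opposite of sensitivity). Taken together, 100% sensitivity and 0% specificity mean that these two models assigned every case to the positive class. Among the remaining models, NB had the highest specificity.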
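The pattern of 100% sensitivity alongside 0% specificity can be reproduced with a toy classifier that labels every case as positive. The sketch below uses a made-up sample with a 70/30 class split, which is an assumption for illustration and not the class split of our churn data; `sensitivity_specificity` is a hypothetical helper.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up sample: 70 positives and 30 negatives (illustrative split only).
labels = [1] * 70 + [0] * 30
always_positive = [1] * len(labels)  # a "model" that never predicts negative

sens, spec = sensitivity_specificity(labels, always_positive)
acc = sum(1 for y, p in zip(labels, always_positive) if y == p) / len(labels)
print(f"sensitivity={sens:.2f}  specificity={spec:.2f}  accuracy={acc:.2f}")
```

In this toy case the accuracy simply equals the share of positives in the sample, so a seemingly respectable accuracy can hide a model that does not discriminate between the classes at all.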
The F1-score is the harmonic mean of precision and recall; its best value is 1 and its worst value is 0.
The F1-scores of the GLM and LR models (86%) were marginally better than those of the other models (85%).
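For readers who want to see the harmonic mean behind the F1-score spelled out, a minimal sketch follows. The precision and recall values passed in are arbitrary examples, not figures from our models, and `f1_score` here is our own small helper rather than a library function.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Arbitrary example values, for illustration only.
print(round(f1_score(0.8, 0.9), 3))
print(round(f1_score(0.5, 1.0), 3))  # pulled towards the lower of the two
```

Unlike a simple average, the harmonic mean is dragged down by the weaker of the two rates, so a model cannot earn a high F1-score on recall alone.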
FEATURE IMPORTANCE (WEIGHTS)
The above results show that the important features for explaining customer churn were: having a short contract, a short tenure group (duration), paperless transactions, having dependents and being a senior citizen.
Previous studies that fitted logistic regressions to the same customer churn dataset reported results similar to those in the current blog for model accuracy, feature importance and other key model performance parameters (see Nyakuengama (2018 b) using Stata, and Li (2017) and Treselle Engineering (2018), both using the R programming language). In addition, Li (2017) employed a number of ML techniques similar to those reported in this blog.
The decision on which ML model best predicts customer churn is very much business-model dependent. We simply note that from the current RapidMiner Auto Model experiments:
- The Naive Bayes model would be preferred over tree-based models if precision is of paramount importance in the business;
- The Logistic Regression and GLM models would be preferred if accuracy and F-measure are the key business targets;
- The XGBoost model would be overlooked if short run-time is a key business consideration; and
- While tree-based methods are the easiest to understand, they tended to overfit; a perfect recall rate (100%) looks too optimistic. Nonetheless, these models are attractive because, being non-parametric, they are not sensitive to statistical assumptions (e.g. independence and non-correlation among predictor variables). Indeed, they can be used on large data-sets, where they may outshine other models.
Lastly, we note that any data issues due to class imbalance among predictor variables or internal frequency distributions were not addressed in our study. Different models have strengths and shortcomings which must be fully understood, before committing to any single ML model.
This blog has shown that RapidMiner Auto ML:
- is a data analytics tool that is easy to learn and use, with pull-down, point-and-click menus and well-designed process flow windows;
- yields reliable results (e.g. on customer churn, similar to those previously reported on the same data-set using Stata and the R programming language); and
- quickly yields serious models which can be extensively evaluated, all without any programming.
In addition, RapidMiner Auto ML:
- can model specialized market niches or business segments through data clustering; and
- has a Simulator that can be used to model “what if” business scenarios (by drilling down and manually fine-tuning hyper-parameters to achieve desired outcomes).
H. Lan (2017): Decision Trees and Random Forests for Classification and Regression pt.1 – https://towardsdatascience.com/decision-trees-and-random-forests-for-classification-and-regression-pt-1-dbb65a458df
J.G. Nyakuengama (2018 a): Survival Data Analysis And Visualization In Stata – Part 1 – https://dat-analytics.net/2018/07/22/survival-data-analysis-and-visualization-in-stata-part-1/
J.G. Nyakuengama (2018 b): Predictive Data Analysis And Visualization In Stata – Part 1: Logistic Regression – https://dat-analytics.net/2018/07/25/predictive-data-analysis-and-visualization-in-stata-part-1-logistic-regression/
RapidMiner Studio: https://rapidminer.com/products/studio/
T. Saito and M. Rehmsmeier (2015): The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets – https://doi.org/10.1371/journal.pone.0118432
L. Oldja (2018): Survival Analysis to Explore Customer Churn in Python – https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822
Treselle Engineering (2018): Customer Churn – Logistic Regression with R – http://www.treselle.com/blog/customer-churn-logistic-regression-with-r/
S. Li (2017): Predict Customer Churn with R – https://towardsdatascience.com/predict-customer-churn-with-r-9e62357d47b4