By Dr Gwinyai Nyakuengama

(28 July 2018)

KEY WORDS

Customer Churn; RapidMiner Auto Model; Stata; Machine Learning Models; Naive Bayes; Generalized Linear Model (GLM); Logistic Regression; Deep Learning; Random Forest; Gradient Boosted Trees (XGBoost); Model performance; Receiver Operating Characteristic (ROC); Confusion Matrix; Accuracy; Specificity; Sensitivity.

INTRODUCTION

Previously, we studied customer churn using logistic regression and survival analysis techniques in Stata (see Nyakuengama 2018 a, b).

RapidMiner Auto Model is a state-of-the-art tool with machine learning (ML) capabilities that:

are easy to use from pull-down and point-and-click menus;
allow the user to fit several ML models simultaneously; and
allow immediate optimization and operationalization of the best ML models.

In this blog we seek to explore the business merits of the RapidMiner Auto Model for use as a fast and reliable tool-of-choice to predict customer churn.

METHOD

In this study we:

launched the RapidMiner Auto Model Studio (version 8.2);
loaded the same customer churn data used in our previous blog on logistic regression (see Nyakuengama 2018 b);
selected churn as the target variable and the same explanatory variables, namely SEX, SENIORCITIZEN, PARTNERED, DEPENDENT, MULTIPLELINES, CONTRACT, PAPERLESS and TENURE_GROUPS;
ran several machine learning models: Naive Bayes (NB); Generalized Linear Model (GLM); Logistic Regression (LR); Deep Learning (DL); Decision Tree (DT); Random Forest (RF) and Gradient Boosted Trees (XGBoost); and
evaluated how well the ML models predicted customer churn.

MODEL PERFORMANCE PARAMETERS

The terms and definitions in the table below, from Saito and Rehmsmeier (2015), are used to assess machine learning performance.
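The measures used in the rest of this blog can all be derived from the four cells of a binary confusion matrix. The sketch below is a minimal illustration using the standard textbook definitions; the class name `ConfusionMatrix` and the example counts are hypothetical and are not taken from the RapidMiner output.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the rate is undefined."""
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.total)

    def classification_error(self) -> float:
        return _ratio(self.fp + self.fn, self.total)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:  # also called sensitivity or true positive rate
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:  # equals 1 - false positive rate
        return _ratio(self.tn, self.tn + self.fp)


# Hypothetical counts, for illustration only.
cm = ConfusionMatrix(tp=80, fp=30, tn=70, fn=20)
print(f"accuracy={cm.accuracy():.2f}, error={cm.classification_error():.2f}")
print(f"precision={cm.precision():.2f}, recall={cm.recall():.2f}, "
      f"specificity={cm.specificity():.2f}")
```

Accuracy and classification error always sum to one, which is why the two measures rank models in mirror-image order.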

RESULTS

RUN-TIMES (ms)

Model run-time represents the time taken to build, test and validate a model and to report its performance parameters.

INTERPRETATION

The above run-times from our RapidMiner Auto Model experiments suggest that the XGBoost model took far longer than the other models. The NB, GLM, LR and DT models had comparable and much shorter run-times. Those for the DL and RF models were slightly longer than for these last four models.

ROC COMPARISONS

The Receiver Operating Characteristic (ROC) curve is a plot with the false positive rate (FPR), or 1 − Specificity, on the X-axis and the true positive rate (TPR), or Sensitivity, on the Y-axis.
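To make the construction of the curve concrete, the sketch below sweeps a decision threshold over a set of predicted scores and records one (FPR, TPR) point per threshold. The function name `roc_points`, the labels and the scores are hypothetical and serve only as an illustration; this is not how RapidMiner computes its plot internally.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, one per distinct score used as a threshold.

    labels: 1 for the positive class, 0 for the negative class.
    scores: predicted score for the positive class (higher = more positive).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical labels and scores, for illustration only.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

for fpr, tpr in roc_points(example_labels, example_scores):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a model whose curve rises steeply towards the top-left corner separates the two classes well.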

When comparing models in machine learning, the model with the largest area under the Receiver Operating Characteristic curve (AUROC) is generally preferred. The AUROC can be interpreted as the probability that a randomly selected positive sample will have a higher prediction value than a randomly selected negative sample (Lan, 2017).
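The probabilistic reading of the AUROC can be checked directly by comparing every positive sample with every negative sample. The sketch below does this by brute force; the function name `auroc_pairwise` and the data are hypothetical, and counting a tie as half a win is a common convention assumed here rather than something stated in the source.

```python
def auroc_pairwise(labels, scores):
    """Estimate AUROC as the share of (positive, negative) pairs in which
    the positive sample receives the higher score. Ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
pair_labels = [1, 1, 0, 1, 0, 0]
pair_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# 8 of the 9 positive/negative pairs are ordered correctly.
print(f"AUROC = {auroc_pairwise(pair_labels, pair_scores):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.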

[Figure: ROC curves compared across models]

AUROC

[Figure: AUROC by model]

INTERPRETATION

In the above ROC plot and the average AUROC from our current RapidMiner Auto Model experiments, we see that the NB, GLM, LR, DL and XGBoost models had comparable AUROCs. The DT and RF models performed the worst, judging by this criterion.

ACCURACY

INTERPRETATION

The current GLM and LR models had the highest accuracy rates (78%). The NB and XGBoost models were lower, but within two percentage points. Accuracy rates of the DT and RF models were the lowest (73%).

CLASSIFICATION ERROR

[Figure: classification error by model]

INTERPRETATION

The GLM and LR models had the lowest classification error rates (22%), while the DT and RF models performed the worst on this measure (27%). Because classification error is the complement of accuracy (error = 100% − accuracy), these rankings mirror those seen for model accuracy above.

PRECISION

[Figure: precision by model]

INTERPRETATION

The Naive Bayes model had the highest precision rate, while the DT and RF models had the lowest.

RECALL

[Figure: recall by model]

INTERPRETATION

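Note that recall is the same as sensitivity.

Recall was 100% for both the DT and RF models. Combined with their 0% specificity (see SPECIFICITY below), this suggests that these models assigned every customer to the positive class. It is well known that tree-based models tend to over-fit. Other than that, the DL and XGBoost models achieved much higher recall rates than the rest of the models.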

SENSITIVITY

[Figure: sensitivity by model]

INTERPRETATION

Note that sensitivity is the same as recall, and was 100% for both the DT and RF models. It is well known that tree-based models tend to over-fit. Other than that, the DL and XGBoost models achieved much higher sensitivity than the rest of the models.

SPECIFICITY

[Figure: specificity by model]

INTERPRETATION

Note again that specificity was 0% for both the DT and RF models (the opposite of their sensitivity). Other than that, the NB model had the highest specificity among the remaining models.
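A sensitivity of 100% paired with a specificity of 0% is the pattern produced by a classifier that labels every case as positive. The sketch below works through that arithmetic. The function name and the counts (730 positives, 270 negatives) are hypothetical, chosen only to give a 73% positive share; they are not taken from the study's data.

```python
def rates_for_always_positive(n_positive: int, n_negative: int) -> dict:
    """Performance of a classifier that predicts 'positive' for every case.

    Assumes n_positive > 0 and n_negative > 0 so that every rate is defined.
    """
    tp, fn = n_positive, 0  # every actual positive is caught
    fp, tn = n_negative, 0  # every actual negative is misclassified
    total = n_positive + n_negative
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical class counts, for illustration only.
always_positive = rates_for_always_positive(n_positive=730, n_negative=270)
for name, value in always_positive.items():
    print(f"{name}: {value:.0%}")
```

In this situation accuracy and precision both collapse to the share of positive cases in the data, so such a model adds no information beyond the class frequencies.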

F MEASURE

The F1-score is the harmonic mean of precision and recall; its best value is 1 and its worst value is 0.
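As a worked illustration of the harmonic mean, the sketch below computes the score from a precision and a recall value. The `beta` parameter is an assumption added here: it gives the general F-beta form, in which recall is weighted `beta` times as heavily as precision, and `beta=1` recovers the F1-score. The input values are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; with beta=1 this is the F1-score (plain harmonic mean)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # conventional value when both inputs are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical precision and recall values, for illustration only.
f1 = f_beta(precision=0.5, recall=1.0)
arithmetic_mean = (0.5 + 1.0) / 2

print(f"F1 = {f1:.3f}, arithmetic mean = {arithmetic_mean:.3f}")
```

The harmonic mean is pulled towards the smaller of the two inputs, so a model cannot obtain a high F1-score from high recall alone.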

[Figure: F measure by model]

INTERPRETATION

The F measures (scores) of the GLM and LR models (86%) were marginally better than those of the other models (85%).

FEATURE IMPORTANCE (WEIGHTS)

[Figure: feature importance (weights)]

INTERPRETATION

The above results show that the important features for explaining customer churn were: having a short contract, belonging to a short tenure group (duration), using paperless transactions, having dependents and being a senior citizen.

DISCUSSION

Previous studies that applied logistic regression to the same customer churn dataset reported model accuracy, feature importance and other key model performance parameters similar to those in the current blog (see Nyakuengama (2018 b) using Stata, and Li (2017) and Treselle Engineering (2018), both using the R programming language). In addition, Li (2017) employed a number of ML techniques similar to those reported in this blog.

The decision on which ML model best predicts customer churn is very much business-model dependent. We simply note that from the current RapidMiner Auto Model experiments:

The Naive Bayes model would be preferred over tree-based models if precision is of paramount importance in the business;
The Logistic Regression and GLM models would be preferred if accuracy and F measure are the key business targets;
The XGBoost model would be overlooked if short run-time is a key business consideration;
While tree-based methods are the easiest to understand, they tended to over-fit. Their perfect recall rate (100%) came with 0% specificity and is therefore too optimistic. Nonetheless, these models are attractive because, being non-parametric, they are not sensitive to statistical assumptions (e.g. independence and non-correlation among predictor variables). Indeed, they can be used on large data-sets, where they may outshine other models.

Lastly, we note that any data issues due to class imbalance among predictor variables or internal frequency distributions were not addressed in our study. Different models have strengths and shortcomings, which must be fully understood before committing to any single ML model.

CONCLUSIONS

This blog has shown that RapidMiner Auto Model:

is a data analytical tool that is easy to learn and use, with pull-down, point-and-click menus and well-designed process flow windows;
yields reliable results (e.g. on customer churn, similar to those previously reported on the same data-set using Stata and the R programming language); and
quickly yields serious models which can be extensively evaluated, all without any programming.

In addition, RapidMiner Auto Model:

can model specialized market niches or business segments through data clustering; and
has a Simulator that can be used to model “what if” business scenarios (by drilling down and manually fine-tuning hyper-parameters to achieve desired outcomes).

BIBLIOGRAPHY

H. Lan (2017): Decision Trees and Random Forests for Classification and Regression pt.1 – https://towardsdatascience.com/decision-trees-and-random-forests-for-classification-and-regression-pt-1-dbb65a458df

J.G. Nyakuengama (2018 a): Survival Data Analysis And Visualization In Stata – Part 1 – https://dat-analytics.net/2018/07/22/survival-data-analysis-and-visualization-in-stata-part-1/

J.G. Nyakuengama (2018 b): Predictive Data Analysis And Visualization In Stata – Part 1: Logistic Regression – https://dat-analytics.net/2018/07/25/predictive-data-analysis-and-visualization-in-stata-part-1-logistic-regression/

RapidMiner Studio: https://rapidminer.com/products/studio/

T. Saito and M. Rehmsmeier (2015): The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets – https://doi.org/10.1371/journal.pone.0118432

L. Oldja (2018): Survival Analysis to Explore Customer Churn in Python – https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822

Treselle Engineering (2018): Customer Churn – Logistic Regression with R – http://www.treselle.com/blog/customer-churn-logistic-regression-with-r/

S. Li (2017): Predict Customer Churn with R – https://towardsdatascience.com/predict-customer-churn-with-r-9e62357d47b4

USE OF RapidMiner – Auto Model TO PREDICT CUSTOMER CHURN
