**By Dr Gwinyai Nyakuengama**

**(28 July 2018)**

**KEY WORDS**

Customer Churn; RapidMiner Auto Model; Stata; Machine Learning Models; Naive Bayes; Generalized Linear Model (GLM); Logistic Regression; Deep Learning; Random Forest; Gradient Boosted Trees (XGBoost); Model Performance; Receiver Operating Characteristic (ROC); Confusion Matrix; Accuracy; Specificity; Sensitivity.

**INTRODUCTION**

Previously, we studied customer churn using logistic and survival techniques in Stata (see Nyakuengama 2018 a, b).

RapidMiner Auto Model is a state-of-the-art tool with machine learning (ML) capabilities that:

- are easy to use through *pull-down* and *point-and-click* menus;
- allow the user to fit several ML models simultaneously; and
- allow immediate optimization and operationalization of the best ML models.

In this blog we seek to explore the business merits of the RapidMiner Auto Model for use as a fast and reliable tool-of-choice to predict customer churn.

**METHOD**

In this study we:

- launched the RapidMiner Auto Model Studio (version 8.2);
- loaded the same customer churn data used in our previous blog on logistic regression (see Nyakuengama 2018 b);
- selected *churn* as the target variable and the same explanatory variables, namely *SEX, SENIORCITIZEN, PARTNERED, DEPENDENT, MULTIPLELINES, CONTRACT, PAPERLESS* and *TENURE_GROUPS*;
- ran several machine learning models: Naive Bayes (NB), Generalized Linear Model (GLM), Logistic Regression (LR), Deep Learning (DL), Decision Tree (DT), Random Forest (RF) and Gradient Boosted Trees (XGBoost); and
- evaluated how well the ML models predicted customer churn.
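The multi-model comparison that Auto Model performs from its GUI can be sketched in code. The Python snippet below is an assumption for illustration only (the blog itself used RapidMiner's point-and-click interface, not code): it fits a few of the model families named above with scikit-learn on a synthetic stand-in for the churn data and compares their accuracy.

```python
# Sketch (an assumption): fitting several of the model families named above
# with scikit-learn. Synthetic data stands in for the actual churn dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the churn data: 8 features, binary target.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy = {acc:.3f}")
```

In Auto Model the equivalent steps (splitting, fitting each candidate, scoring) happen behind the scenes once the target and inputs are selected.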

# MODEL PERFORMANCE PARAMETERS

The terms and definitions in the table below, from Saito and Rehmsmeier (2015), are used to assess machine learning performance.
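These metrics all derive from the four confusion-matrix counts (true/false positives and negatives). A minimal sketch of the standard formulas, with made-up counts for illustration:

```python
# Minimal sketch of the standard confusion-matrix metrics used below.
# The counts passed in are illustrative, not from the study.
def classification_metrics(tp, fp, tn, fn):
    """Return common performance metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    error = 1 - accuracy                   # classification error
    sensitivity = tp / (tp + fn)           # recall / true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    precision = tp / (tp + fp)             # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "error": error, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

print(classification_metrics(tp=80, fp=20, tn=70, fn=30))
```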

**RESULTS**

## RUN-TIMES (ms)

Model run-time represents the time taken to build, test and validate a model and to issue its performance parameters.

INTERPRETATION

The above run-times from our RapidMiner Auto Model experiments suggest that the XGBoost model took far longer than the other models. The NB, GLM, LR and DT models had comparable and much shorter run-times. Those for the DL and RF models were slightly longer than for these last four models.

## ROC COMPARISONS

The Receiver Operating Characteristic (ROC) curve is plotted with the false positive rate (FPR), or *1 − Specificity*, on the X-axis and the true positive rate (TPR), or *Sensitivity*, on the Y-axis.

When comparing models in machine learning, the model with the largest area under the ROC curve (AUROC) is preferred. The AUROC is defined as *the probability that a randomly selected positive sample will have a higher prediction value than a randomly selected negative sample* (Lan, 2017).
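That probabilistic definition can be checked directly by scoring all positive/negative pairs. The sketch below (with made-up scores, not values from the study) computes the AUROC both ways and shows they agree:

```python
# Sketch illustrating the AUROC definition quoted above: the probability that
# a randomly chosen positive scores higher than a randomly chosen negative.
# Labels and scores are made up for illustration.
from itertools import product

from sklearn.metrics import roc_auc_score

labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

pos = [s for s, l in zip(scores, labels) if l == 1]
neg = [s for s, l in zip(scores, labels) if l == 0]
pairs = list(product(pos, neg))

# Count pairs where the positive outscores the negative (ties count half).
auc_by_definition = sum((p > n) + 0.5 * (p == n) for p, n in pairs) / len(pairs)

print(auc_by_definition)              # pairwise-probability estimate
print(roc_auc_score(labels, scores))  # same value from scikit-learn
```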

## AUROC

INTERPRETATION

From the above ROC plot and the average AUROCs from our current RapidMiner Auto Model experiments, we see that the NB, GLM, LR, DL and XGBoost models had comparable AUROCs. The DT and RF models performed the worst, judging by this criterion.

## ACCURACY

INTERPRETATION

The current GLM and LR models had superior accuracy rates (78%). The NB and XGBoost model performances were within two percentage points, but lower. The accuracy rates of the DT and RF models were the lowest (73%).

## CLASSIFICATION ERROR

INTERPRETATION

The GLM and LR models had the lowest classification error rates (22%), while the DT and RF models performed worst on this measure (27%). These results are the complement of those seen for model accuracy above, since classification error = 1 − accuracy.

## PRECISION

INTERPRETATION

The Naive Bayes model had the highest precision rate, while the DT and RF models had the lowest.

## RECALL

INTERPRETATION

Notice that recall is the same as sensitivity.

Recall was 100% for both the DT and RF models. It is well known that tree-based models tend to overfit. Among the remaining models, DL and XGBoost achieved higher recall rates than the rest.

## SENSITIVITY

INTERPRETATION

Notice that sensitivity is the same as recall, and was 100% for both the DT and RF models. It is well known that tree-based models tend to overfit. Among the remaining models, DL and XGBoost achieved higher recall rates than the rest.

## SPECIFICITY

INTERPRETATION

Notice again that *Specificity* was 0% for both the DT and RF models (the opposite of their *Sensitivity*). Among the remaining models, NB had the highest specificity.

## F MEASURE

The F1-score is the *harmonic mean* of precision and recall, where the best F1-score is 1 and the worst is 0.
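As a quick sketch of that definition (with illustrative numbers, not values from the study):

```python
# Minimal sketch of the F1 score as the harmonic mean of precision and recall.
# The inputs below are illustrative, not from the study.
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (best 1.0, worst 0.0)."""
    return 2 * precision * recall / (precision + recall)

print(f"{f1_score(0.80, 0.92):.3f}")  # harmonic mean sits below the average
```

The harmonic mean penalizes imbalance: a model cannot achieve a high F1 by excelling at only one of precision or recall.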

INTERPRETATION

The F1 scores of the GLM and LR models (86%) were marginally better than those of the other models (85%).

## FEATURE IMPORTANCE (WEIGHTS)

INTERPRETATION

The above results show that the important features for explaining customer churn were: having a short contract, a short tenure group (duration), paperless transactions, having dependents and being a senior citizen.
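RapidMiner computes these weights internally; as an assumption for illustration, the same idea can be sketched with a tree-based model's feature importances in scikit-learn (on synthetic stand-in data, with feature names mirroring the Method section):

```python
# Sketch (an assumption): extracting feature weights from a tree-based model.
# Synthetic data stands in for the churn dataset; the names mirror the
# explanatory variables listed in the Method section.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

names = ["SEX", "SENIORCITIZEN", "PARTNERED", "DEPENDENT",
         "MULTIPLELINES", "CONTRACT", "PAPERLESS", "TENURE_GROUPS"]

X, y = make_classification(n_samples=500, n_features=8, random_state=1)
model = RandomForestClassifier(random_state=1).fit(X, y)

# Weights sum to 1; larger means the feature contributed more to the splits.
for name, weight in sorted(zip(names, model.feature_importances_),
                           key=lambda pair: -pair[1]):
    print(f"{name}: {weight:.3f}")
```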

**DISCUSSION**

Previous studies using the same customer churn dataset reported results similar to the current blog's for model accuracy, feature importance and other key model performance parameters of Logistic Regression (see Nyakuengama (2018 b) using Stata, and Li (2017) and Treselle Engineering (2018), both using the R programming language). In addition, Li (2017) employed a number of ML techniques similar to those reported in this blog.

The decision on which ML model best predicts customer churn is very much business-model dependent. We simply note that from the current RapidMiner Auto Model experiments:

- The Naive Bayes model would be preferred over tree-based models if precision is of paramount importance to the business.
- The Logistic Regression and GLM models would be preferred if accuracy and the F measure are the key business targets.
- The XGBoost model would be overlooked if a short run-time is a key business consideration.
- While tree-based methods are the easiest to understand, they tended to over-fit; a perfect recall rate (100%) sounds a bit too optimistic. Nonetheless, these models are attractive because, being non-parametric, they are not sensitive to statistical assumptions (e.g. independence and non-correlation among predictor variables). Indeed, they can be used on large data-sets, where they may outshine other models.

Lastly, we note that our study did not address any data issues due to class imbalance or the internal frequency distributions of the predictor variables. Different models have strengths and shortcomings which must be fully understood before committing to any single ML model.

**CONCLUSIONS**

This blog has shown that RapidMiner Auto Model:

- is easy to learn and use, with *pull-down*, *point-and-click* menus and well-designed process flow windows;
- yields reliable results (e.g. on customer churn, similar to those previously reported on the same data-set using Stata and the R programming language); and
- quickly yields serious models which can be extensively evaluated, all without any programming.

RapidMiner Auto Model also:

- can model specialized market niches or business segments through data clustering; and
- has a *Simulator* which can be used to model "*what-if*" business scenarios (by drilling down and manually fine-tuning hyper-parameters to achieve desired outcomes).

**BIBLIOGRAPHY**

H. Lan (2017): Decision Trees and Random Forests for Classification and Regression pt.1 – https://towardsdatascience.com/decision-trees-and-random-forests-for-classification-and-regression-pt-1-dbb65a458df

J.G. Nyakuengama (2018 a): Survival Data Analysis And Visualization In Stata – Part 1 – https://dat-analytics.net/2018/07/22/survival-data-analysis-and-visualization-in-stata-part-1/

J.G. Nyakuengama (2018 b): Predictive Data Analysis And Visualization In Stata – Part 1: Logistic Regression – https://dat-analytics.net/2018/07/25/predictive-data-analysis-and-visualization-in-stata-part-1-logistic-regression/

RapidMiner Studio: https://rapidminer.com/products/studio/

T. Saito; M. Rehmsmeier (2015): The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets – https://doi.org/10.1371/journal.pone.0118432

L. Oldja (2018): Survival Analysis to Explore Customer Churn in Python – https://towardsdatascience.com/survival-analysis-in-python-a-model-for-customer-churn-e737c5242822

Treselle Engineering (2018): Customer Churn – Logistic Regression with R – http://www.treselle.com/blog/customer-churn-logistic-regression-with-r/

S. Li (2017): Predict Customer Churn with R – https://towardsdatascience.com/predict-customer-churn-with-r-9e62357d47b4