Analyze and Prevent Retail Customer Churn by Creating a Predictive Model for One of the Largest Retail Banks by Assets in SEA | Realcode4you

realcode4you
Jun 2, 2024

Business Case 1 - Supervised Learning

Context

You are a data analyst working at one of the largest retail banks by assets in Southeast Asia (SEA). It is also the largest payment bank in terms of transaction value.

Challenges faced

Losing existing market share to competitors

Declining year-on-year portfolio balance resulting in low profits across certain segments of retail customers

Objective

You are tasked with analyzing and preventing retail customer churn by creating a predictive model that identifies customers with a higher propensity to churn.

Assessment Objectives

Perform data preprocessing on the dataset provided
Perform Exploratory Data Analysis (EDA) on the preprocessed dataset
Implement feature selection using suitable statistical techniques
Train, validate and evaluate Supervised Learning models
Implement an optimal Supervised Learning model to address specific business needs

Dataset and Data Dictionary

File Name | Description and Comments
bank_churn.csv | Customers' personal and bank products information

The data dictionary for the dataset can be found in the 'Data Dictionary – Bank Churn.xlsx' file in the Data folder

Recommended Steps for Model Development

Data Preprocessing

Combine different datasets
Missing value treatment
Outlier treatment
Encoding categorical variables
Balancing data based on target variable (optional)
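
The preprocessing steps above can be sketched roughly as follows. This is only an illustration on a tiny synthetic table: the column names (age, balance, gender, churn) are assumptions and are not taken from the data dictionary, and median/mode imputation with IQR capping is just one common choice among several.

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for bank_churn.csv; all column names are hypothetical.
df = pd.DataFrame({
    "age": [25, 40, np.nan, 58, 33, 47],
    "balance": [1200.0, 560.0, 89000.0, 430.0, np.nan, 770.0],
    "gender": ["F", "M", "M", None, "F", "M"],
    "churn": [0, 0, 1, 0, 1, 0],
})

num_cols = ["age", "balance"]
cat_cols = ["gender"]

# Missing value treatment: median for numeric columns, mode for categorical ones.
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Outlier treatment: cap values lying more than 1.5 * IQR beyond the quartiles.
for col in num_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Encoding: one-hot encode categoricals, dropping one level per variable.
encoded = pd.get_dummies(df, columns=cat_cols, drop_first=True, dtype=int)
print(encoded)
```

On the real dataset, the imputation and capping rules should be chosen per column after inspecting the data dictionary rather than applied uniformly.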

EDA & Feature Engineering

Exploratory Analysis
Bi-variates
Weight of evidence
Feature Engineering and Selection
Correlation Matrix
VIF
p-values
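
As a rough illustration of two of the techniques listed above, the sketch below computes weight of evidence (WoE) with information value (IV) for one feature, and variance inflation factors (VIF) for a set of numeric features. The data is synthetic and the column names are hypothetical. WoE is defined here as ln(share of non-churners / share of churners) per bin; some references use the opposite sign. VIF is taken from the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) for each feature; statsmodels also offers `variance_inflation_factor` for the same purpose.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic data; column names are hypothetical, not from the data dictionary.
tenure = rng.integers(0, 10, size=n)
balance = rng.normal(50_000, 15_000, size=n)
salary = 0.8 * balance + rng.normal(0, 5_000, size=n)  # deliberately collinear
churn_prob = 1 / (1 + np.exp(0.4 * tenure - 1))        # longer tenure, less churn
churn = (rng.random(n) < churn_prob).astype(int)
df = pd.DataFrame({"tenure": tenure, "balance": balance,
                   "salary": salary, "churn": churn})


def woe_table(frame, feature, target, bins=4):
    """WoE and IV contribution per quantile bin of a numeric feature."""
    binned = pd.qcut(frame[feature], q=bins, duplicates="drop")
    counts = pd.crosstab(binned, frame[target]) + 0.5  # smoothing avoids log(0)
    out = pd.DataFrame({
        "non_event_share": counts[0] / counts[0].sum(),
        "event_share": counts[1] / counts[1].sum(),
    })
    out["woe"] = np.log(out["non_event_share"] / out["event_share"])
    out["iv"] = (out["non_event_share"] - out["event_share"]) * out["woe"]
    return out


def vif(frame):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(frame.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns)


table = woe_table(df, "tenure", "churn")
vifs = vif(df[["tenure", "balance", "salary"]])
print(table)
print("Information value:", table["iv"].sum())
print(vifs)
```

A feature with a high VIF (balance and salary in this synthetic setup) is a candidate for removal or combination before fitting a logistic regression.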

Model Creation & Validation

Train-test split
Logistic Regression modeling using sklearn and statsmodels
Cross validation folds
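
A minimal sketch of these steps with sklearn is shown below. It uses a synthetic imbalanced dataset from `make_classification` in place of the bank data, and the 80/20 split, five folds and ROC AUC scoring are assumptions rather than requirements of the assignment. The statsmodels variant is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed churn data (about 20% positives).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.8, 0.2], random_state=42)

# Train-test split, stratified so both parts keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validation folds on the training data only; the test set stays untouched.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
print("ROC AUC per fold:", cv_auc.round(3))

# Final fit on the full training set.
model.fit(X_train, y_train)
```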

Model Testing & Evaluation

Model testing
Evaluate model
Balance data (optional)
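
Testing and evaluation might look like the sketch below, again on synthetic data. It compares an unweighted logistic regression with one using `class_weight="balanced"`, which is one possible way to handle the optional balancing step (resampling is another). The metric choices are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the churn dataset.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

results = {}
for name, class_weight in [("unweighted", None), ("balanced", "balanced")]:
    model = LogisticRegression(max_iter=1000, class_weight=class_weight)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)               # hard labels at the 0.5 threshold
    y_score = model.predict_proba(X_test)[:, 1]  # churn probability for ROC AUC

    results[name] = {
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_score),
        "report": classification_report(y_test, y_pred, zero_division=0),
    }

for name, res in results.items():
    print(name)
    print(res["confusion_matrix"])
    print("ROC AUC:", round(res["roc_auc"], 3))
    print(res["report"])
```

For churn, recall on the positive class usually matters more than overall accuracy, since a missed churner tends to cost more than an unnecessary retention offer.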

Expected output

All your work for Business Case 1 should be done in the ML_Proj_BC1.ipynb file

You should insert additional comments where necessary to explain the purposes of your code
Feel free to insert new blocks of code to achieve the objectives where necessary
Ensure that the entire Jupyter Notebook can be executed without any error
Rename the ML_Proj_BC1.ipynb file to a filename that includes your full name: e.g., ML_Proj_BC1_jack_tan.ipynb

Business Case 2 – Unsupervised Learning

Context

You are a data scientist working at a retail bank based in the Middle East that has run traditional mass marketing campaigns for years. The bank is now keen to explore the benefits of running tailored marketing campaigns for its customer base.

Challenges faced

Increasingly competitive landscape where other banks are running personalized ad campaigns using differentiated products and services

Profitability pressure from reduced utilization by existing customers.

Objective

In this discovery phase, the objective is to understand the various segments that exist in the bank's customer base, based on the customers' demographics and utilization patterns.

Datasets Available

File Name | Description and Comments
Bank_customers.csv | Sample data for account status of 1000 customers at a bank

The data dictionary for the dataset can be found in the 'Data Dictionary – Bank Customers.xlsx' file in the Data folder

Assessment Objectives

Perform the standard EDA process used in machine learning
Perform customer segmentation using a suitable clustering technique
Use appropriate metrics to measure the performance of the clustering model
Evaluate the clustering model to determine model performance based on context and dataset
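
One possible approach to the segmentation and its measurement is sketched below: K-Means on standardized features, with the silhouette score used to compare candidate numbers of clusters. K-Means and the silhouette score are assumptions here, since the brief leaves the technique and metric open, and the data is synthetic (`make_blobs`) rather than Bank_customers.csv.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 1000 customers described by 5 numeric features.
X, _ = make_blobs(n_samples=1000, centers=4, n_features=5,
                  cluster_std=1.0, random_state=7)

# K-Means is distance based, so features are put on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# Fit one model per candidate k and record its silhouette score (range -1 to 1).
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=7).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("Highest silhouette at k =", best_k)
```

The k with the highest silhouette is only a starting point; the resulting segments still need to be checked against the business context, for example by profiling each cluster's demographics and utilization.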

Expected Output

All your work for Business Case 2 should be done in the ML_Proj_BC2.ipynb file
You should insert additional comments where necessary to explain the purposes of your code
Feel free to insert new blocks of code to achieve the objectives where necessary
Ensure that the entire Jupyter Notebook can be executed without any error
Rename the ML_Proj_BC2.ipynb file to a filename that includes your full name, e.g., ML_Proj_BC2_jack_tan.ipynb

Business Case 3 – LCNC Machine Learning

Context

Referring back to Business Case 1, the Chief Data Officer (CDO) of the retail bank is dissatisfied with the predictive performance of the classification model in identifying customer churn.

Knowing that the Data Analytics team has recently adopted the Orange Data Mining platform for LCNC (low-code/no-code) Machine Learning, the CDO challenged the Data Analytics team to build better machine learning models using the platform.

Objective

You are tasked to create better predictive models to identify customers with a higher propensity to churn using the Orange Data Mining platform.

Dataset and Data Dictionary

File Name | Description and Comments
bank_churn_preprocessed.csv | Customers' personal and bank products information (cleaned and preprocessed)

The data dictionary for the dataset can be found in the 'Data Dictionary – Bank Churn Preprocessed.xlsx' file in the Data folder.

Recommended Steps for LCNC ML Model Development

Data Preparation

Load data
Update target column
Train-test split

Model Creation & Finetuning

Train and finetune:

Logistic Regression model (balanced class distribution)
Random Forest model (balanced class distribution)
Gradient Boosting model

Model Testing & Scoring

Cross validation folds (stratified)
Test on training data
Test on testing data

Model Comparison & Evaluation

Confusion Matrix
ROC Analysis
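
The workflow itself is meant to be built in Orange. As an optional scripted cross-check, the same comparison can be approximated in sklearn, as in the sketch below. This is not the Orange workflow: the data is synthetic, and the hyperparameters are library defaults rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for bank_churn_preprocessed.csv.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "Random Forest": RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                            random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
comparison = {}
for name, model in models.items():
    # Stratified cross-validation on the training data.
    cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc").mean()
    model.fit(X_train, y_train)
    comparison[name] = {
        "cv_auc": cv_auc,
        # Scoring on training data shows how much a model overfits.
        "train_auc": roc_auc_score(y_train, model.predict_proba(X_train)[:, 1]),
        "test_auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
        "confusion_matrix": confusion_matrix(y_test, model.predict(X_test)),
    }

for name, res in comparison.items():
    print(f"{name}: CV AUC={res['cv_auc']:.3f}, "
          f"train AUC={res['train_auc']:.3f}, test AUC={res['test_auc']:.3f}")
    print(res["confusion_matrix"])
```

A large gap between the training and testing scores, which tree ensembles often show, signals overfitting and is worth noting when comparing the models.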
