SQL項目實戰(zhàn):銀行客戶分析
在本文中,將與大家分享一個SQL項目,即根據(jù)從數(shù)據(jù)集收集到的信息分析銀行客戶流失的可能性。這些洞察來自個人信息,如年齡、性別、收入和人口統(tǒng)計信息、銀行卡類型、產(chǎn)品、客戶信用評分以及客戶在銀行的服務時間長短等。對于銀行而言,了解如何留住客戶比尋找其他客戶更有利。
客戶流失是指客戶或顧客的流失。公司通常將其作為關(guān)鍵業(yè)務指標之一,因為恢復的長期客戶對公司的價值遠遠高于新招募的客戶。
客戶流失有兩種類型:自愿流失和非自愿流失。自愿流失是由于客戶決定轉(zhuǎn)向其他公司或服務提供商,而非自愿流失則是由于客戶搬遷到長期護理機構(gòu)、死亡或搬遷到較遠的地方等情況造成的。
在本項目中,本文將集中討論自愿流失,因為它可能是由于公司與客戶關(guān)系中公司可以控制的因素造成的,例如如何處理賬單互動或如何提供售后幫助。
【來源】:
https://en.wikipedia.org/wiki/Customer_attrition
數(shù)據(jù)集解釋
【網(wǎng)址】:
https://www.kaggle.com/datasets/radheshyamkollipara/bank-customer-churn
圖片
本文使用的是customer_churn_records
表,該表包含多列,customerid
是表的主鍵。
- RowNumber:對應記錄(行)編號
- CustomerId:客戶的ID編號
- Surname:客戶的姓氏
- CreditScore:客戶信用行為預測值
- Geography:客戶所在地
- Gender:客戶的性別信息
- Age:客戶的年齡信息
- Tenure:客戶在銀行的使用年限
- Balance:客戶賬戶中的余額信息
- NumOfProducts:客戶購買的產(chǎn)品數(shù)量
- HasCrCard:客戶是否擁有信用卡
- IsActiveMember:客戶是否處于活躍狀態(tài)
- EstimatedSalary(估計工資):客戶的估計工資金額
- Exited:客戶是否離開銀行
- Complain:客戶是否有投訴
- Satisfaction Score:客戶對銀行的滿意度評分
- Card Type:客戶持有的銀行卡類型
- Points Earned:客戶使用信用卡獲得的積分
# 顯示表中的列 = customer_churn_records
q='''
SELECT * FROM customer_churn_records
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
查詢客戶流失率
導入軟件包
import psycopg2 # PostgreSQL數(shù)據(jù)庫適配器
import pandas as pd # 用于分析數(shù)據(jù)
from sqlalchemy import create_engine # 促進Python程序與數(shù)據(jù)庫之間的通信
首先,本文根據(jù)已退出的列計算有多少客戶流失。
# 統(tǒng)計是否流失/退出的客戶總數(shù)
q='''
WITH temp_churn AS(
SELECT exited,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT(exited) as Total
FROM temp_churn
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
在10000名客戶中,有近20%從銀行退出或流失。盡管這個數(shù)字并不算很大,但如果現(xiàn)在還不能解決這個問題,它可能會增長得更多。
現(xiàn)在,本文將從活躍客戶、性別、人口統(tǒng)計、年齡、臨時工齡、信用分數(shù)、產(chǎn)品數(shù)量、滿意度分數(shù)、投訴、是否有信用卡、卡類型、已獲積分、預估薪資和余額等多個方面來檢查客戶流失狀況的類型。
# 統(tǒng)計有多少活躍客戶流失
q='''
WITH temp_isactivemember AS(
SELECT exited,
CASE
WHEN isactivemember = 1 THEN 'Active'
ELSE 'Not Active'
END AS isactivemember
from customer_churn_records
)
SELECT isactivemember,
COUNT (CASE WHEN exited = 1 THEN 1 END) AS Churn,
COUNT (CASE WHEN exited = 0 THEN 1 END) AS Not_Churn
FROM temp_isactivemember
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
# 根據(jù)性別計算是否流失/退出的客戶總數(shù)
q='''
SELECT gender,
COUNT(gender) as Total,
COUNT(case when exited = 1 then 1 end) as Churn,
COUNT(case when exited = 0 then 1 end) as Not_churn
FROM customer_churn_records
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)人口統(tǒng)計數(shù)據(jù)計算流失客戶的數(shù)量
q='''
SELECT geography,
COUNT (CASE WHEN exited = 1 THEN 1 END) AS Churn,
COUNT (CASE WHEN exited = 0 THEN 1 END) AS Not_Churn
FROM customer_churn_records
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
#根據(jù)年齡組計算流失客戶的數(shù)量
q='''
SELECT
CASE
WHEN age <= 20 THEN 'Group <= 20'
WHEN age >= 21 AND age <= 40 THEN 'Group 21-40'
WHEN age >= 41 AND age <= 60 THEN 'Group 41-60'
ELSE 'Group > 60'
END AS age_category,
COUNT(CASE WHEN exited = 1 then 1 end) as Churn,
COUNT(CASE WHEN exited = 0 then 1 end) as Not_Churn
FROM customer_churn_records
GROUP BY 1
ORDER BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
# 根據(jù)他們成為客戶的時間計算是否流失/退出的客戶總數(shù)
q='''
WITH temp_tenure AS(
SELECT tenure,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(tenure) as Average_tenure
FROM temp_tenure
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)信用評分計算有多少客戶流失
#(平均值、最高值和最低值)
q='''
WITH temp_creditscore AS(
SELECT creditscore,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(creditscore) as avg_creditscore,
MAX(creditscore) as max_creditscore,
MIN(creditscore) as min_creditscore
FROM temp_creditscore
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶使用銀行產(chǎn)品的數(shù)量,計算有多少客戶流失
q='''
WITH temp_bankprod AS(
SELECT numofproducts,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(numofproducts) as avg_numofproducts
FROM temp_bankprod
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶 對銀行的滿意度得分的平均得分,計算有多少客戶流失
q='''
WITH temp_satisfaction AS(
SELECT satisfaction_score,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(satisfaction_score) as satisfaction_level
FROM temp_satisfaction
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶的投訴量,統(tǒng)計有多少客戶流失
q='''
WITH temp_complain AS(
SELECT complain,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT(complain) as complain
FROM temp_complain
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶是否擁有信用卡計算有多少客戶流失
q='''
WITH temp_hascrcard AS(
SELECT hascrcard,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT (CASE WHEN hascrcard = 1 THEN 1 END) AS has_creditcard,
COUNT (CASE WHEN hascrcard = 0 THEN 1 END) AS no_creditcard
FROM temp_hascrcard
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
# 根據(jù)卡的類型計算流失客戶的數(shù)量
q='''
WITH temp_card AS(
SELECT card_type,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT(CASE WHEN card_type = 'PLATINUM' THEN 1 END) as PLATINUM,
COUNT(CASE WHEN card_type = 'DIAMOND' THEN 1 END) as DIAMOND,
COUNT(CASE WHEN card_type = 'GOLD' THEN 1 END) as GOLD,
COUNT(CASE WHEN card_type = 'SILVER' THEN 1 END) as SILVER
FROM temp_card
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶獲得的積分計算有多少客戶流失
q='''
WITH temp_point AS(
SELECT point_earned,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(point_earned) as avg_point,
MAX(point_earned) as max_point,
MIN(point_earned) as min_point
FROM temp_point
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶的預估工資計算有多少客戶流失
q='''
WITH temp_salary AS(
SELECT estimatedsalary,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(estimatedsalary) as avg_salary,
MAX(estimatedsalary) as max_salary,
MIN(estimatedsalary) as min_salary
FROM temp_salary
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶的銀行存款余額計算有多少客戶流失
#(平均、最高、最低)
q='''
WITH temp_balance AS(
SELECT balance,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(balance) as avg_balance,
MAX(balance) as max_balance,
MIN(balance) as min_balance
FROM temp_balance
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
結(jié)論
根據(jù)上述問題,有一些類別可以幫助確定哪些方面會真正影響客戶流失。不管客戶在銀行停留了多長時間,他們?nèi)匀挥锌赡芰魇?,或者說,客戶的銀行賬戶上有相當數(shù)量的存款,他們?nèi)匀挥锌赡芰魇А?/p>
41至60歲年齡段的客戶比其他年齡段的客戶更容易流失。為了解決這個問題,銀行可以集中精力創(chuàng)造或提升產(chǎn)品和服務,以幫助吸引和維護特定年齡段的客戶,比如為年齡較大的客戶提供更流暢的服務和最短的排隊時間。
持有信用卡的客戶往往不會流失,而是會繼續(xù)留在銀行。銀行最好通過各種促銷活動說服更多的客戶申請信用卡,這取決于客戶細分,可根據(jù)客戶的卡種(鉆石卡、白金卡、金卡、銀卡)、性別、年齡、支出和人口分布進行細分。
留存客戶和流失客戶的滿意度得分有點令人擔憂 [ 3.017960 / 2.997547 ]。銀行需要進行評估,以保持流失客戶和留存客戶之間的滿意度得分差距,并保持活躍客戶,因為活躍客戶流失的可能性較低。






