Credit Product Revobank

Banking analytics project that leveraged machine learning to segment 12,558 customer records into distinct behavioral personas.

Client:

Revobank

Date:

June 27, 2025

Type:

Customer Segmentation

Role:

Data Analyst

About

This project was conducted as part of the RevoU Python module and aimed to sharpen my technical expertise in data cleansing, exploratory data analysis, and segmentation using clustering methods.

Key Deliverables:

The project's primary focus was customer segmentation using RFM and K-means methodologies. My role involved understanding the business problem, extracting insights, categorizing customers by their distinct characteristics, tracking behavioral tendencies, and tailoring recommendations for each segment based on transaction trends.

Context

RevoBank is a European bank that offers credit card products to its customers. The bank aims to increase credit card usage among existing clients by analyzing customer behavior and sales performance.

In this project I acted as a member of the Performance Management (PM) team, working with a dataset from the MIS team that contains 36 months of sales data. The goal was to create user personas and insights based on client activity to support credit card usage strategies.

Problem

  • Transaction Behavior Analysis: RevoBank wanted to understand how customers transacted during promotional vs. non-promotional periods.

  • Promotion Effectiveness: There was a need to compare credit card usage across different customer groups and validate the effectiveness of past promotions.

  • Targeted Strategy Development: The bank aimed to design cost-efficient, segmented promotional strategies that resonated with core user clusters.

Objective

  • To analyze RevoBank credit card sales performance trends over the past three years and identify key growth patterns.

  • To develop user personas based on existing client data to understand customer behavior and segment characteristics.

  • To identify and prioritize business opportunities that will increase RevoBank credit card product usage among current customers.

Processes & Considerations

Step 1

Data Cleaning

  • Out of 12,558 rows, 72 identical duplicates were found and deleted, leaving 12,486 rows (about 99.4% of the original data) for analysis.

  • Identified anomalous values that did not match the data dictionary and handled them as follows:

Before | After | Process
4 Account Activity Levels | 3 Account Activity Levels | Rows deleted, because no information was available to impute the value or reassign it to another level
6 Customer Value Levels | 5 Customer Value Levels | Reassigned to an existing value level, based on the lower limit of that level

  • Found 735 missing values in avg_sales_L36M and handled them with mean imputation (at 5.89%, the share of missing values did not meet the low threshold for deletion).

In order to ensure data quality and reliability for the RevoBank credit card analysis, comprehensive data cleaning and preparation processes were implemented on the original dataset of 12,558 rows.
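
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the project's actual notebook code: the toy DataFrame `raw` and its values are assumptions, and only the column names come from the dataset described here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the RevoBank extract (values are made up for illustration)
raw = pd.DataFrame({
    "account_id": [1, 2, 2, 3, 4],
    "avg_sales_L36M": [100.0, 250.0, 250.0, np.nan, 50.0],
    "cnt_sales_L36M": [5, 12, 12, 3, 1],
})

# 1. Drop rows that are identical across every column
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Check the share of missing values before deciding how to handle them
missing_share = clean["avg_sales_L36M"].isna().mean()
print(f"Rows kept: {len(clean)} of {len(raw)}; missing avg_sales_L36M: {missing_share:.2%}")

# 3. Mean imputation for the remaining gaps
clean["avg_sales_L36M"] = clean["avg_sales_L36M"].fillna(clean["avg_sales_L36M"].mean())
```

Checking the missing share first makes the choice between deleting and imputing explicit rather than implicit.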

Step 2

Exploratory Data Analysis - Customer Segmentation

  • Set up the analysis by defining the Key Credit Card Usage Metrics and the Potential Profit Calculation from these features:

Key Credit Card Usage Metrics | Potential Profit Calculation
avg_sales_L36M | avg_sales_L36M
cnt_sales_L36M | cnt_sales_L36M
month_since_last_sales | -
count_direct_promo_L12M | -

  • Examined the distribution of credit card usage over the past 36 months by visualizing it directly:

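import matplotlib.pyplot as plt
import seaborn as sns

# df is the cleaned customer DataFrame from Step 1
plt.figure(figsize=(12, 8))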
sns.histplot(df['cnt_sales_L36M'], kde=True)
plt.title('Distribution of Credit Card Product Usage Frequency (Past 36 Months)')
plt.xlabel('Number of Transactions')
plt.ylabel('Count of Customers')
plt.axvline(df['cnt_sales_L36M'].mean(), color='red', linestyle='--', label=f'Mean: {df["cnt_sales_L36M"].mean():.2f}')
plt.legend()
plt.show()
[Figure: Distribution of Credit Card Product Usage Frequency (Past 36 Months)]
  • Because the data contain many outliers, I chose RobustScaler to scale the selected credit card usage features. RobustScaler uses the median and quantiles instead of the mean and standard deviation, which makes it more robust to outliers.

import pandas as pd
from sklearn.preprocessing import RobustScaler

# Select relevant features that indicate credit card product usage patterns
features = [
    'avg_sales_L36M',        # Average sales amount (monetization of products)
    'cnt_sales_L36M',        # Frequency of product usage
    'month_since_last_sales', # Recency of product usage
    'count_direct_promo_L12M' # Response to promotions
]

# Check if additional credit-related features exist in the dataset
credit_features = [col for col in df.columns if 'credit' in col.lower()]
if credit_features:
    print(f"Additional credit-related features found: {credit_features}")
    features.extend(credit_features)

X = df[features].copy()

# Handle missing values if any
X = X.fillna(X.median())  # Using median instead of mean due to outliers

# Apply RobustScaler to handle outliers
# RobustScaler uses medians and quantiles instead of means and standard deviations
# This makes it more robust to outliers in the data
scaler = RobustScaler()
X_scaled = scaler.fit_transform(X)

# Convert to DataFrame for better understanding
X_scaled_df = pd.DataFrame(X_scaled, columns=features)
print("\nRobustScaler-Transformed Data Preview:")
display(X_scaled_df.head())

# Compare original vs. scaled distributions
plt.figure(figsize=(15, 10))
for i, feature in enumerate(features):
    plt.subplot(2, len(features), i+1)
    sns.histplot(X[feature], kde=True)
    plt.title(f'Original: {feature}')

    plt.subplot(2, len(features), i+1+len(features))
    sns.histplot(X_scaled_df[feature], kde=True)
    plt.title(f'RobustScaled: {feature}')

plt.tight_layout()
plt.show()
[Figures: original vs. RobustScaler-transformed distributions for each feature]
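
To make the scaling rule concrete, here is a small sketch showing that RobustScaler with its default settings is equivalent to subtracting the median and dividing by the interquartile range. The five input values are made up; only the scaler itself is taken from the analysis above.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Made-up values with one extreme outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Default RobustScaler: center on the median, scale by the 25th-75th percentile range
scaled = RobustScaler().fit_transform(x)

# Manual equivalent: (x - median) / IQR
median = np.median(x)
q1, q3 = np.percentile(x, [25, 75])
manual = (x - median) / (q3 - q1)

print(np.allclose(scaled, manual))
```

The outlier is still far from the rest after scaling, but it no longer distorts the center and spread used to scale the other points.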
  • Implemented K-Means clustering for the credit card usage analysis. I chose K-Means over RFM because this is a sales dataset with an irregular spread of values, because I wanted to segment on several features at once, and because K-Means can create as many segments as needed.

  • Confirmed the number of clusters (K) using the Elbow Method and the Silhouette Method. Reasoning:

    Why the Elbow Method: To identify the optimal number of clusters by evaluating the Within-Cluster Sum of Squares (WCSS), a measure of how tightly grouped the data points are within each cluster. The code below plots a closely related quantity, the average distance from each point to its nearest centroid.

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.cluster import KMeans

    # Elbow Method with RobustScaler-transformed data
    distortions = []
    K_range = range(1, 10)
    for k in K_range:
        kmeanModel = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeanModel.fit(X_scaled)
        distortions.append(sum(np.min(cdist(X_scaled, kmeanModel.cluster_centers_, 'euclidean'), axis=1)) / X_scaled.shape[0])
    
    # Plot the Elbow Curve
    plt.figure(figsize=(10, 6))
    plt.plot(K_range, distortions, 'bx-')
    plt.xlabel('Number of clusters (k)')
    plt.ylabel('Distortion (Average within-cluster distance)')
    plt.title('Elbow Method For Optimal k - RevoBank Credit Card Usage Segments')
    plt.grid(True)
    plt.show()


    [Figure: Elbow Method For Optimal k - RevoBank Credit Card Usage Segments]
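
As a side note, WCSS itself is available from a fitted scikit-learn KMeans model as `inertia_`. The sketch below uses synthetic blobs (`X_demo` is an assumption standing in for the scaled RevoBank features) and checks `inertia_` against a manual sum of squared distances.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for the scaled feature matrix
X_demo, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Collect WCSS (inertia_) for each candidate k
wcss = []
for k in range(1, 10):
    model = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X_demo)
    wcss.append(model.inertia_)

# Manual WCSS for the last fitted model: squared distance to the assigned centroid
assigned_centers = model.cluster_centers_[model.labels_]
manual_wcss = ((X_demo - assigned_centers) ** 2).sum()
print(np.isclose(manual_wcss, model.inertia_, rtol=1e-6))
```

Plotting `wcss` against k gives an elbow curve of the same general shape as the distortion curve, on a squared-distance scale.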

    Why the Silhouette Method: To evaluate how well separated and well defined the clusters are, that is, not just compact but also distinct from one another.

    from sklearn.metrics import silhouette_score

    # Silhouette Analysis for additional validation
    silhouette_scores = []
    for k in range(2, 10):  # Silhouette score requires at least 2 clusters
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        cluster_labels = kmeans.fit_predict(X_scaled)
        silhouette_avg = silhouette_score(X_scaled, cluster_labels)
        silhouette_scores.append(silhouette_avg)
        print(f"For n_clusters = {k}, the silhouette score is {silhouette_avg}")
    
    # Plot Silhouette Scores
    plt.figure(figsize=(10, 6))
    plt.plot(range(2, 10), silhouette_scores, 'bo-')
    plt.xlabel('Number of clusters (k)')
    plt.ylabel('Silhouette Score')
    plt.title('Silhouette Analysis For RevoBank Customer Segments')
    plt.grid(True)
    plt.show()


    [Figure: Silhouette Analysis For RevoBank Customer Segments]
  • From the Elbow and Silhouette Methods, K=4 (four clusters) is the best choice for the segmentation, with strengths in several respects:

    1. Elbow Point: It's precisely where the marginal benefit of additional clusters drops dramatically

    2. Balanced Trade-off: While not the absolute highest silhouette score, it provides reasonable cluster quality without over-segmentation

    3. Business Practicality: Four segments are manageable for marketing strategies and customer relationship management

    4. Natural Structure: The data appears to have inherent groupings that align well with four clusters

  • With K=4, we can identify the valuable credit card user segments by aggregating the key metrics for each cluster:
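
The aggregation that follows groups by a cluster column and averages potential_profit, neither of which is created in the code shown so far. A minimal sketch of how they might be produced is given here on synthetic data. The DataFrame `demo` and its values are made up, and the profit formula (average sale times number of sales times the 2.4% margin quoted in the charts) is an assumption, not the project's confirmed calculation.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-in for the cleaned dataset (values are made up)
demo = pd.DataFrame({
    "avg_sales_L36M": rng.gamma(2.0, 150.0, n),
    "cnt_sales_L36M": rng.poisson(8, n),
    "month_since_last_sales": rng.integers(0, 36, n),
    "count_direct_promo_L12M": rng.integers(0, 6, n),
})
features = list(demo.columns)

optimal_k = 4   # chosen from the elbow and silhouette analysis
MARGIN = 0.024  # 2.4% margin quoted in the charts

# Fit the final model on the scaled features and store one label per customer
X_demo_scaled = RobustScaler().fit_transform(demo[features])
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
demo["cluster"] = kmeans.fit_predict(X_demo_scaled)

# Assumption: potential profit = average sale x number of sales x margin
demo["potential_profit"] = demo["avg_sales_L36M"] * demo["cnt_sales_L36M"] * MARGIN

print(demo["cluster"].value_counts().sort_index())
```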

# Group by cluster and calculate key credit card usage metrics
cluster_analysis = df.groupby('cluster').agg({
    'avg_sales_L36M': 'mean',
    'cnt_sales_L36M': 'mean',
    'account_id': 'count',
    'month_since_last_sales': 'mean',
    'count_direct_promo_L12M': 'mean',
    'potential_profit': 'mean'
}).rename(columns={'account_id': 'customer_count'})

cluster_analysis
[Table: cluster_analysis output, key metrics by cluster]
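
The charts in Step 3 also read total_profit and active_proportion from cluster_analysis, but the aggregation above does not compute them. One possible way to derive them is sketched below on made-up rows. The 12-month activity cut-off (`ACTIVE_WINDOW_MONTHS`) is a hypothetical choice for illustration, and the project may have defined "active" differently, for example through account_activity_level.

```python
import pandas as pd

# Made-up customer-level rows with cluster labels already assigned
demo = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 1],
    "potential_profit": [10.0, 30.0, 5.0, 0.0, 15.0],
    "month_since_last_sales": [1, 20, 3, 30, 6],
})

ACTIVE_WINDOW_MONTHS = 12  # hypothetical cut-off, not taken from the project

# Flag a customer as active if the last sale falls inside the window
demo["is_active"] = demo["month_since_last_sales"] <= ACTIVE_WINDOW_MONTHS

# Per-cluster totals: summed profit and share of active customers
segment_totals = demo.groupby("cluster").agg(
    total_profit=("potential_profit", "sum"),
    active_proportion=("is_active", "mean"),
)
print(segment_totals)
```

A result shaped like `segment_totals` could then be joined onto cluster_analysis by its cluster index before plotting.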

Step 3

Data Visualization - By Key Metrics

  • Based on the key credit card usage metrics, there are several points to analyze for each cluster:

    1. Average Sales Per Client

    (Understanding revenue potential and customer value)

    # Visualize key metrics by cluster
    plt.figure(figsize=(16, 12))
    
    # Average sales per client
    plt.subplot(2, 2, 1)
    sns.barplot(x=cluster_analysis.index, y=cluster_analysis['avg_sales_L36M'])
    plt.title('Average Credit Card Product Sales by Segment')
    plt.xlabel('Customer Segment')
    plt.ylabel('Average Sales (€)')


    [Figure: Average Credit Card Product Sales by Segment]

    2. Average Transaction Frequency

    (Understanding customer engagement patterns and loyalty)

    # Average transaction frequency
    plt.subplot(2, 2, 2)
    sns.barplot(x=cluster_analysis.index, y=cluster_analysis['cnt_sales_L36M'])
    plt.title('Credit Card Product Usage Frequency by Segment')
    plt.xlabel('Customer Segment')
    plt.ylabel('Avg Number of Transactions')


    [Figure: Credit Card Product Usage Frequency by Segment]

    3. Total Profit by Cluster

    (Understanding actual business value and ROI)

    # Total profit by cluster (using 2.4% margin)
    plt.subplot(2, 2, 3)
    sns.barplot(x=cluster_analysis.index, y=cluster_analysis['total_profit'])
    plt.title('Total Profit by Customer Segment (2.4% Margin)')
    plt.xlabel('Customer Segment')
    plt.ylabel('Total Profit (€)')


    [Figure: Total Profit by Customer Segment (2.4% Margin)]

    4. Active and Inactive Proportion

    (Understanding segment health and engagement levels)

    # Active vs inactive proportion
    plt.subplot(2, 2, 4)
    sns.barplot(x=cluster_analysis.index, y=cluster_analysis['active_proportion'])
    plt.title('Proportion of Active Credit Card Product Users')
    plt.xlabel('Customer Segment')
    plt.ylabel('Active Proportion')
    
    plt.tight_layout()
    plt.show()


    [Figure: Proportion of Active Credit Card Product Users]

Step 4

Identify Business Opportunities

  • With all of this output, we can identify the user persona for each segment:

  • Segment 0:

    [Screenshot: Segment 0 persona]
  • Segment 1:

    [Screenshot: Segment 1 persona]
  • Segment 2:

    [Screenshot: Segment 2 persona]
  • Segment 3:

    [Screenshot: Segment 3 persona]

# Add demographic and behavior analysis
extended_cluster_analysis = df.groupby('cluster').agg({
    'avg_sales_L36M': 'mean',
    'cnt_sales_L36M': 'mean',
    'month_since_last_sales': 'mean',
    'flag_female': 'mean',  # Proportion of females
    'MOB': 'mean',  # Months on book
    'potential_profit': 'mean',
    'customer_value_level': lambda x: x.mode()[0] if not x.mode().empty else 'Unknown',
    'account_activity_level': lambda x: x.mode()[0] if not x.mode().empty else 'Unknown',
    'count_direct_promo_L12M': 'mean',
    'birth_date': lambda x: pd.to_datetime(x, errors='coerce').mean() if pd.to_datetime(x, errors='coerce').notna().any() else None
})

# If birth_date is datetime, calculate average age
if extended_cluster_analysis['birth_date'].dtype == 'datetime64[ns]':
    current_date = pd.Timestamp.now()
    extended_cluster_analysis['avg_age'] = (current_date - extended_cluster_analysis['birth_date']).dt.days / 365
    extended_cluster_analysis = extended_cluster_analysis.drop('birth_date', axis=1)

print("\nExtended Credit Card User Segment Profiles:")
display(extended_cluster_analysis)

# Identify most valuable segments for credit card product usage
sorted_clusters = cluster_analysis.sort_values('total_profit', ascending=False)
top_clusters = sorted_clusters.head(2).index.tolist()

print(f"\nTop 2 most valuable credit card user segments: {top_clusters}")

# Create credit card user personas
personas = {}
for cluster in range(optimal_k):
    data = extended_cluster_analysis.loc[cluster]

    # Determine persona characteristics relevant to credit card usage
    if data['avg_sales_L36M'] > extended_cluster_analysis['avg_sales_L36M'].mean():
        value_segment = "High Value"
    else:
        value_segment = "Low Value"

    if data['cnt_sales_L36M'] > extended_cluster_analysis['cnt_sales_L36M'].mean():
        usage_segment = "Active Users"
    else:
        usage_segment = "Inactive Users"

    if data['month_since_last_sales'] < extended_cluster_analysis['month_since_last_sales'].mean():
        recency_segment = "Recent Users"
    else:
        recency_segment = "Lapsed Users"

    # Create persona description focused on credit card product usage
    persona_name = f"Segment {cluster}: {value_segment}, {usage_segment}, {recency_segment}"

    # Additional details
    gender = "Predominantly Female" if data['flag_female'] > 0.5 else "Predominantly Male"

    details = f"""
    Credit Card Product Usage Profile:
    - Avg Product Sales: €{data['avg_sales_L36M']:.2f}
    - Usage Frequency: {data['cnt_sales_L36M']:.2f} transactions (past 36 months)
    - Months Since Last Usage: {data['month_since_last_sales']:.2f}
    - Avg Profit per Customer: €{data['potential_profit']:.2f} (at 2.4% margin)

    Customer Demographics:
    - Gender Distribution: {gender} ({100 * data['flag_female']:.1f}% Female)
    - Customer Tenure: {data['MOB']:.1f} months
    - Customer Value Level: {data['customer_value_level']}
    - Account Activity Level: {data['account_activity_level']}
    - Promotional Campaign Exposure: {data['count_direct_promo_L12M']:.2f} promotions (past 12 months)
    """

    if 'avg_age' in extended_cluster_analysis:
        details += f"- Average Age: {data['avg_age']:.1f} years\n"

    personas[persona_name] = details

# Print the personas
print("\nCredit Card User Personas:")
for persona, details in personas.items():
    print("\n" + "=" * 60)
    print(persona)
    print("-" * 60)
    print(details)

Insights and Recommendations

For Each Segment

User Segment 0

  1. Maintain standard credit product communication

  2. Include in general credit promotional campaigns

  3. Monitor for signals of increased credit usage potential

  4. Provide basic financial education on credit product benefits

  5. Implement cost-effective service model for this segment

User Segment 1

  1. Create "win-back" campaigns with special credit product incentives

  2. Conduct surveys to understand barriers to credit product usage

  3. Simplify credit application and usage processes

  4. Offer "welcome back" bonuses for first new credit transaction

  5. Develop streamlined credit products for inactive customers

User Segment 2

  1. Implement tiered rewards based on credit transaction value

  2. Provide financial education on premium credit products and their benefits

  3. Create value-added service bundles around core credit products

  4. Develop upgrade paths to higher-tier credit options

  5. Use targeted marketing to showcase premium credit product benefits

User Segment 3

  1. Create transaction-based incentives focused on credit products

  2. Implement "milestone" bonuses for reaching credit usage targets

  3. Develop mobile notifications for unused credit benefits

  4. Launch targeted campaigns highlighting the benefits of regular credit product usage

  5. Consider temporary rate improvements for increasing usage frequency

Copyright © Baskoroajii.2025. All rights reserved.
