Exploration of Digital Banking Transactions: A SQL Analysis

mwang-cmn - Aug 17 - - Dev Community

Introduction

Data Bank, a cutting-edge digital bank, is pioneering the integration of banking and distributed data storage. By linking customers' cloud storage limits to their account balances, Data Bank offers a unique blend of financial services and secure data storage. As the digital banking landscape evolves, understanding customer behavior and forecasting data needs are crucial for strategic growth and efficient resource allocation.

Problem Statement

This project aims to explore customer nodes and transaction patterns within the Data Bank system to identify key metrics that will help the management team optimize customer allocation, improve transaction tracking, and accurately forecast future data storage requirements. View the data challenge here

Data Structure and SQL Environment

The dataset used in this analysis was intergrated into a SQL Server database. The database structure is designed with clarity featuring three key tables:

1. Regions:
2. Customer Nodes
3. Customer Transactions
The Entity Relationship Diagram for this dataset is as follows:

Image description

Data Cleaning

The dataset has been thouroughly examined and all three tables are clean and suited for analysis. There are no null values, duplicates or incorrect data types.

Question and Answers

  1. How many unique nodes are there on the Data Bank system?
SELECT COUNT(DISTINCT node_id) as unique_nodes
  FROM dbo.customer_nodes;
Enter fullscreen mode Exit fullscreen mode

Image description

  1. What is the number of nodes per region?
SELECT r.region_name, COUNT(c.node_id) as nodes
  FROM customer_nodes c
 INNER JOIN regions r
    ON c.region_id = r.region_id
 GROUP BY r.region_name;

Enter fullscreen mode Exit fullscreen mode

Image description

How many customers are allocated to each region?
Counts the unique number of customers from each Region, in descending order. Australia has the largest number of customers

SELECT r.region_name, COUNT(DISTINCT c.customer_id) as customer_count
  FROM customer_nodes c
 INNER JOIN regions r
    ON c.region_id = r.region_id
 GROUP BY r.region_name
 ORDER BY customer_count DESC;

Enter fullscreen mode Exit fullscreen mode

Image description

How many days on average are customers reallocated to a different node?

SELECT AVG(DATEDIFF(DAY, start_date, end_date)) AS avg_days
FROM customer_nodes
WHERE end_date != '9999-12-31';

Enter fullscreen mode Exit fullscreen mode

Image description

What is the median, 80th and 95th percentile for this same reallocation days metric for each region?
The CTE date_diff calculates the number of days each customer spends on a node before being reallocated, linking this data to specific regions. The query then computes the median (50th percentile), 80th percentile, and 95th percentile of these reallocation days for each region

WITH date_diff AS(
    SELECT  c.customer_id,
            r.region_name,
            r.region_id,
            DATEDIFF(DAY, start_date, end_date) AS reallocation_days
    FROM customer_nodes c
    INNER JOIN regions r
    ON c.region_id = r.region_id
    WHERE end_date != '9999-12-31'
)
SELECT DISTINCT region_id,
       region_name,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY reallocation_days) OVER (PARTITION BY region_name) AS median_days,
       PERCENTILE_CONT(0.8) WITHIN GROUP (ORDER BY reallocation_days) OVER (PARTITION BY region_name) AS eighty_perc_days,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY reallocation_days) OVER (PARTITION BY region_name) AS ninety_five_perc_days
FROM date_diff
ORDER BY region_id, region_name;

Enter fullscreen mode Exit fullscreen mode

Image description
What is the unique count and total amount for each transaction type?

SELECT txn_type AS transaction_type, 
        COUNT(*) AS unique_transactions,
        SUM(txn_amount) AS total_amount
  FROM customer_transactions
 GROUP BY txn_type
 ORDER BY unique_transactions DESC, total_amount DESC;
Enter fullscreen mode Exit fullscreen mode

Image description

There were 4,777 unique transactions during the period, 56% of which were deposits. Additionally, the total amount transacted during the period was $2,958,708.

What is the average total historical deposit counts and amounts for all customers?
The CTE customer_deposits calculates the total number of deposits and the total amount of money deposited for each customer in the customer_transactions table. Itfilters the transactions to include only those where the transaction type is 'deposit', then grouping the results by each customer_id.

WITH customer_deposits AS (
SELECT  customer_id,
        COUNT(txn_type) as deposit_count,
        SUM(txn_amount) as deposit_amount
FROM customer_transactions
WHERE txn_type='deposit'
GROUP BY customer_id
)
 SELECT AVG(deposit_count) AS avg_count,
    AVG(deposit_amount) AS avg_amount
    FROM customer_deposits;
Enter fullscreen mode Exit fullscreen mode

Image description

For each month - how many Data Bank customers make more than 1 deposit and either 1 purchase or 1 withdrawal in a single month?
The CTE customer_trends tracks customer transaction behavior on a monthly basis. It calculates, for each month and each customr id:

  • The number of deposits (deposit_count)
  • The number of purchases (purchase_count)
  • The number of withdrawals (withdrawal_count) The main query then determines, for each month, how many unique customers, made more than 1 deposit (deposit_count > 1).Additionally, made either at least 1 purchase or at least 1 withdrawal (purchase_count > 0 OR withdrawal_count > 0) in the same month.
WITH customer_trends AS 
(
SELECT customer_id,
       DATEPART(MONTH, txn_date) AS month_id,
       DATENAME(MONTH, txn_date) AS month_name,
       COUNT(CASE WHEN txn_type = 'deposit' THEN 1 END) AS deposit_count,
       COUNT(CASE WHEN txn_type = 'purchase' THEN 1 END) AS purchase_count,
       COUNT(CASE WHEN txn_type='withdrawal' THEN 1 END) AS withdrawal_count
  FROM customer_transactions
 GROUP BY customer_id, DATEPART(MONTH, txn_date),DATENAME(MONTH, txn_date)
)
SELECT  month_id,
      month_name,
      COUNT(DISTINCT customer_id) AS total_customers
  FROM customer_trends
 WHERE deposit_count>1 AND (purchase_count>0 OR withdrawal_count>0)
 GROUP BY month_id, month_name
 ORDER BY month_id, month_name;
Enter fullscreen mode Exit fullscreen mode

Image description

This analysis identifies customers who are more actively engaged by tracking those who not only deposit money multiple times within a month but also use the account for other types of transactions like purchases or withdrawals.

What is the closing balance for each customer at the end of the month?

WITH cashflows AS (
    SELECT 
        customer_id,
        DATEPART(MONTH, txn_date) AS month_id,  -- Use month number for proper ordering
        DATENAME(MONTH, txn_date) AS month_name,
        SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END) AS inflow
    FROM 
        customer_transactions
    GROUP BY 
        customer_id, DATEPART(MONTH, txn_date), DATENAME(MONTH, txn_date)
)
SELECT 
    customer_id, 
    month_name, 
    SUM(inflow) OVER (
        PARTITION BY customer_id 
        ORDER BY month_id 
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS closing_balance
FROM 
    cashflows
ORDER BY 
    customer_id, month_id;

Enter fullscreen mode Exit fullscreen mode

Image description

To calculate the closing balance for each customer as of the maximum transaction date (txn_date) within each month;

WITH cashflows AS (
    SELECT 
        customer_id,
        txn_date,
        SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END) AS inflow
    FROM 
        customer_transactions
    GROUP BY 
        customer_id, txn_date
),
latest_cashflows AS (
    SELECT 
        customer_id,
        txn_date,
        SUM(inflow) OVER (PARTITION BY customer_id ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS closing_balance,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY txn_date DESC) AS rn
    FROM 
        cashflows
)
SELECT 
    customer_id, 
    txn_date, 
    closing_balance
FROM 
    latest_cashflows
WHERE 
    rn = 1
ORDER BY 
    customer_id;
Enter fullscreen mode Exit fullscreen mode
  • cashflows CTE:
    This CTE calculates the net inflow for each customer_id and each txn_date. The result includes the transaction date (txn_date) along with the month information.

  • latest_cashflows CTE:
    In this step, the ROW_NUMBER() function is used to assign a rank (rn) to each row within the partition of customer_id and month_id, ordered by txn_date in descending order. This ensures that the latest transaction date within each month is given the rank of 1.

  • Final Query:
    The final SELECT statement retrieves the closing balance (inflow) for the latest transaction date in each month by filtering for rn = 1.
    What is the percentage of customers who increase their closing balance by more than 5%?

Image description

What is the percentage of customers who increase their closing balance by more than 5%?

  • The cashflows CTE calculates the net cash flow for each customer per month by summing up the transaction amounts, treating deposits as positive inflows and other transactions as outflows.
  • ClosingBalance CTE:calculates the running (cumulative) closing balance for each customer over time by summing the monthly inflows up to and including the current month.
  • PercentChange CTE: This calculates the percentage change in the closing balance for each customer from one month to the next, using the LAG function to compare the closing balance of the current month with the previous month.
WITH cashflows AS (
    SELECT customer_id,
           DATEPART(YEAR, txn_date) * 12 + DATEPART(MONTH, txn_date) AS month_id,
           SUM(CASE WHEN txn_type = 'deposit' THEN txn_amount ELSE -txn_amount END) AS inflow
    FROM 
        customer_transactions
    GROUP BY 
        customer_id, DATEPART(YEAR, txn_date) * 12 + DATEPART(MONTH, txn_date)
),
ClosingBalance AS 
(
    SELECT customer_id, 
           month_id, 
           SUM(inflow) OVER (PARTITION BY customer_id ORDER BY month_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS closing_balance
      FROM cashflows
),
PercentChange AS 
(
    SELECT customer_id,
           month_id,
           closing_balance,
           100 * (closing_balance - LAG(closing_balance) OVER (PARTITION BY customer_id ORDER BY month_id)) / NULLIF(LAG(closing_balance) OVER (PARTITION BY customer_id ORDER BY month_id), 0) AS percent_increase
    FROM ClosingBalance
)
SELECT 100 * COUNT(DISTINCT customer_id) / CAST((SELECT COUNT(DISTINCT customer_id) FROM customer_transactions) AS float) AS percent_customers
  FROM PercentChange
 WHERE percent_increase > 5;

Enter fullscreen mode Exit fullscreen mode

Image description

Full sql script - Github

. . . . . . .