Let's walk through another SQL-based ETL example. This time, we'll deal with a slightly more complex scenario where we have to aggregate data and calculate new metrics before loading it into a target table.
Scenario:
You have an orders
table containing information about customer orders. You want to extract this data, transform it by calculating the total revenue per customer, and then load the results into a customer_revenue
table.
Source Table: orders
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
amount DECIMAL(10, 2)
);
INSERT INTO orders (order_id, customer_id, order_date, amount) VALUES
(1, 101, '2024-08-01', 250.75),
(2, 102, '2024-08-03', 300.50),
(3, 101, '2024-08-05', 120.00),
(4, 103, '2024-08-07', 450.00),
(5, 104, '2024-08-09', 200.00),
(6, 102, '2024-08-10', 150.00),
(7, 101, '2024-08-12', 330.50);
1. Extract:
Select data from the orders
table.
SELECT * FROM orders;
2. Transform:
Aggregate the data to calculate the total revenue per customer.
-- Example Transformation:
-- 1. Group the data by `customer_id`
-- 2. Calculate the total revenue for each customer
-- 3. Filter out customers with revenue below a certain threshold (e.g., 200)
SELECT
customer_id,
SUM(amount) AS total_revenue
FROM
orders
GROUP BY
customer_id
HAVING
SUM(amount) >= 200;
3. Load:
Insert the aggregated data into the customer_revenue
table.
-- Create the target table if it doesn't exist
CREATE TABLE customer_revenue (
customer_id INT PRIMARY KEY,
total_revenue DECIMAL(10, 2)
);
-- Load the transformed data into the target table
INSERT INTO customer_revenue (customer_id, total_revenue)
SELECT
customer_id,
SUM(amount) AS total_revenue
FROM
orders
GROUP BY
customer_id
HAVING
SUM(amount) >= 200;
Complete Example in SQL:
-- Step 1: Extract
SELECT * FROM orders;
-- Step 2: Transform
SELECT
customer_id,
SUM(amount) AS total_revenue
FROM
orders
GROUP BY
customer_id
HAVING
SUM(amount) >= 200;
-- Step 3: Load
CREATE TABLE IF NOT EXISTS customer_revenue (
customer_id INT PRIMARY KEY,
total_revenue DECIMAL(10, 2)
);
INSERT INTO customer_revenue (customer_id, total_revenue)
SELECT
customer_id,
SUM(amount) AS total_revenue
FROM
orders
GROUP BY
customer_id
HAVING
SUM(amount) >= 200;
-- Verify the data in the target table
SELECT * FROM customer_revenue;
Explanation:
-
Extract: The data is selected from the
orders
table. -
Transform: The data is grouped by
customer_id
, and the total revenue is calculated for each customer. We use theHAVING
clause to filter out customers with total revenue below a certain threshold (e.g., 200). -
Load: The aggregated data is inserted into the
customer_revenue
table.
Output:
After running this ETL process, the customer_revenue
table will contain the total revenue for each customer who has spent more than a specified amount. This table can be used for further analysis, such as identifying high-value customers or generating reports.
This example demonstrates how SQL can be used to perform more advanced transformations like aggregation and filtering during an ETL process.