Window functions
Window functions in SQL are used to perform calculations across a set of table rows that are related to the current row. These functions are typically used for running totals, moving averages, and ranking.
General Syntax of Window Functions
The syntax of a window function involves using the OVER() clause to define the "window" of rows that the function operates on.
function_name([arguments])
OVER (
[PARTITION BY partition_expression]
[ORDER BY order_expression]
[ROWS or RANGE clause]
)
Breakdown of the Syntax
function_name: This is the name of the window function (e.g., ROW_NUMBER(), RANK(), SUM(), AVG(), etc.).
[arguments]: The arguments or expressions passed to the function, if required.
OVER(): The OVER clause defines the window or set of rows that the function operates on.
PARTITION BY: Divides the result set into partitions to which the window function is applied. Each partition is treated as a separate group.
ORDER BY: Specifies the order of rows in each partition. The function is applied according to this order.
ROWS or RANGE: Defines the window frame, specifying the number of rows to include in the window, relative to the current row. This is optional and more advanced.
Example Syntaxes
1. ROW_NUMBER(): Assigns a unique sequential integer to rows within a partition.
SELECT
column_name,
ROW_NUMBER() OVER (PARTITION BY partition_column ORDER BY order_column) AS row_num
FROM
table_name;
2. RANK(): Assigns a rank to each row within a partition, with possible gaps in the rank values.
SELECT
column_name,
RANK() OVER (PARTITION BY partition_column ORDER BY order_column) AS rank
FROM
table_name;
3. SUM(): Computes the cumulative sum within a window frame.
SELECT
column_name,
SUM(sale_amount) OVER (PARTITION BY partition_column ORDER BY order_column) AS cumulative_sum
FROM
table_name;
4. AVG(): Computes the average value within a window frame.
SELECT
column_name,
AVG(sale_amount) OVER (PARTITION BY partition_column ORDER BY order_column) AS moving_average
FROM
table_name;
5. LAG(): Accesses data from a previous row in the same result set without using a self-join.
SELECT
column_name,
sale_amount,
LAG(sale_amount, 1, 0) OVER (ORDER BY order_column) AS previous_sale
FROM
table_name;
6. NTILE(): Divides rows into a specified number of groups, each with as close to an equal number of rows as possible.
SELECT
column_name,
sale_amount,
NTILE(4) OVER (ORDER BY sale_amount) AS quartile
FROM
table_name;
Using ROWS or RANGE Clause
The ROWS and RANGE clauses allow for more precise control over the window frame.
ROWS: Defines the window in terms of physical rows.
SUM(sale_amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
RANGE: Defines the window based on logical relationships (e.g., a range of values).
SUM(sale_amount) OVER (ORDER BY sale_date RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW)
Summary
Window functions are very powerful and flexible tools in SQL that allow for advanced data analysis directly within your SQL queries. The OVER() clause is crucial to how these functions operate, enabling you to partition, order, and frame your data in various ways.