Introduction into Window Functions on SQL Server

Kay Sauter - Mar 7 '20 - - Dev Community

Why window functions may come in handy

Let us dive directly into a real world example. You've got a table with salaries of different job functions and you are asked to find out what the n-tiles (quantiles, mean etc.) are of each function within a company. So this means you've got many people that hold a position which you want to compare with others. You want to compare salaries of marketing professionals to their peers, but you don't want to compare said salaries to marketing directors because they have their own salary bands. So that means you need to frame (also called partition) job titles with their own with which you calculate the n-tiles. Without window functions, this could be a quite fussy task.

In order to walk through the problem, create a new database, I'm going to name mine FoehnWind. The following script creates a new table and inserts some sample data of 20 rows into it.

Tutorial

USE [FoehnWind];
GO
CREATE TABLE dbo.Employee
(ID           TINYINT IDENTITY(1, 1) NOT NULL, 
 FirstName    VARCHAR(150) NOT NULL, 
 LastName     VARCHAR(150) NULL, 
 JobTitle     VARCHAR(100) NOT NULL, 
 AnnualSalary INT NOT NULL, 
 CONSTRAINT [PK_dbo.Employee] PRIMARY KEY CLUSTERED(ID ASC)
)
ON [PRIMARY];
GO
INSERT INTO [dbo].[Employee] 
VALUES
 ( 'Ken','Sanchez','Chief Executive Officer', 530000)
,( 'Jo','Berry','Marketing Assistant', 71875 )
,( 'Lori','Penor','Marketing Assistant', 73398 )
,( 'Pat','Coleman','Marketing Assistant', 51874 )
,( 'Stuart','Macrae','Marketing Assistant', 59108 )
,( 'Mary','Dempsey','Marketing Assistant', 57641 )
,( 'Wanida','Benshoof','Marketing Assistant', 60779 )
,( 'David','Bradley','Marketing Manager', 130000 )
,( 'Jill','Williams','Marketing Specialist', 108117 )
,( 'John','Wood','Marketing Specialist', 99407 )
,( 'Sariya','Pit','Marketing Specialist', 95048 )
,( 'Terry','Eminhizer','Marketing Specialist', 106316 )
,( 'Michael','Sullivan','Marketing Specialist', 87047 )
,( 'Ovidiu','Cracium','Marketing Specialist', 107883 )
,( 'Rob','Walters','Marketing Specialist', 109369 )
,( 'Garrett','Young','Senior Marketing Specialist', 118051 )
,( 'George','Li','Senior Marketing Specialist', 117253 )
,( 'Russell','King','Senior Marketing Specialist', 123362 )
,( 'Sandra','Alayo','Senior Marketing Specialist', 118185 )
,( 'Denise','Smith','Vice President Marketing', 250000 );
GO
Enter fullscreen mode Exit fullscreen mode

Firstly, lets use a window function which calculates the averages and percentiles of salaries for each job title:

SELECT distinct e.JobTitle
        ,AVG(e.AnnualSalary) OVER (PARTITION BY e.JobTitle) AS AverageSalary
        ,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) 
            OVER (PARTITION BY e.JobTitle) as MedianContinous
        ,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY e.AnnualSalary) 
            OVER (PARTITION BY e.JobTitle) as MedianDisContinous
FROM dbo.Employee e
order by e.JobTitle
Enter fullscreen mode Exit fullscreen mode

You perhaps now wonder how the functions PERCENTILE_CONT() and PERCENTILE_DISC differ. The discontinous variant returns an actual value that exists in the calculated set and pointing out that this values is the closest one being to the calculative median. The continous percentile in contrast returns a value that is actually the calculated median. For example, Marketing Assistant Stuart Macrae has an Annual Salary of 59'108. This amount is also the median in this job role, but the calculated median is 59'943.5 so Stuart is slightly below the median but his salary is the closest one to the median.

JobTitle EmployeeName AnnualSalary AverageSalary MedianContinous MedianDisContinous
Chief Executive Officer Ken Sanchez 530000 530000 530000 530000
Marketing Assistant Pat Coleman 51874 62445 59943.5 59108
Marketing Assistant Mary Dempsey 57641 62445 59943.5 59108
Marketing Assistant Stuart Macrae 59108 62445 59943.5 59108
Marketing Assistant Wanida Benshoof 60779 62445 59943.5 59108
Marketing Assistant Jo Berry 71875 62445 59943.5 59108
Marketing Assistant Lori Penor 73398 62445 59943.5 59108
Marketing Manager David Bradley 130000 130000 130000 130000
Marketing Specialist Michael Sullivan 87047 101883 106316 106316
Marketing Specialist Sariya Pit 95048 101883 106316 106316
Marketing Specialist John Wood 99407 101883 106316 106316
Marketing Specialist Terry Eminhizer 106316 101883 106316 106316
Marketing Specialist Ovidiu Cracium 107883 101883 106316 106316
Marketing Specialist Jill Williams 108117 101883 106316 106316
Marketing Specialist Rob Walters 109369 101883 106316 106316
Senior Marketing Specialist George Li 117253 119212 118118 118051
Senior Marketing Specialist Garrett Young 118051 119212 118118 118051
Senior Marketing Specialist Sandra Alayo 118185 119212 118118 118051
Senior Marketing Specialist Russell King 123362 119212 118118 118051
Vice President Marketing Denise Smith 250000 250000 250000 250000

Further reading

This was just an example of what window functions are handy for. More window functions are documented here. Ranked functions are used quite often to find out what happened over time, e.g. if a status of a product changed.
If you want to dig even deeper into this topic, I recommend (no affiliation) the book T-SQL Window Functions: For data analysis and beyond (2nd Edition) (Developer Reference) by Itzik Ben-Gan. He's authored some excellent books with great knowledge.

[1]: Fohen wind is a fairly common wind in the european Alps, which is a reference to the popular sample database Nordwind.
Cover Photo Credit: msandersmusic, on Pixabay

. . . . . .