-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy path15-SQL-Window_Function
More file actions
222 lines (159 loc) · 5.65 KB
/
15-SQL-Window_Function
File metadata and controls
222 lines (159 loc) · 5.65 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
Window Functions in SQL
Window Functions are advanced SQL functions used to perform calculations across a set of rows related to the current row without grouping the result.
Unlike GROUP BY, they do not collapse rows. Every row remains visible.
OVER() clause:
It is used with window functions to define that window
1. partition the rows to form a set of rows
2. order rows within the partition into a particular order
Basic Syntax:
SELECT column_name,
window_function() OVER (PARTITION BY column ORDER BY column)
FROM table_name;
Parts of Window Function
Part Meaning
window_function() Function like ROW_NUMBER(), RANK()
OVER() Defines the window
PARTITION BY Divides data into groups
ORDER BY Defines order inside the group
1️⃣ ROW_NUMBER()
Assigns a unique number to each row.
Example:
SELECT transactions_id,
ROW_NUMBER() OVER(ORDER BY transactions_id) AS row_num
FROM retail_sales;
Example result:
transactions_id row_num
1 1
2 2
3 3
5 4
2️⃣ RANK()
Gives ranking but skips numbers when there is a tie.
SELECT total_sale,
RANK() OVER(ORDER BY total_sale DESC) AS rank_num
FROM retail_sales;
Example:
total_sale rank
1000 1
900 2
900 2
800 4
3️⃣ DENSE_RANK()
Similar to RANK() but does not skip numbers.
SELECT total_sale,
DENSE_RANK() OVER(ORDER BY total_sale DESC) AS rank_num
FROM retail_sales;
Example:
total_sale rank
1000 1
900 2
900 2
800 3
4️⃣ LEAD()
Gets the next row value.
SELECT transactions_id,
LEAD(transactions_id) OVER(ORDER BY transactions_id) AS next_id
FROM retail_sales;
Example:
transactions_id next_id
1 2
2 3
3 5
Here we can see 4 is missing.
5️⃣ LAG()
Gets the previous row value.
SELECT transactions_id,
LAG(transactions_id) OVER(ORDER BY transactions_id) AS previous_id
FROM retail_sales;
Example:
transactions_id previous_id
2 1
3 2
5 3
⭐ Example: Finding Missing IDs Using LEAD()
SELECT transactions_id,
LEAD(transactions_id) OVER(ORDER BY transactions_id) AS next_id
FROM retail_sales;
Then check:
next_id - transactions_id > 1
This shows gaps in transaction IDs.
# Why Window Functions Are Important
🔹They are heavily used in:
🔹Data Analysis
🔹Financial reports
🔹Ranking customers
🔹Running totals
🔹Detecting missing data
🔹Time series analysis
✅ Since we are learning SQL for Data Analyst roles, the 5 most important window functions to master are:
ROW_NUMBER()
RANK()
DENSE_RANK()
LEAD()
LAG()
EXAMPLE:
USE market_star_schema;
-- find the avg sales for each city using window functions
SELECT city, ROUND(sales,2) AS sales,
AVG(sales) OVER(PARTITION BY city ORDER BY sales DESC) AS avg_sales
FROM market_fact_full mff
JOIN cust_dimen cd ON mff.cust_id=cd.cust_id;
-- find the rank(), dense_rank, row_num() for the rows based on order_quantity
SELECT order_quantity,
RANK() OVER(ORDER BY order_quantity DESC) AS order_quantity_rank,
DENSE_RANK() OVER(ORDER BY order_quantity DESC) AS order_quantity_dense_rank,
ROW_NUMBER() OVER(ORDER BY order_quantity DESC) AS order_quantity_row_num
FROM market_fact_full;
-- find the rank(), dense_rank(), row_num() based on sales amount
SELECT *,
RANK() OVER(ORDER BY sales DESC) AS sales_rank,
DENSE_RANK() OVER(ORDER BY sales DESC) AS sales_dense_rank,
ROW_NUMBER() OVER(ORDER BY sales DESC) AS sales_row_num
FROM market_fact_full;
-- find the rank(), dense_rank(), row_num() for Bangalore and Chennai cities based on avg sales
SELECT city, ROUND(AVG(sales),2) AS avg_sales,
RANK() OVER(ORDER BY AVG(sales) DESC) AS city_rank,
DENSE_RANK() OVER(ORDER BY AVG(sales) DESC) AS city_dense_rank,
ROW_NUMBER() OVER(ORDER BY AVG(sales) DESC) AS city_row_num
FROM cust_dimen cd
JOIN market_fact_full mff ON cd.cust_id=mff.cust_id
WHERE city IN ('Bangalore','Chennai')
GROUP BY city;
/* find rank(), dense_rank(), row_num() based on avg_profit for the orders placed in October,
March for the years 2011 and 2012 from regular air ship_mode */
SELECT MONTH(order_date), YEAR(order_date),
AVG(profit) AS avg_profit,
RANK() OVER(ORDER BY AVG(profit) DESC) AS avg_profit_rank,
DENSE_RANK() OVER(ORDER BY AVG(profit) DESC) AS avg_profit_dense_rank,
ROW_NUMBER() OVER(ORDER BY AVG(profit) DESC) AS avg_profit_row_num
FROM market_fact_full mff
JOIN orders_dimen od ON mff.ord_id=od.ord_id
JOIN shipping_dimen sd ON mff.ship_id=sd.ship_id
WHERE MONTH(order_date) IN ('10','03')
AND YEAR(order_date) IN ('2011','2012')
AND ship_mode='Regular Air'
GROUP BY MONTH(order_date), YEAR(order_date);
-- first 5 records with highest sales values without using Limit & offset
WITH sales_summary AS (
SELECT *,
RANK() OVER(ORDER BY sales DESC) AS sales_rank
FROM market_fact_full)
SELECT * FROM sales_summary
WHERE sales_rank<=5;
-- find the details of the order who made highest sales in each city without using limit & offset
WITH cust_summary AS (
SELECT cd.cust_id, city, ROUND(sales,2) AS sales,
RANK() OVER(PARTITION BY city ORDER BY sales DESC) AS cust_rank,
DENSE_RANK() OVER(PARTITION BY city ORDER BY sales DESC) AS cust_dense_rank,
ROW_NUMBER() OVER(PARTITION BY city ORDER BY sales DESC) AS cust_row_num
FROM cust_dimen cd
JOIN market_fact_full mff ON cd.cust_id=mff.cust_id)
SELECT * FROM cust_summary
WHERE cust_rank=1;
-- find the profit change compared to previous day and next day for each order date
SELECT order_date, profit,
LAG(profit) OVER(ORDER BY order_date) AS prev_day_profit,
LEAD(profit) OVER(ORDER BY order_date) AS next_day_profit,
(profit - (LAG(profit,1,0) OVER(ORDER BY order_date))) AS profit_change
FROM market_fact_full mff
JOIN orders_dimen od ON mff.ord_id=od.ord_id;