﻿ Snowflake Window Functions: Partition By and Order By – BMC Software | Blogs
Machine Learning & Big Data Blog

# Snowflake Window Functions: Partition By and Order By

Snowflake supports windows functions. Think of windows functions as running over a subset of rows, except the results return every row. That’s different from the traditional SQL group by where there is one result for each group.

A windows function could be useful in examples such as:

• A running sum
• The average values over some number of previous rows
• A percentile ranking of each row among all rows.

The topic of window functions in Snowflake is large and complex. This tutorial serves as a brief overview and we will continue to develop additional tutorials.

## Snowflake definitions

Snowflake defines windows as a group of related rows. It is defined by the over() statement. The over() statement signals to Snowflake that you wish to use a windows function instead of the traditional SQL function, as some functions work in both contexts.

A windows frame is a windows subgroup. Windows frames require an order by statement since the rows must be in known order.

Windows frames can be cumulative or sliding, which are extensions of the order by statement. Cumulative means across the whole windows frame. Sliding means to add some offset, such as +- n rows.

A window can also have a partition statement. A partition is a group of rows, like the traditional group by statement.

## Windows vs regular SQL

For example, if you grouped sales by product and you have 4 rows in a table you might have two rows in the result:

### Regular SQL group by

```select count(*) from sales group by product:
10 product A
20 product B
```

### Windows function

With the windows function, you still have the count across two groups but each of the 4 rows in the database is listed yet the sum is for the whole group, when you use the partition statement.

```count 10 product A
count 10 product A
count 20 product B
count 20 product B
```

## Create some sample data

To study this, first create these two tables.

```CREATE TABLE customers
(
customernumber     varchar(100) PRIMARY KEY,
customername varchar(50),
phonenumber varchar(50),
postalcode varchar(50),
locale varchar(10),
datecreated date,
email varchar(50)
);
CREATE TABLE orders
(
customernumber     varchar(100) PRIMARY KEY,
ordernumber varchar(100),
orderdate date,
ordertype varchar(10),
shipdate date,
discount number,
quantity int,
productnumber varchar(50)
)
```

Then paste in this SQL data. The top of the data looks like this:

```insert into customers (customernumber,customername,phonenumber,postalcode,locale,datecreated,email) values ('440','tiqthogsjwsedifisiir','3077854','vdew','','2020-09-27','twtp@entt.com');
insert into orders (customernumber,ordernumber,comments,orderdate,ordertype,shipdate,discount,quantity,productnumber) values ('440','402','swgstdhmju','2020-09-27','sale','2020-10-01','0.7005950240358919','61','BB111');
insert into customers (customernumber,customername,phonenumber,postalcode,locale,datecreated,email) values ('802','hrdngzutwelfhgwcyznt','1606845','rnmk','','2020-09-27','ympv@zfze.com');
insert into orders (customernumber,ordernumber,comments,orderdate,ordertype,shipdate,discount,quantity,productnumber) values ('802','829','jybwzvoyzb','2020-09-27','sale','2020-10-06','0.3702248922841853','75','FF4444');
insert into customers (customernumber,customername,phonenumber,postalcode,locale,datecreated,email) values ('199','ogvaevvhhqtjcqggafnv','8452159','hyxm','','2020-09-27','znqo@rftp.com');
```

## Partition by

A partition creates subsets within a window. Here, we have the sum of quantity by product.

```select customernumber, ordernumber, productnumber,quantity,
sum(quantity) over (partition by productnumber) as prodqty
from orders
order by ordernumber
```

This produces the same results as this SQL statement in which the orders table is joined with itself:

```select customernumber,
ordernumber,
productnumber,quantity,
(select sum(quantity) from orders as o2 where o1.productnumber = o2.productnumber) as prodqty
from orders as o1
order by ordernumber
```

## Order by

The sum() function does not make sense for a windows function because it’s is for a group, not an ordered set. Yet Snowflake lets you use sum with a windows frame—i.e., a statement with an order() statement—thus yielding results that are difficult to interpret.

Let’s look at the rank function, one that is relevant to ordering. Here, we use a windows function to rank our most valued customers. These are the ones who have made the largest purchases.

The rank() function takes no arguments. The window is ordered by quantity in descending order. We limit the output to 10 so it fits on the page below.

```select customernumber, quantity, rank() over (order by quantity desc) from orders  limit 10
```

Here is the output. The customer who has purchases the most is listed first.

For more tutorials like this, explore these resources:

This e-book teaches machine learning in the simplest way possible. This book is for managers, programmers, directors – and anyone else who wants to learn machine learning. We start with very basic stats and algebra and build upon that.

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

See an error or have a suggestion? Please let us know by emailing blogs@bmc.com.

### BMC Brings the A-Game

BMC works with 86% of the Forbes Global 50 and customers and partners around the world to create their future. With our history of innovation, industry-leading automation, operations, and service management solutions, combined with unmatched flexibility, we help organizations free up time and space to become an Autonomous Digital Enterprise that conquers the opportunities ahead.