How to Reduce Syntactic Overhead Using the SQL WINDOW Clause

Spread the love
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  

SQL is a verbose language, and one of the most verbose features are window functions.
In a stack overflow question that I’ve encountered recently, someone asked to calculate the difference between the first and the last value in a time series for any given day:
Input

volume tstamp
—————————
29011 2012-12-28 09:00:00
28701 2012-12-28 10:00:00
28830 2012-12-28 11:00:00
28353 2012-12-28 12:00:00
28642 2012-12-28 13:00:00
28583 2012-12-28 14:00:00
28800 2012-12-29 09:00:00
28751 2012-12-29 10:00:00
28670 2012-12-29 11:00:00
28621 2012-12-29 12:00:00
28599 2012-12-29 13:00:00
28278 2012-12-29 14:00:00

Desired output

first last difference date
————————————
29011 28583 428 2012-12-28
28800 28278 522 2012-12-29

How to write the query?
Notice that the value and timestamp progression do not correlate as it may appear. So, there is not a rule that if Timestamp2 > Timestamp1 then Value2 < Value1. Otherwise, this simple query would work (using PostgreSQL syntax): SELECT max(volume) AS first, min(volume) AS last, max(volume) - min(volume) AS difference, CAST(tstamp AS DATE) AS date FROM t GROUP BY CAST(tstamp AS DATE); There are several ways to find the first and last values within a group that do not involve window functions. For example: In Oracle, you can use the FIRST and LAST functions, which for some arcane reason are not written FIRST(...) WITHIN GROUP (ORDER BY ...) or LAST(...) WITHIN GROUP (ORDER BY ...), like other sorted set aggregate functions, but some_aggregate_function(...) KEEP (DENSE_RANK FIRST ORDER BY ...). Go figure In PostgreSQL, you could use the DISTINCT ON syntax along with ORDER BY and LIMIT More details about the various approaches can be found here:https://blog.jooq.org/2017/09/22/how-to-write-efficient-top-n-queries-in-sql The best performing approach would be to use an aggregate function like Oracle’s, but few databases have this function. So, we’ll resort to using the FIRST_VALUE and LAST_VALUE window functions: SELECT DISTINCT first_value(volume) OVER ( PARTITION BY CAST(tstamp AS DATE) ORDER BY tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS first, last_value(volume) OVER ( PARTITION BY CAST(tstamp AS DATE) ORDER BY tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS last, first_value(volume) OVER ( PARTITION BY CAST(tstamp AS DATE) ORDER BY tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) - last_value(volume) OVER ( PARTITION BY CAST(tstamp AS DATE) ORDER BY tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) AS diff, CAST(tstamp AS DATE) AS date FROM t ORDER BY CAST(tstamp AS DATE) Oops That doesn’t look too readable. But it will yield the correct result. Granted, we could wrap the definition for the columns FIRST and LAST in a derived table, but that would still leave us with two repetitions of the window definition: PARTITION BY CAST(tstamp AS DATE) ORDER BY tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING WINDOW clause to the rescue Luckily, at least 3 databases have implemented the SQL standard WINDOW clause: MySQL PostgreSQL Sybase SQL Anywhere The above query can be refactored to this one: SELECT DISTINCT first_value(volume) OVER w AS first, last_value(volume) OVER w AS last, first_value(volume) OVER w - last_value(volume) OVER w AS diff, CAST(tstamp AS DATE) AS date FROM t WINDOW w AS ( PARTITION BY CAST(tstamp AS DATE) ORDER BY tstamp ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) ORDER BY CAST(tstamp AS DATE) Notice how I can specify a window name with a window specification in a similar way as I can define a common table expression (WITH clause): WINDOW AS () { , AS ()... } Not only can I reuse entire specifications, I could also build a specification from a partial specification, and reuse only parts. My previous query could have been rewritten as such: SELECT DISTINCT first_value(volume) OVER w3 AS first, last_value(volume) OVER w3 AS last, first_value(volume) OVER w3 - last_value(volume) OVER w3 AS diff, CAST(tstamp AS DATE) AS date FROM t WINDOW w1 AS (PARTITION BY CAST(tstamp AS DATE)), w2 AS (w1 ORDER BY tstamp), w3 AS (w2 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) ORDER BY CAST(tstamp AS DATE) Each window specification can be created from scratch, or be based on a previously defined window specification. Note this is also true when referencing the window definition. If I wanted to reuse the PARTITION BY clause and the ORDER BY clause, but change the FRAME clause (ROWS ...), then I could have written this: SELECT DISTINCT first_value(volume) OVER ( w2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS first, last_value(volume) OVER ( w2 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS last, first_value(volume) OVER ( w2 ROWS UNBOUNDED PRECEDING ) - last_value(volume) OVER ( w2 ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING ) AS diff, CAST(tstamp AS DATE) AS date FROM t WINDOW w1 AS (PARTITION BY CAST(tstamp AS DATE)), w2 AS (w1 ORDER BY tstamp) ORDER BY CAST(tstamp AS DATE) What if my database doesn’t support the WINDOW clause? In that case, you have to either manually write the window specification on each window function, or you use a SQL builder like jOOQ, which can emulate the window clause: So, jOOQ can translate this as well, now pic.twitter.com/ucbKNkhTjW — Lukas Eder (@lukaseder) July 12, 2018 You can try this translation online on our website: https://www.jooq.org/translate

X ITM Cloud News

Emily

Next Post

How to Use SQL UPDATE .. RETURNING to Run DML More Efficiently

Sun Nov 24 , 2019
Spread the love          At a customer site, I recently refactored a “slow-by-slow” PL/SQL loop and turned that into an efficient set based UPDATE statement saving many lines of code and running much faster. In this blog post, I will show how that can be done. The blog post will focus on […]
X- ITM

Cloud Computing – Consultancy – Development – Hosting – APIs – Legacy Systems

X-ITM Technology helps our customers across the entire enterprise technology stack with differentiated industry solutions. We modernize IT, optimize data architectures, and make everything secure, scalable and orchestrated across public, private and hybrid clouds.

This image has an empty alt attribute; its file name is x-itmdc.jpg

The enterprise technology stack includes ITO; Cloud and Security Services; Applications and Industry IP; Data, Analytics and Engineering Services; and Advisory.

Watch an animation of  X-ITM‘s Enterprise Technology Stack

We combine years of experience running mission-critical systems with the latest digital innovations to deliver better business outcomes and new levels of performance, competitiveness and experiences for our customers and their stakeholders.

X-ITM invests in three key drivers of growth: People, Customers and Operational Execution.

The company’s global scale, talent and innovation platforms serve 6,000 private and public-sector clients in 70 countries.

X-ITM’s extensive partner network helps drive collaboration and leverage technology independence. The company has established more than 200 industry-leading global Partner Network relationships, including 15 strategic partners: Amazon Web Services, AT&T, Dell Technologies, Google Cloud, HCL, HP, HPE, IBM, Micro Focus, Microsoft, Oracle, PwC, SAP, ServiceNow and VMware

.

X ITM