How to Write a Multiplication Aggregate Function in SQL

Spread the love
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  
  •  

Everyone knows the SQL SUM() aggregate function (and many people also know its window function variant).
When querying the Sakila database, we can get the daily revenue (using PostgreSQL syntax):

WITH p AS (
SELECT
CAST (payment_date AS DATE) AS date,
amount
FROM payment
)
SELECT
date,
SUM (amount) AS daily_revenue,
SUM (SUM (amount)) OVER (ORDER BY date) AS cumulative_revenue
FROM p
GROUP BY date
ORDER BY date

The result will look something like this:

date |daily_revenue |cumulative_revenue
———–|————–|——————-
2005-05-24 |29.92 |29.92
2005-05-25 |573.63 |603.55
2005-05-26 |754.26 |1357.81
2005-05-27 |685.33 |2043.14
2005-05-28 |804.04 |2847.18
2005-05-29 |648.46 |3495.64
2005-05-30 |628.42 |4124.06
2005-05-31 |700.37 |4824.43
2005-06-14 |57.84 |4882.27

Doing the same with multiplication
This is already quite useful. Very occasionally, however, we do not need to aggregate multiple values in a sum (through addition), but in a product (through multiplication). I’ve just stumbled upon such a case on Stack Overflow, recently.
The question wanted to achieve the following result:

date factor accumulated
—————————————
1986-01-10 null 1000
1986-01-13 -0.026595745 973.4042548
1986-01-14 0.005464481 978.7234036
1986-01-15 -0.016304348 962.7659569
1986-01-16 0 962.7659569
1986-01-17 0 962.7659569
1986-01-20 0 962.7659569
1986-01-21 0.005524862 968.0851061
1986-01-22 -0.005494506 962.765957
1986-01-23 0 962.765957
1986-01-24 -0.005524862 957.4468078
1986-01-27 0.005555556 962.7659569
1986-01-28 0 962.7659569
1986-01-29 0 962.7659569
1986-01-30 0 962.7659569
1986-01-31 0.027624309 989.3617013
1986-02-03 0.016129032 1005.319148
1986-02-04 0.042328041 1047.872338
1986-02-05 0.04568528 1095.744679

If this were a Microsoft Excel spreadsheet, the ACCUMULATED column would simply start with 1000 and have the following formula in all other rows:

accumulated(i) = accumulated(i – 1) * (1 + factor)

In other words (values truncated for simplicity):

1000.0 = start
973.4 = 1000.0 * (1 – 0.026)
978.7 = 973.4 * (1 + 0.005)
962.7 = 978.7 * (1 – 0.016)
962.7 = 962.7 * (1 – 0.000)
962.7 = 962.7 * (1 – 0.000)
962.7 = 962.7 * (1 – 0.000)
968.0 = 962.7 * (1 + 0.005)

This is exciting because we’re not only requiring multiplicative aggregation, but even cumulative multiplicative aggregation. So, another window function.
But regrettably, SQL doesn’t offer a MUL() aggregate function, even if it were relatively simple to implement. We have two options:
Implementing a custom aggregate function (stay tuned for a future blog post)
Using a trick by summing logarithms, rather than multiplying operands directly
We’re implementing the latter for now. Check out this cool Wikipedia website about logarithmic identities, which we are going to blindly trust. In the middle of it, we have:

bx * by = bx + y
Which leads to:

logb(x * y) = logb(x) + logb(y)

How cool is that? And thus:

x * y = blogb(x) + logb(y)
So, we can define any multiplication in terms of a bunch of exponentiation to some base (say e) and logarithms to some base (say e). Or, in SQL:

x * y = EXP(LN(x) + LN(y))

Or, as an aggregate function:

MUL(x) = EXP(SUM(LN(x)))

Heh!
Our original problem can thus be solved very easily using this, as shown in my stack overflow answer:

SELECT
date,
factor,
EXP(SUM(LN(1000 * (1 + COALESCE(factor, 1))))
OVER (ORDER BY date)) AS accumulated
FROM t

And we get the nice result as previously shown. You may have to replace LN() by LOG() depending on your database.
Caveat: Negative numbers
Try running this:

SELECT LN(-1)

You’ll get:

SQL Error [2201E]: ERROR: cannot take logarithm of a negative number

Logarithms are defined only for strictly positive numbers, unless your database is capable of handling complex numbers as well. In case of which a single zero value would still break the aggregation.
But if your data set is defined to contain only strictly positive numbers, you’ll be fine – give or take some floating point rounding errors. Or, you’ll do some sign handling, which looks like this:

WITH v(i) AS (VALUES (-2), (-3), (-4))
SELECT
CASE
WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 THEN -1 ELSE 1 END * EXP(SUM(LN(ABS(i)))) multiplication1 FROM v; WITH v(i) AS (VALUES (-2), (-3), (-4), (-5)) SELECT CASE WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 THEN -1 ELSE 1 END * EXP(SUM(LN(ABS(i)))) multiplication2 FROM v; The above yielding multiplication1 -------------------- -23.999999999999993 multiplication2 ------------------- 119.99999999999997 Close enough. Caveat: Zero Try running this: SELECT LN(0) You’ll get: SQL Error [2201E]: ERROR: cannot take logarithm of zero Zero is different from negative numbers. A product that has a zero operand is always zero, so we should be able to handle this. We’ll do it in two steps: Exclude zero values from the actual aggregation that uses EXP() and LN() Add an additional CASE expression that checks if any of the operands is zero The first step might not be necessary depending on how your database optimiser executes the second step. WITH v(i) AS (VALUES (2), (3), (0)) SELECT CASE WHEN SUM (CASE WHEN i = 0 THEN 1 END) > 0
THEN 0
WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 THEN -1 ELSE 1 END * EXP(SUM(LN(ABS(NULLIF(i, 0))))) multiplication FROM v; Extension: DISTINCT Calculating the product of all DISTINCT values requires to repeat the DISTINCT keyword in 2 out of the above 3 sums: WITH v(i) AS (VALUES (2), (3), (3)) SELECT CASE WHEN SUM (CASE WHEN i = 0 THEN 1 END) > 0
THEN 0
WHEN SUM (DISTINCT CASE WHEN i < 0 THEN -1 END) % 2 < 0 THEN -1 ELSE 1 END * EXP(SUM(DISTINCT LN(ABS(NULLIF(i, 0))))) multiplication FROM v; The result is now: multiplication | ---------------| 6 | Notice that the first SUM() that checks for the presence of NULL values doesn’t require a DISTINCT keyword, so we omit it to improve performance. Extension: Window functions Of course, if we are able to emulate a PRODUCT() aggregate function, we’d love to turn it into a window function as well. This can be done simply by transforming each individual SUM() into a window function: WITH v(i, j) AS ( VALUES (1, 2), (2, -3), (3, 4), (4, -5), (5, 0), (6, 0) ) SELECT i, j, CASE WHEN SUM (CASE WHEN j = 0 THEN 1 END) OVER (ORDER BY i) > 0
THEN 0
WHEN SUM (CASE WHEN j < 0 THEN -1 END) OVER (ORDER BY i) % 2 < 0 THEN -1 ELSE 1 END * EXP(SUM(LN(ABS(NULLIF(j, 0)))) OVER (ORDER BY i)) multiplication FROM v; The result is now: i |j |multiplication | --|---|--------------------| 1 | 2 |2 | 2 |-3 |-6 | 3 | 4 |-23.999999999999993 | 4 |-5 |119.99999999999997 | 5 | 0 |0 | 6 | 1 |0 | So cool! The cumulative product gets bigger and bigger until it hits he first zero, from then on it stays zero. jOOQ support jOOQ 3.12 will support this as well and emulate it correctly on all databases:https://github.com/jOOQ/jOOQ/issues/5939 A note on Oracle performance Do note that Oracle is very slow to calculate LN(number_type). It can be MUCH faster to calculate, instead LN(binary_double_type). An explicit type cast produced a 100x performance improvement in a simple test, documented here.

X ITM Cloud News

Emily

Next Post

How to Use SQL UPDATE .. RETURNING to Run DML More Efficiently

Sun Nov 24 , 2019
Spread the love          At a customer site, I recently refactored a “slow-by-slow” PL/SQL loop and turned that into an efficient set based UPDATE statement saving many lines of code and running much faster. In this blog post, I will show how that can be done. The blog post will focus on […]
X- ITM

Cloud Computing – Consultancy – Development – Hosting – APIs – Legacy Systems

X-ITM Technology helps our customers across the entire enterprise technology stack with differentiated industry solutions. We modernize IT, optimize data architectures, and make everything secure, scalable and orchestrated across public, private and hybrid clouds.

This image has an empty alt attribute; its file name is x-itmdc.jpg

The enterprise technology stack includes ITO; Cloud and Security Services; Applications and Industry IP; Data, Analytics and Engineering Services; and Advisory.

Watch an animation of  X-ITM‘s Enterprise Technology Stack

We combine years of experience running mission-critical systems with the latest digital innovations to deliver better business outcomes and new levels of performance, competitiveness and experiences for our customers and their stakeholders.

X-ITM invests in three key drivers of growth: People, Customers and Operational Execution.

The company’s global scale, talent and innovation platforms serve 6,000 private and public-sector clients in 70 countries.

X-ITM’s extensive partner network helps drive collaboration and leverage technology independence. The company has established more than 200 industry-leading global Partner Network relationships, including 15 strategic partners: Amazon Web Services, AT&T, Dell Technologies, Google Cloud, HCL, HP, HPE, IBM, Micro Focus, Microsoft, Oracle, PwC, SAP, ServiceNow and VMware

.

X ITM