Davor Josipovic Just another WordPress blog – rather tryout

30/08/2014

Transform any (binary) function to an aggregate in pure SQL

Filed under: Oracle — Tags: , — Davor @ 10:44

Few days back I needed an aggregate counterpart of a BITOR function. Unfortunately the Oracle database doesn’t have one.

So how do I make one? There are three options here:

  1. Write your own aggregate function. Here is one way to do it.
  2. Rewrite it in function of an other aggregate function. For example PRODUCT() can be rewritten as EXP(SUM(LN())) (cf. infra). But there is no obvious way for writing an aggregate BITOR() in function of existing Oracle functions.
  3. Simulate aggregation in pure SQL. If you for example have 4 elements {a, b, c, d}, you know that their BITOR aggregate is: BITOR(a,BITOR(b,BITOR(c,d))). Since SQL:1999 we have recursion in SQL. So why write an aggregate function if you can compute it within SQL?

This blogpost is all about the third option. I wanted to see (I) whether it is possible and (II) whether I can generalize it in a concise manner for all binary functions. I also couldn’t find any information about this on the Internet, so that is why I am writing this. It turns out (I) is true, and (II) also, albeit with complications. What I didn’t expect is very bad performance. So the solution below is only for educational use.

Aggregation with + operator

Let’s start with summation. Summation (+) is a binary operator/function that is easy to understand and easy to verify with the aggregate SUM() function.

Suppose this is your table:

CREATE TABLE t4 (
  gr NUMBER(10), -- group
  nr NUMBER(10)  -- number
);

Insert some values in it

INSERT INTO t4 VALUES (1,10);
INSERT INTO t4 VALUES (1,20);
INSERT INTO t4 VALUES (1,30);
INSERT INTO t4 VALUES (1,40);
INSERT INTO t4 VALUES (1,50);
INSERT INTO t4 VALUES (2,20);
INSERT INTO t4 VALUES (2,30);

What we have is this:

        GR         NR
---------- ----------
         1         10 
         1         20 
         1         30 
         1         40 
         1         50 
         2         20 
         2         30 

The way to sum-aggregate these numbers is by adding a new column which will compute the sum of the current number and the previous sum:

        GR         NR RECURSIVE_SUM
---------- ---------- -------------
         1         10            10 
         1         20            30 
         1         30            60 
         1         40           100 
         1         50           150 
         2         20            20 
         2         30            50 

Finally I just have to take the last step in each group: 150 for gr = 1 and 50 for gr = 2. The recursive query that does all the above and uses only the (+) operator is the following:

WITH sel AS (
  SELECT t4.nr, t4.gr, ROW_NUMBER() OVER (partition BY gr ORDER BY nr) AS rn
  FROM t4
), rec(gr, r_out, rn, nr) AS (
    SELECT gr, nr, rn, nr FROM sel WHERE rn = 1
    UNION ALL
    SELECT sel.gr, rec.r_out + sel.nr, rec.rn + 1, sel.nr FROM rec, sel WHERE sel.rn = rec.rn + 1 AND sel.gr = rec.gr
), gr AS (
  SELECT gr, COUNT(gr) AS MAX FROM sel GROUP BY gr 
)
SELECT gr.gr, r_out SUM FROM gr INNER JOIN rec ON (gr.max = rec.rn AND gr.gr = rec.gr) ORDER BY gr.gr;
        GR        SUM
---------- ----------
         1        150 
         2         50 

This is equivalent to:

SELECT SUM(nr) SUM FROM t4 GROUP BY gr ORDER BY gr;
        GR        SUM
---------- ----------
         1        150 
         2         50 

Aggregation with * operator

Now let’s try multiplication. Just like summation, a product (*) is a binary operator/function. Now, to simulate the PRODUCT() function with the binary * function, you only have to change the r_out field in the recursive query:

WITH sel AS (
  SELECT t4.nr, t4.gr, ROW_NUMBER() OVER (partition BY gr ORDER BY nr) AS rn
  FROM t4
), rec(gr, r_out, rn, nr) AS (
    SELECT gr, nr, rn, nr FROM sel WHERE rn = 1
    UNION ALL
    SELECT sel.gr, rec.r_out * sel.nr, rec.rn + 1, sel.nr FROM rec, sel WHERE sel.rn = rec.rn + 1 AND sel.gr = rec.gr
), gr AS (
  SELECT gr, COUNT(gr) AS MAX FROM sel GROUP BY gr 
)
SELECT gr.gr, r_out product FROM gr INNER JOIN rec ON (gr.max = rec.rn AND gr.gr = rec.gr) ORDER BY gr.gr;
        GR    PRODUCT
---------- ----------
         1   12000000 
         2        600 

We can verify the above outcome with the aggregate PRODUCT() function rewritten as EXP(SUM(LN())). With a little algebra you can figure out why the equation holds. Here is the statement:

SELECT gr, EXP(SUM(LN(nr))) AS product FROM t4 GROUP BY gr ORDER BY gr;
        GR    PRODUCT
---------- ----------
         1   12000000 
         2        600 

Aggregation with BITOR function

By now it should be clear how to adjust the recursive query to simulate aggregation for any binary function: We only have to adjust the r_out field. Now let’s try to simulate the aggregated BITOR function. Because there is no BITOR we can rewrite it as BITOR(x,y) = (x+y)-BITAND(x,y);

WITH sel AS (
  SELECT t4.nr, t4.gr, ROW_NUMBER() OVER (partition BY gr ORDER BY nr) AS rn
  FROM t4
), rec(gr, r_out, rn, nr) AS (
    SELECT gr, nr, rn, nr FROM sel WHERE rn = 1
    UNION ALL
    SELECT sel.gr, (rec.r_out + sel.nr) - BITAND(rec.r_out, sel.nr), rec.rn + 1, sel.nr FROM rec, sel WHERE sel.rn = rec.rn + 1 AND sel.gr = rec.gr
), gr AS (
  SELECT gr, COUNT(gr) AS MAX FROM sel GROUP BY gr 
)
SELECT gr.gr, r_out BITOR FROM gr INNER JOIN rec ON (gr.max = rec.rn AND gr.gr = rec.gr) ORDER BY gr.gr;
        GR      BITOR
---------- ----------
         1         62 
         2         30 

Note 1: all the above recursive code is very inefficient. Tables exceeding 100 values will have a large performance impact. The problem is the statement following the UNION ALL: it has to select the right value(s) for each recursive iteration step. When I find more time I’ll try to optimize it. In the meantime, here is a query to fill the table with some dummy values for testing:

INSERT INTO t4
SELECT round(dbms_random.value(1,5)), round(dbms_random.value(1,99)) FROM DUAL
CONNECT BY level <= 100;

Powered by WordPress