Lesser Known jOOλ Features: Useful Collectors


jOOλ is our second most popular library. It implements a set of extensions to the JDK’s Stream API that are especially useful when streams are sequential only, which, we assume, is how most people use streams in Java.
Such extensions include:

// (1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …)
Seq.of(1, 2, 3).cycle();

// tuple((1, 2, 3), (1, 2, 3))
Seq.of(1, 2, 3).duplicate();

// (1, 0, 2, 0, 3, 0, 4)
Seq.of(1, 2, 3, 4).intersperse(0);

// (4, 3, 2, 1)
Seq.of(1, 2, 3, 4).reverse();

… and many more.
Collectors
But that’s not all jOOλ offers. It also ships with a set of useful Collectors, which can be used both with JDK streams and with jOOλ’s Seq type. Most of them are available from the org.jooq.lambda.Agg type, where Agg stands for aggregations.
Just like the rest of jOOλ, these collectors are inspired by SQL, and you will find quite a few SQL aggregate functions represented in this class.
Here are some of these collectors:
Counting
While the JDK has Collectors.counting(), jOOλ also has a way to count distinct values, just like SQL:

// A simple wrapper for two values:
class A {
    final String s;
    final long l;

    A(String s, long l) {
        this.s = s;
        this.l = l;
    }

    // Static factory, so that instances can be written as A("a", 1)
    static A A(String s, long l) {
        return new A(s, l);
    }
}

@Test
public void testCount() {
    // Counts all values, including duplicates
    assertEquals(7L, (long)
        Stream.of(1, 2, 3, 3, 4, 4, 5)
              .collect(Agg.count()));

    // Counts each distinct value once
    assertEquals(5L, (long)
        Stream.of(1, 2, 3, 3, 4, 4, 5)
              .collect(Agg.countDistinct()));

    // Counts the distinct values of A.l
    assertEquals(5L, (long)
        Stream.of(A("a", 1),
                  A("b", 2),
                  A("c", 3),
                  A("d", 3),
                  A("e", 4),
                  A("f", 4),
                  A("g", 5))
              .collect(Agg.countDistinctBy(a -> a.l)));

    // Counts the distinct values of A.s
    assertEquals(7L, (long)
        Stream.of(A("a", 1),
                  A("b", 2),
                  A("c", 3),
                  A("d", 3),
                  A("e", 4),
                  A("f", 4),
                  A("g", 5))
              .collect(Agg.countDistinctBy(a -> a.s)));
}

These are pretty self-explanatory, I think.
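For readers who want to see the idea without jOOλ on the classpath, here is a minimal sketch of a "count distinct by key" collector built only from JDK collectors. The class name DistinctCountSketch and its countDistinctBy helper are hypothetical names chosen for this illustration; this is not how jOOλ implements Agg.countDistinctBy().

```java
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DistinctCountSketch {

    // Hypothetical helper, not part of jOOλ: maps every element to a key,
    // collects the keys into a Set, and returns the size of that Set.
    static <T, K> Collector<T, ?, Long> countDistinctBy(
            Function<? super T, ? extends K> key) {
        return Collectors.collectingAndThen(
            Collectors.mapping(key, Collectors.toSet()),
            set -> (long) set.size());
    }

    public static void main(String[] args) {
        // The lengths are 1, 2, 2, 3, so there are 3 distinct keys
        long distinctLengths = Stream.of("a", "bb", "cc", "ddd")
            .collect(countDistinctBy(String::length));
        System.out.println(distinctLengths); // prints 3
    }
}
```

The sketch keeps every distinct key in memory until the stream ends, which is probably unavoidable for any exact distinct count.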
Percentiles
Just recently, I’ve blogged about the usefulness of SQL’s percentile functions, and how to emulate them if they’re unavailable.
Percentiles can also be nicely calculated on streams. Why not? As long as a Stream’s elements implement Comparable, or you supply a custom Comparator, percentiles are easy to calculate:

// Assuming a static import of Agg.percentile:
assertEquals(
    Optional.empty(),
    Stream.<Integer>of().collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1).collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1, 2).collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1, 2, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(1),
    Stream.of(1, 2, 3, 4).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10, 9).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10, 9, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(2),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(percentile(0.25)));
assertEquals(
    Optional.of(3),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20).collect(percentile(0.25)));
assertEquals(
    Optional.of(3),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20, 21).collect(percentile(0.25)));
assertEquals(
    Optional.of(3),
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3, 20, 21, 22).collect(percentile(0.25)));

Notice that jOOλ implements SQL’s percentile_disc semantics, meaning the result is always an actual value from the stream rather than an interpolated one. Also, there are three “special” percentiles that deserve their own names:
0% – corresponds to the min() function
50% – corresponds to the median() function
100% – corresponds to the max() function
A variety of overloads allows for calculating:
The percentile of the values contained in the stream
The percentile of the values contained in the stream, if sorted by another value mapped by a function
The percentile of the values mapped to another value by a function
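To make the percentile_disc semantics concrete, here is a minimal JDK-only sketch. The names PercentileDiscSketch, percentileOf and percentileDisc are hypothetical, and the index rule (take the sorted value at position ceil(p * n), counting from 1) is an assumption based on the usual definition of percentile_disc. It reproduces the results asserted above, but it is not jOOλ's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PercentileDiscSketch {

    // Hypothetical helper: sorts a copy of the values and picks the first
    // one whose cumulative share of the sorted values is at least p.
    static <T extends Comparable<? super T>> Optional<T> percentileOf(
            List<T> values, double p) {
        if (p < 0.0 || p > 1.0)
            throw new IllegalArgumentException("p must be between 0.0 and 1.0");
        if (values.isEmpty())
            return Optional.empty();

        List<T> sorted = new ArrayList<>(values);
        Collections.sort(sorted);

        // ceil(p * n) is a 1-based position; clamp so that p = 0.0 yields the minimum
        int index = Math.max(0, (int) Math.ceil(p * sorted.size()) - 1);
        return Optional.of(sorted.get(index));
    }

    // Wraps the helper in a collector, so it can be used with Stream.collect()
    static <T extends Comparable<? super T>> Collector<T, ?, Optional<T>> percentileDisc(
            double p) {
        return Collectors.collectingAndThen(
            Collectors.<T>toList(),
            values -> percentileOf(values, p));
    }

    public static void main(String[] args) {
        Optional<Integer> q1 = Stream.of(1, 2, 3, 4, 10)
            .collect(PercentileDiscSketch.<Integer>percentileDisc(0.25));
        System.out.println(q1); // prints Optional[2]
    }
}
```

With p = 0.0 the sketch returns the minimum, with p = 1.0 the maximum, which is why these two percentiles can be given their own names.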
Mode
Speaking of statistics, what about the mode, i.e. the value that appears most often in a stream? Easy, with Agg.mode():

assertEquals(
    Optional.of(1),
    Stream.of(1, 1, 1, 2, 3, 4).collect(Agg.mode()));
assertEquals(
    Optional.of(1),
    Stream.of(1, 1, 2, 2, 3, 4).collect(Agg.mode()));
assertEquals(
    Optional.of(2),
    Stream.of(1, 1, 2, 2, 2, 4).collect(Agg.mode()));
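The mode can be sketched with the JDK alone by counting occurrences and picking the most frequent value. ModeSketch and its mode helper are hypothetical names for this illustration. Ties go to the value encountered first, which matches the second assertion above, though that tie-breaking rule is an assumption rather than a documented jOOλ guarantee. The sketch also assumes the stream contains no null elements.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Stream;

final class ModeSketch {

    // Hypothetical helper, not jOOλ's implementation: counts how often each
    // value occurs, remembering the order in which values were first seen.
    static <T> Optional<T> mode(Stream<T> stream) {
        Map<T, Long> counts = new LinkedHashMap<>();
        stream.forEachOrdered(value -> counts.merge(value, 1L, Long::sum));

        T best = null;
        long bestCount = 0;
        for (Map.Entry<T, Long> entry : counts.entrySet()) {
            // Strictly greater: on a tie, the earlier value is kept
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        System.out.println(mode(Stream.of(1, 1, 2, 2, 2, 4))); // prints Optional[2]
    }
}
```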

Other useful collectors
Other collectors that can be useful occasionally are:
Bitwise bitAnd() and bitOr() collectors, which aggregate all numbers in a stream using bitwise operators.
commonPrefix() and commonSuffix() aggregators, which find the common prefix or suffix in a stream of strings.
allMatch(Predicate), anyMatch(Predicate), and noneMatch(Predicate) collectors, which mimic the behaviour of the corresponding Stream methods, but as boolean collectors. allMatch() is similar to PostgreSQL’s EVERY() aggregate function.
rank(), denseRank(), and percentRank() hypothetical set functions, which calculate the hypothetical rank a value would have if it were in the stream.
first() and last() functions, which produce the first and the last value in the stream.
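As an illustration of what a boolean collector looks like, here is a minimal JDK-only sketch of an EVERY()-style aggregation. EveryCollectorSketch and every are hypothetical names, and the sketch is built on Collectors.reducing() rather than on anything from jOOλ.

```java
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class EveryCollectorSketch {

    // Hypothetical helper: tests each element against the predicate and
    // combines the results with a logical AND, starting from true.
    static <T> Collector<T, ?, Boolean> every(Predicate<? super T> predicate) {
        return Collectors.<T, Boolean>reducing(
            true,
            value -> predicate.test(value),
            (a, b) -> a && b);
    }

    public static void main(String[] args) {
        boolean allEven = Stream.of(2, 4, 6)
            .collect(EveryCollectorSketch.<Integer>every(n -> n % 2 == 0));
        System.out.println(allEven); // prints true
    }
}
```

Unlike Stream.allMatch(), a collector like this cannot short-circuit: it always consumes the whole stream. In exchange, it can be passed anywhere a Collector is expected, for example as a downstream collector of Collectors.groupingBy(). Note that on an empty stream this sketch returns true, whereas SQL's EVERY() yields NULL for an empty group.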
Combine the aggregations
And one last important feature when working with jOOλ is the capability of combining aggregations, just like in SQL. Following the examples above, I can easily calculate several percentiles in one go:

// Unfortunately, Java’s type inference might need
// a little help here
var percentiles =
    Stream.of(1, 2, 3, 4, 10, 9, 3, 3).collect(
        Tuple.collectors(
            Agg.<Integer>percentile(0.0),
            Agg.<Integer>percentile(0.25),
            Agg.<Integer>percentile(0.5),
            Agg.<Integer>percentile(0.75),
            Agg.<Integer>percentile(1.0)
        )
    );

System.out.println(percentiles);

The result being:

(Optional[1], Optional[2], Optional[3], Optional[4], Optional[10])
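For comparison, the JDK itself can combine exactly two collectors since Java 12, using Collectors.teeing(). The following minimal sketch collects the minimum and the maximum, i.e. the 0% and 100% percentiles, in one pass. TeeingSketch and minAndMax are hypothetical names for this illustration, and the sketch assumes Java 12 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class TeeingSketch {

    // Feeds every element to both downstream collectors and merges
    // their two results into a list once the stream is exhausted.
    static List<Optional<Integer>> minAndMax(Stream<Integer> stream) {
        return stream.collect(Collectors.teeing(
            Collectors.minBy(Comparator.<Integer>naturalOrder()),
            Collectors.maxBy(Comparator.<Integer>naturalOrder()),
            (min, max) -> List.of(min, max)));
    }

    public static void main(String[] args) {
        System.out.println(minAndMax(Stream.of(1, 2, 3, 4, 10, 9, 3, 3)));
        // prints [Optional[1], Optional[10]]
    }
}
```

Combining more than two collectors this way requires nesting teeing() calls, which is where a tuple-based approach such as Tuple.collectors() reads more naturally.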


Emily
