Java 8 – Collection enhancements leveraging Lambda Expressions – or: How Java emulates SQL


Part of the evolution of Java in release 8 consists of Lambda expressions. These ‘functional expressions treated as variables’ introduce a powerful Inversion of Control in Java – allowing a clear and elegant distinction between the what [should be done] and the how [should it be done]. The Collection APIs have been extended with the notion of Streams to make great use of these lambda expressions. This article shows some examples of what this means, leading up to the revelation that under certain circumstances Java is very similar to SQL.

A Stream is an interface. It is a “[…] potentially infinite sequence of elements. A stream is a consumable data structure. The elements of the stream are available for consumption by either iteration or an operation. Once consumed the elements are no longer available from the stream.” Any collection – as well as several other sources – can be exposed as a Stream. On such a stream, a number of operations – aka a pipeline – can be performed. These operations are either intermediate or terminal:

  • intermediate – such as map, filter, sorted, distinct, limit, skip and concat, that produce a stream from a stream
  • terminal – such as forEach, reduce, collect, max, count, anyMatch, allMatch, noneMatch, findFirst and findAny, that perform an operation on the stream that is their input, potentially producing an object that is not a Stream (could be a Boolean, a Map, an Array or a Collection or just a void); sum is available on the primitive streams such as IntStream
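
The "consumable" nature of a stream can be illustrated with a small sketch. The class and method names below are made up for this illustration; the point is only that a second terminal operation on the same stream object is rejected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamConsumedDemo {

    // Returns true when a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        List<String> names = Arrays.asList("Anna", "John", "Mike");
        Stream<String> stream = names.stream();

        stream.forEach(n -> { });          // first terminal operation consumes the stream
        try {
            stream.forEach(n -> { });      // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;                   // the stream has already been operated upon
        }
    }

    public static void main(String[] args) {
        System.out.println("second use rejected: " + secondUseFails());
    }
}
```

The underlying collection is untouched; calling names.stream() again simply yields a fresh stream.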


Note: the resources section below provides a link to the Devoxx 2012 presentation by Maurice Naftalin that explains the notion of Streams in Java 8 very well. I have used some screenshots from this presentation to visualize some of the points I try to make in this article – but I suggest you watch the original presentation by Maurice.

Some operations are ‘short-circuiting’, such as limit, findFirst, findAny and anyMatch; that means that these operations cause the earlier intermediate operations to be processed only until the short-circuiting operation can be evaluated. Note that streams and the operations on streams are lazy by default: they are processed only when needed and only as many times as needed. A stream that undergoes multiple pipeline operations and eventually terminates with a findFirst, for example, will only process as many elements through the pipeline as are required to find the first element satisfying the condition. If a collection of 100 elements is exposed as a stream on which several filter and map operations are performed, terminating with a findFirst, then it may be the case that those filter and map operations are performed only on the very first element from the stream – rather than on all 100 elements.
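
To make the lazy, short-circuiting behaviour visible, here is a minimal sketch that counts how many elements the filter actually sees. The class name, the counter and the "first even number" condition are assumptions chosen for this illustration.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class LazyFindFirstDemo {

    // Number of elements that actually reached the filter stage.
    static final AtomicInteger inspected = new AtomicInteger();

    // Runs a filter/map/findFirst pipeline over the numbers 1..100.
    static Optional<String> firstEvenAsText() {
        inspected.set(0);
        return IntStream.rangeClosed(1, 100)
                .boxed()
                .filter(n -> {
                    inspected.incrementAndGet();   // count every element the filter sees
                    return n % 2 == 0;
                })
                .map(n -> "number " + n)
                .findFirst();                      // stops the pipeline at the first match
    }

    public static void main(String[] args) {
        System.out.println(firstEvenAsText().orElse("none"));
        System.out.println("elements inspected: " + inspected.get());
    }
}
```

In this sequential pipeline only the elements 1 and 2 pass through the filter; the remaining 98 are never touched.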

The pipelines we can build for Streams in Java 8 are somewhat similar to the pipelines we can build in Linux and UNIX:

[image: example of a Linux/UNIX pipeline]

Take the output (stream) from one operation and use it as the input for the next, never actually materializing the intermediate results – that sort of sums up what we do in Java 8 as well.

One bonus feature offered by streams is the option to very easily have operations executed in parallel. Instead of having to program in terms of immutability, threads, the executor framework (thread pools), futures, callables and the fork-join framework, the developer simply indicates that the pipeline executed on the stream is a candidate for parallel execution. The runtime will then take it upon itself to execute the operations in the pipeline concurrently. Note: the developer can help the runtime by providing a Spliterator (see http://download.java.net/jdk8/docs/api/java/util/Spliterator.html for details); the Spliterator implements the strategy for partitioning the elements from the stream over various parallel processing threads. This next picture – from the Maurice Naftalin presentation – illustrates the process that takes place behind the scenes when parallel() is introduced in a pipeline.

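[image: slide from the Maurice Naftalin presentation on what happens behind the scenes when parallel() is introduced in a pipeline]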
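
As a rough sketch of how little the code changes, the example below sums the same list once sequentially and once in parallel. The class and method names are invented for this illustration, and whether the parallel variant is actually faster depends on the data size and the machine.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelSumDemo {

    // Test data: the numbers 1..100 as a List.
    static List<Integer> oneToHundred() {
        return IntStream.rangeClosed(1, 100).boxed().collect(Collectors.toList());
    }

    static int sumSequential(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // Same pipeline; only the stream source is asked to be parallel.
    static int sumParallel(List<Integer> values) {
        return values.parallelStream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<Integer> values = oneToHundred();
        System.out.println(sumSequential(values));
        System.out.println(sumParallel(values));
    }
}
```

Both calls return the same total because addition is associative and the lambdas do not modify shared state; pipelines with side effects need more care when run in parallel.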

Many of the operations available in the Stream interface take Lambda expressions as their input. I have introduced some of the most commonly used types of Lambda expressions (Predicate, Consumer, Function, BiFunction, Supplier) in this article: http://technology.amis.nl/2013/10/03/java-8-the-road-to-lambda-expressions/.

Pipeline operations and types of Lambda expressions

The Predicate is an expression that takes an object (from the Stream) as input and returns a Boolean (the result of some evaluation of the object). The Predicate is used in the filter operation (that removes elements from the pipeline if they do not make the Predicate evaluate to true) and also in terminal operations such as anyMatch, allMatch and noneMatch.
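
A small sketch of one Predicate reused in filter and in the matching operations. The names list and the "longer than four characters" rule are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateDemo {

    // A Predicate held in a variable: true for names longer than four characters.
    static final Predicate<String> IS_LONG = s -> s.length() > 4;

    // filter keeps only the elements for which the Predicate evaluates to true.
    static List<String> longNames(List<String> names) {
        return names.stream().filter(IS_LONG).collect(Collectors.toList());
    }

    static boolean anyLong(List<String> names)  { return names.stream().anyMatch(IS_LONG); }
    static boolean allLong(List<String> names)  { return names.stream().allMatch(IS_LONG); }
    static boolean noneLong(List<String> names) { return names.stream().noneMatch(IS_LONG); }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Louise", "Tobias", "Mike", "John", "Anna");
        System.out.println(longNames(names));
        System.out.println(anyLong(names) + " " + allLong(names) + " " + noneLong(names));
    }
}
```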

The map operation on the Stream takes an object from the stream as input and returns an object that may be of the same type – but can be of a very different type as well. This operation is the element transformation function. Persons can be mapped to Persons or to other objects such as Integers, Booleans or anything you fancy. The map operation takes a Function<T,R> where T specifies the type of the input element read from the stream and R specifies the type of the object produced by the map. Function is one of the types of Lambda expression described in the blog article referenced above.
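
For illustration, a minimal sketch of a Function<String,Integer> passed to map, turning a stream of Strings into a stream of Integers. The class name and the choice of String::length as the transformation are assumptions for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapDemo {

    // T = String (input element), R = Integer (produced element).
    static final Function<String, Integer> LENGTH = String::length;

    static List<Integer> lengths(List<String> names) {
        return names.stream().map(LENGTH).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("Louise", "Mike", "Anna")));
    }
}
```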

The reduce operation in the Stream interface is somewhat similar to the map operation, but in the end quite different: it takes a Lambda expression (a BinaryOperator) that receives two elements as input and returns a single element. The framework applies this expression repeatedly to fold all remaining elements in the stream into one aggregate value.
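
The two common shapes of reduce can be sketched as follows: one with an identity value that always yields a result, and one without that yields an Optional because the stream may be empty. Class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ReduceDemo {

    // reduce with an identity: starts at 0 and adds each element in turn.
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, (a, b) -> a + b);
    }

    // reduce without an identity: empty input gives an empty Optional.
    static Optional<Integer> max(List<Integer> values) {
        return values.stream().reduce((a, b) -> a >= b ? a : b);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(sum(values));
        System.out.println(max(values));
    }
}
```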

The role of sorted is not hard to guess: it orders the elements in the stream – producing a stream itself. The sorted operation can take a Comparator as its input, a Lambda expression that compares two elements from the stream and decides their order; without a Comparator, the natural order of the elements is used.
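
A short sketch of sorted with an explicit Comparator: names are ordered by length first and alphabetically within the same length. The ordering rule and all names are assumptions chosen for the illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortedDemo {

    // Compare by length first, then fall back to the natural (alphabetical) order.
    static final Comparator<String> BY_LENGTH = Comparator.comparing(String::length);
    static final Comparator<String> BY_LENGTH_THEN_ALPHA =
            BY_LENGTH.thenComparing(Comparator.<String>naturalOrder());

    static List<String> sortNames(List<String> names) {
        return names.stream().sorted(BY_LENGTH_THEN_ALPHA).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortNames(Arrays.asList("Tobias", "Mike", "Louise", "John", "Anna")));
    }
}
```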

To partition the elements in the stream, the groupingBy operation in the Collectors class can be used, passed to the collect operation on the stream. It takes a Lambda expression as input, a Function that works on an element from the stream and returns whatever value should be partitioned on.
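
As a minimal sketch, groupingBy with a classifier Function produces a Map from each key to the list of elements that share it. Grouping names by their length is just an assumption for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {

    // The classifier String::length decides which group each name ends up in.
    static Map<Integer, List<String>> byLength(List<String> names) {
        return names.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        System.out.println(byLength(Arrays.asList("Louise", "Tobias", "Mike", "John", "Anna")));
    }
}
```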

The forEach operation is a terminal operation: it consumes elements from the stream, performs an action on them and returns nothing. The buck stops there. This operation takes a Consumer as input – a Lambda expression that acts on an input element.

The limit and skip operations take a long as input and produce a stream (that contains elements of the same type as the input stream). These operations are used to restrict the number of elements from the stream, either by taking only the first n elements (limit) or everything after the first n (skip).
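
Combined, skip and limit cut a window out of a stream, much like paging through a result set. The sketch below uses the numbers 1 to 10 as assumed input; the class and method names are made up.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class WindowDemo {

    // Drops the first skipCount numbers of 1..10, then keeps at most maxSize of the rest.
    static List<Integer> window(long skipCount, long maxSize) {
        return IntStream.rangeClosed(1, 10)
                .boxed()
                .skip(skipCount)
                .limit(maxSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(window(3, 4));
        System.out.println(window(8, 5));
    }
}
```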

Comparing Stream-pipelines with SQL statements

To get a feeling for what the pipelines we can create in Java 8 can do, I have created a number of examples in which a SQL statement for processing data is shown alongside its counterpart Java streams pipeline.

The first example in SQL: find the sum of all the salaries of all the females.

select sum(salary)
from   people
where  gender = 'FEMALE'

And the equivalent in Java:

people.stream()
    .filter(p -> p.getGender() == Person.Gender.FEMALE)
    .mapToInt(p -> p.getSalary())
    .sum();

Here a filter and a mapping are used, and since the mapping is to an int (mapToInt produces an IntStream), we can use the sum() convenience method instead of having to create a reduce operation that performs the adding together.

 

Using distinct and forEach

Prepare a list of all (distinct) first names of the males in the collection. In SQL this can be done using this statement:

select distinct(firstName)
from   people
where  gender = 'MALE'

The Java equivalent could look like this:

people.stream()
    .filter(p -> p.getGender() == Person.Gender.MALE)
    .map(p -> p.getFirstName())
    .distinct()
    .forEach(System.out::println);

Using sorted, map and forEach

Find all people under 40 [years of age] and list their first and last names in order of age:

select firstName||' '||lastName
from   people
where  age < 40
order
by     age

and in Java this would be

Comparator<Person> byAge = Comparator.comparing(Person::getAge);
people.stream()
    .filter(p -> p.getAge() < 40)
    .sorted(byAge)
    .map(p -> p.getFirstName() + " " + p.getLastName())
    .forEach(System.out::println);

or, more compact:

people.stream()
    .filter(p -> p.getAge() < 40)
    .sorted((e1, e2) -> e1.getAge() - e2.getAge())
    .map(p -> p.getFirstName() + " " + p.getLastName())
    .forEach(System.out::println);

Using filter, distinct, map and reduce

Prepare a concatenated list of the unique first names of all males in the collection. In SQL this is done for example like this:

select listagg( firstName, ',')
from   ( select distinct firstName
         from   people
         where  gender = 'MALE'
       )

The Java Streams equivalent:

String firstNames = people.stream()
    .filter(p -> p.getGender() == Person.Gender.MALE)
    .map(p -> p.getFirstName())
    .distinct()
    .reduce("Distinct First Names ", (name1, name2) -> name1 + ", " + name2);
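
Because reduce starts from the identity string, the result of the pipeline above begins with "Distinct First Names , " followed by the first name. As an alternative sketch that is closer to listagg, the joining collector from the Collectors class places the separator only between elements. The example works on plain Strings rather than Person objects to stay self-contained; the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class JoiningDemo {

    // Removes duplicates (keeping first occurrences) and joins the rest with ", ".
    static String distinctNames(List<String> firstNames) {
        return firstNames.stream()
                .distinct()
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(distinctNames(Arrays.asList("Tobias", "Mike", "Mike", "John", "John")));
    }
}
```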

Using groupingBy

Find the combined age for all people, partitioned by gender. In SQL this would be programmed with this Select statement:

select gender
,      sum(age) as aggregated_age
from   people
group
by     gender

In Java 8 we could write something very similar, using the groupingBy operation:

people.stream()
    .collect(Collectors.groupingBy(Person::getGender))
    .forEach((g, lp) -> System.out.println(" Aggregated age of " + g + " is "
        + lp.stream().mapToInt(p -> p.getAge()).sum()));
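
A variation that stays even closer to the SQL is to let groupingBy do the summing itself through a downstream collector. The sketch below assumes a tiny stand-in class, Member, instead of the article's Person class so that it can run on its own; all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupedAgeDemo {

    // Hypothetical stand-in for Person: only a gender and an age.
    static final class Member {
        private final String gender;
        private final int age;
        Member(String gender, int age) { this.gender = gender; this.age = age; }
        String getGender() { return gender; }
        int getAge() { return age; }
    }

    static List<Member> sample() {
        return Arrays.asList(new Member("FEMALE", 30), new Member("MALE", 20), new Member("MALE", 45));
    }

    // group by gender, sum(age) per group: the second argument is the downstream collector.
    static Map<String, Integer> combinedAgeByGender(List<Member> members) {
        return members.stream()
                .collect(Collectors.groupingBy(Member::getGender,
                                               Collectors.summingInt(Member::getAge)));
    }

    public static void main(String[] args) {
        combinedAgeByGender(sample())
                .forEach((g, total) -> System.out.println("Aggregated age of " + g + " is " + total));
    }
}
```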

Using filter and max

Find the oldest geezer in the list. In SQL this is a little convoluted:

select *
from   people
where  age = ( select max(age)
               from   people
               where  gender = 'MALE'
             )
and    gender = 'MALE'

In Java it is quite straightforward, using a simple filter and max operation:

Person oldestMale = people.stream()
    .filter(p -> p.getGender() == Person.Gender.MALE)
    .max((p1, p2) -> p1.getAge() - p2.getAge())
    .get();
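
Note that max returns an Optional, and calling get() on it throws an exception when no element passed the filter. The following sketch shows one way to deal with that, using Comparator.comparingInt instead of subtracting ages. It uses a hypothetical stand-in class, Member, rather than the article's Person class so that it is self-contained.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class OldestDemo {

    // Hypothetical stand-in for Person: only a name and an age.
    static final class Member {
        private final String name;
        private final int age;
        Member(String name, int age) { this.name = name; this.age = age; }
        String getName() { return name; }
        int getAge() { return age; }
    }

    static List<Member> sample() {
        return Arrays.asList(new Member("Tobias", 13), new Member("Mike", 70), new Member("John", 38));
    }

    // Empty Optional for an empty list instead of an exception.
    static Optional<Member> oldest(List<Member> members) {
        return members.stream().max(Comparator.comparingInt(Member::getAge));
    }

    public static void main(String[] args) {
        System.out.println(oldest(sample()).map(Member::getName).orElse("nobody"));
    }
}
```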

 

Note: people in these examples is a simple Collection that contains Person objects:

        List<Person> people = new ArrayList<Person>();

        people.add(new Person("Louise", "Smith", "Dallas", 16000, LocalDate.of(1976, Month.MARCH, 11), Person.Gender.FEMALE));
        people.add(new Person("Tobias", "Jellema", "Hagestein", 4, LocalDate.of(2000, Month.OCTOBER, 30), Person.Gender.MALE));
        people.add(new Person("Mike", "Smith", "Melbourne", 2000, LocalDate.of(1943, Month.NOVEMBER, 1), Person.Gender.MALE));
        people.add(new Person("Mike", "Weber", "Pretoria", 4411, LocalDate.of(1961, Month.DECEMBER, 5), Person.Gender.MALE));
        people.add(new Person("John", "Smith", "London", 9000, LocalDate.of(1975, Month.JANUARY, 11), Person.Gender.MALE));
        people.add(new Person("John", "Williams", "Dublin", 900, LocalDate.of(1985, Month.APRIL, 19), Person.Gender.MALE));
        people.add(new Person("Anna", "Kolokova", "Kiev", 6000, LocalDate.of(1983, Month.JULY, 14), Person.Gender.FEMALE));

        System.out.println(">>>>  List all employees");
        people.forEach(System.out::println);

The Person class is fairly straightforward - with a little use of the new Java 8 Date and Time API:

package nl.amis.hrm;

import java.time.LocalDate;
import java.time.Period;
import java.time.temporal.ChronoUnit;

public class Person {

    public enum Gender {
        MALE, FEMALE
    }
    
    private String firstName;
    private String lastName;
    private String city;
    private Integer salary;
    private LocalDate dateOfBirth; // for example LocalDate.of(2012, Month.MAY, 14)
    private Gender gender;

    public Gender getGender() {
        return gender;
    }

    public void setGender(Gender gender) {
        this.gender = gender;
    }
    
    @Override
    public String toString() {
        return "Person{" + "firstName=" + firstName + ", lastName=" + lastName + ", gender=" + gender + ", city=" + city + ", salary=" + salary + ", dateOfBirth=" + dateOfBirth + '}';
    }    
    
    public Person(String firstName, String lastName, String city, Integer salary, LocalDate dateOfBirth, Gender gender) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.city = city;
        this.salary = salary;
        this.dateOfBirth = dateOfBirth;
        this.gender = gender;
    }

    
    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }

    public Integer getAge() {
        // Age in whole years between the date of birth and today
        LocalDate today = LocalDate.now();
        return (int) ChronoUnit.YEARS.between(this.dateOfBirth, today);
    }

    public Integer getSalary() {
        return salary;
    }

    public void setSalary(Integer salary) {
        this.salary = salary;
    }

    public LocalDate getDateOfBirth() {
        return dateOfBirth;
    }

    public void setDateOfBirth(LocalDate dateOfBirth) {
        this.dateOfBirth = dateOfBirth;
    }

}

Resources

Beta Java documentation for Streams: http://download.java.net/jdk8/docs/api/java/util/stream/Stream.html

CoreServlets Tutorial on Streams in Java 8: Part 1 http://www.java-programming.info/tutorial/pdf/java/java-8/Java-8-Streams-Part-1.pdf and Part 2 http://www.java-programming.info/tutorial/pdf/java/java-8/Java-8-Streams-Part-2.pdf

Devoxx 2012 presentation, Closures and Collections - the World After Eight, by Maurice Naftalin: http://devoxx.com/display/DV12/Closures+and+Collections+-+the+World+After+Eight

State of Collections, article in DZone by Brian Goetz: http://java.dzone.com/articles/jdk-8-state-collections


About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.
