Java 8 Stream findFirst vs. findAny: Performance and Functional Differences
1. Introduction
Searching for a specific element in a stream of elements is common in Java applications.
The two terminal operations Java provides to search elements are the Stream.findFirst() and Stream.findAny() methods.
In this tutorial, we'll explore how to use the findAny() and findFirst() methods, their differences, and which is faster.
2. findAny() vs. findFirst()
Let's first look at the findAny() and findFirst() method signatures in the Stream class:
Optional<T> findAny();
Optional<T> findFirst();
The findAny() method is a terminal operation that returns an Optional containing some element of the stream, or an empty Optional if the stream has no elements.
The findFirst() method does the same, except that it returns the first element in the stream's encounter order, which we'll check shortly.
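To make the Optional contract above concrete, here is a minimal sketch. The class OptionalResultDemo and its two helper methods are hypothetical names used only for illustration; they aren't part of the tutorial's test code.

```java
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical demo class; the names are illustrative only.
final class OptionalResultDemo {

    // Returns the first name (in encounter order) starting with the prefix, if any.
    static Optional<String> firstStartingWith(String prefix, String... names) {
        return Stream.of(names)
          .filter(name -> name.startsWith(prefix))
          .findFirst();
    }

    // Returns some name starting with the prefix, if any.
    static Optional<String> anyStartingWith(String prefix, String... names) {
        return Stream.of(names)
          .filter(name -> name.startsWith(prefix))
          .findAny();
    }

    public static void main(String[] args) {
        // A match exists, so the Optional holds a value.
        System.out.println(firstStartingWith("J", "Maria", "Jane", "John").isPresent());
        // No match, so the result is an empty Optional rather than null.
        System.out.println(anyStartingWith("X", "Maria", "Jane", "John").isPresent());
    }
}
```

In both cases the caller gets an Optional back and decides how to handle the missing-value case, for example with isPresent() or orElse().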
2.1. Test Structure
To illustrate how the findAny() and findFirst() methods work, it's useful to have an object whose identity we can compare, so that we can tell which instance a search returned. For that, let's define a Person class that doesn't override equals() or hashCode():
public class Person {
    public String name;

    public Person(String name) {
        this.name = name;
    }
}
Now, let's create the main structure of our unit tests:
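import java.util.function.Predicate;
import java.util.stream.Stream;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;

public class FindFirstFindAnyUnitTest {
    private Stream<Person> testStream;
    // The instance we expect findFirst() to return
    private final Person target = new Person("John");
    private final Predicate<Person> johnFilter = j -> "John".equals(j.name);

    @BeforeEach
    public void setUp() {
        // A stream can be consumed only once, so we rebuild it before each test
        testStream = Stream.of(new Person("Maria"), new Person("Jane"), target, new Person("John"));
    }

    @AfterEach
    public void tearDown() {
        testStream = Stream.empty();
    }
}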
2.2. How to Use findFirst()
The findFirst() method always returns the first element in the stream's encounter order, whether the stream is parallel or non-parallel. If the stream has no encounter order, it may return any element.
For instance, if we search for a Person named "John", it should always return the first one added in the testStream:
@Test
public void givenJohnFilterFindFirst_whenSearchingAnyStream_thenReturnFirstJohn() {
    var parallelMatch = testStream
      .parallel()
      .filter(johnFilter)
      .findFirst()
      .get();

    var nonParallelMatch = Stream.of(new Person("Maria"), new Person("Jane"), target, new Person("John"))
      .filter(johnFilter)
      .findFirst()
      .get();

    assertEquals(target.hashCode(), parallelMatch.hashCode());
    assertEquals(target.hashCode(), nonParallelMatch.hashCode());
}
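Both findFirst() calls return the target object, in the parallel and the non-parallel stream alike. Since Person doesn't override hashCode(), the matching hashCode() values in the assertEquals checks indicate that each result is the same instance as target.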
Even though the stream contains two Person objects named John, findFirst() always returns the first in the order they appear in the stream.
2.3. How to Use findAny()
The findAny() method returns an element in a non-deterministic way. In other words, it might return any element, regardless of the order in which it appears in the stream.
Let's try an example of searching in a non-parallel stream using findAny():
@Test
public void givenJohnFilterFindAny_whenSearchingNonParallelStream_thenReturnAnyJohn() {
    var match = testStream
      .filter(johnFilter)
      .findAny();

    assertTrue(match.isPresent());
}
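findAny() guarantees only that, if the stream contains a Person named John, one of them is returned; it makes no promise about which one. In non-parallel streams, findAny() is likely to return the first matching element, but that's not guaranteed, so the test only asserts that a match is present.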
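The same idea can be sketched for a parallel stream, where the choice is more likely to vary. This is a minimal illustration rather than part of the tutorial's test class: FindAnyParallelDemo and anyJohn() are hypothetical names, and the nested Person class only mirrors the one defined earlier so that the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical demo; Person mirrors the class from the Test Structure section.
final class FindAnyParallelDemo {

    static final class Person {
        final String name;

        Person(String name) {
            this.name = name;
        }
    }

    // Returns some Person named "John" from a parallel stream over the list.
    static Optional<Person> anyJohn(List<Person> people) {
        return people.parallelStream()
          .filter(p -> "John".equals(p.name))
          .findAny();
    }

    public static void main(String[] args) {
        Person firstJohn = new Person("John");
        Person secondJohn = new Person("John");
        List<Person> people = Arrays.asList(
          new Person("Maria"), new Person("Jane"), firstJohn, secondJohn);

        Person match = anyJohn(people).get();

        // Either instance is a valid result; which one we get may vary between runs.
        System.out.println(match == firstJohn || match == secondJohn);
    }
}
```

The check deliberately accepts either John. Code that needs a specific instance should use findFirst() instead.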
2.4. Functional Differences
As we've seen in previous sections, both findFirst() and findAny() return an Optional containing the element found in a stream.
The only difference between the two methods is which element they promise to return: findAny() might return any stream element in a non-deterministic way, which leaves the implementation free to pick whichever match it reaches first. In contrast, findFirst() always returns the first element in the stream's encounter order.
2.5. Which is Faster: findFirst vs. findAny?
Let's compare the performance of non-parallel and parallel versions of findAny() and findFirst() methods using the configuration below:
- Stream consisting of 5 million elements different from the target.
- Two target objects are added randomly to the stream.
- We measure elapsed time in milliseconds using the Instant.now and Duration.between methods from the java.time package.
- We run each scenario 20 times on the same input and record the time of every individual run.
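The article doesn't show its benchmark code, so the following is only a sketch of how a single run could be timed under the configuration above. Everything in it is an assumption: the class FindTimingSketch, the use of Integer elements with -1 as the target, the fixed random seed, and the way the two targets are placed. Timings taken this way, without JVM warm-up, are noisy and will differ from machine to machine.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Hypothetical harness; the tutorial does not show its actual benchmark code.
final class FindTimingSketch {

    static final int TARGET = -1;

    // Builds a list of non-target values and inserts two targets at random positions.
    static List<Integer> buildInput(int size, long seed) {
        Random random = new Random(seed);
        List<Integer> data = new ArrayList<>(size + 2);
        for (int i = 0; i < size; i++) {
            data.add(i); // non-negative, so never equal to TARGET
        }
        data.add(random.nextInt(data.size() + 1), TARGET);
        data.add(random.nextInt(data.size() + 1), TARGET);
        return data;
    }

    // Times a single execution of the given search, in milliseconds.
    static long timeMillis(Supplier<?> search) {
        Instant start = Instant.now();
        search.get();
        return Duration.between(start, Instant.now()).toMillis();
    }

    public static void main(String[] args) {
        List<Integer> data = buildInput(5_000_000, 42L);

        long findFirstMs = timeMillis(() -> data.stream().filter(n -> n == TARGET).findFirst());
        long findAnyMs = timeMillis(() -> data.parallelStream().filter(n -> n == TARGET).findAny());

        System.out.println("findFirst (non-parallel): " + findFirstMs + "ms");
        System.out.println("findAny (parallel): " + findAnyMs + "ms");
    }
}
```

Repeating each timeMillis() call 20 times and collecting the results would yield rows of per-run times like the ones reported for this comparison.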
The table below shows the time of each run, along with the median, mean, and minimum for each scenario:
Stream | Method | median | mean | min | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
non-parallel | findFirst | 79.5ms | 80.65ms | 5ms | 208ms | 77ms | 82ms | 162ms | 23ms | 85ms | 19ms | 5ms | 179ms | 59ms | 35ms | 83ms | 18ms | 86ms | 142ms | 37ms | 56ms | 119ms | 6ms | 132ms |
non-parallel | findAny | 33.5ms | 49.65ms | 5ms | 103ms | 80ms | 25ms | 127ms | 18ms | 5ms | 98ms | 5ms | 18ms | 188ms | 59ms | 58ms | 5ms | 22ms | 26ms | 6ms | 39ms | 44ms | 31ms | 36ms |
parallel | findFirst | 22.5ms | 22.5ms | 13ms | 30ms | 26ms | 16ms | 30ms | 15ms | 20ms | 23ms | 21ms | 28ms | 28ms | 15ms | 22ms | 20ms | 18ms | 26ms | 13ms | 16ms | 30ms | 29ms | 24ms |
parallel | findAny | 17.5ms | 21.4ms | 5ms | 105ms | 8ms | 27ms | 13ms | 18ms | 29ms | 9ms | 30ms | 7ms | 14ms | 13ms | 19ms | 5ms | 23ms | 20ms | 7ms | 16ms | 17ms | 24ms | 24ms |
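The summary columns can be recomputed from the individual runs. As a small worked example, the sketch below does so for the 20 non-parallel findAny() run times listed in the table; RunStatistics and its methods are hypothetical helper names, not code from the original benchmark.

```java
import java.util.Arrays;

// Hypothetical helper that recomputes the summary columns for one table row.
// Each method assumes the array holds at least one run.
final class RunStatistics {

    static double mean(long[] runs) {
        return Arrays.stream(runs).average().orElse(Double.NaN);
    }

    static double median(long[] runs) {
        long[] sorted = runs.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // For an even number of runs, average the two middle values.
        return sorted.length % 2 == 0
          ? (sorted[mid - 1] + sorted[mid]) / 2.0
          : sorted[mid];
    }

    static long min(long[] runs) {
        return Arrays.stream(runs).min().orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        // Run times (ms) of findAny() on the non-parallel stream, copied from the table.
        long[] findAnyNonParallel = {103, 80, 25, 127, 18, 5, 98, 5, 18, 188,
          59, 58, 5, 22, 26, 6, 39, 44, 31, 36};

        System.out.println("median = " + median(findAnyNonParallel) + "ms"); // median = 33.5ms
        System.out.println("mean = " + mean(findAnyNonParallel) + "ms");     // mean = 49.65ms
        System.out.println("min = " + min(findAnyNonParallel) + "ms");       // min = 5ms
    }
}
```

With only 20 noisy runs, the median is less sensitive than the mean to outliers such as the 188ms run.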
So, which is faster: findFirst() or findAny()? Based on these measurements, we can summarize the performance of the two methods as follows:
- In these measurements, findAny() had a lower median and mean than findFirst() on both the non-parallel and the parallel stream. Thus, if there's no requirement to get the first element of the stream, opt for the findAny() method.
- For big datasets, consider parallel streams: here, they were faster than their non-parallel counterparts for both methods.
- Evaluate whether the overhead of parallel streams is worth it. For small datasets, it might not be.
3. Conclusion
In this post, we've investigated the differences between the findFirst() and findAny() methods to search for elements in streams.
findAny() returns any element in the stream, whereas findFirst() always picks the first.
If the requirement is to get any element from the stream, not necessarily the first one, prefer the findAny() method, which was faster in our measurements. For big datasets, consider using parallel streams where possible.