Java 8 introduced lambdas and streams, two tools that seem to offer many ways for improve our code, in particular
for what concerns conciseness and readability. There are a lot of samples around, focusing often on
over-simplified contexts. In this post I'm describing a small refactoring that I applied to a portion of code in a
project I'm developing. With its own limitations, I think it's an interesting real-world case to analyse. In this
post, I'm assuming that the reader knows the basic concepts and syntax of some Java 8 features, such as lamdas,
Streams and Optional<T>
.
First, let's give the relevant context. I have a MediaItem
interface, which represents e.g. an
audio file, which exposes a Metadata
interface:
public interface Metadata { @Nullable public <T> T get (@Nonnull Key<T> key); @Nonnull public Set<Map.Entry<Key<?>, ?>> getEntries(); … }
It's basically a type-safe Map
, that there is a set of pre-defined Key
items which
carry the type information of the value they're associated with. So there's no need to cast the values coming out
from the get()
method.
Now I have to extract metadata items and store them into a semantic repository. You don't need to know a lot about semantic repositories, but the fact that they contain statements, or triples, in the form subject-predicate-object. Subjects or objects, when they refer to an entity and not a literal value, are basically the identifiers of the related entities (which are URIs in the semantic world). That's all you really need to know.
Indeed, for the scope of this post, you could pretend that the semantic repository is just a plain relational
database in which I have to insert some SQL statements. The relevant point is that I'm going to insert them by
firing an asynchronous message in a publish-and-subscribe
architecture; a subscriber somewhere will be notified of the request and perform the insertion. For the sake of
performance, I'm not firing a message for each statement, but I'm grouping them together (this might also have to
do with transaction control: perhaps I want to process them as atomic batches). On this purpose, there's an AddStatementRequest
message with its fluent Builder
. The fluent nature of this object was a small problem with streams
that I needed to solve.
Here it is the version of the code at the beginning of the refactoring:
public void importMediaItemEmbeddedMetadata (final @Nonnull MediaItem mediaItem, final @Nonnull URI mediaItemUri) { log.info("importMediaItemEmbeddedMetadata({}, {})", mediaItem, mediaItemUri); final MediaItem.Metadata metadata = mediaItem.getMetadata(); final Integer trackNumber = metadata.get(MediaItem.Metadata.TRACK); final Integer sampleRate = metadata.get(MediaItem.Metadata.SAMPLE_RATE); final Integer bitRate = metadata.get(MediaItem.Metadata.BIT_RATE); final Duration duration = metadata.get(MediaItem.Metadata.DURATION); AddStatementsRequest.Builder builder = AddStatementsRequest.build(); if (sampleRate != null) { builder = builder.with(mediaItemUri, MO.SAMPLE_RATE, literalFor(sampleRate)); } if (bitRate != null) { builder = builder.with(mediaItemUri, MO.BITS_PER_SAMPLE, literalFor(bitRate)); } if (trackNumber != null) { builder = builder.with(mediaItemUri, MO.TRACK_NUMBER, literalFor(trackNumber)); } if (duration != null) { builder = builder.with(mediaItemUri, MO.DURATION, literalFor((float)duration.toMillis())); } messageBus.publish(builder.create()); }
I think it is quite readable. There are two immediate code smells: the sequence of if
statements,
and the fact that I'm dealing with null
s.
Let's start from the latter issue. Java 8 offers the Optional<T>
class which represents a
value that can be present or not. I can refactor Metadata
as follows:
public interface Metadata { @Nunnull public <T> Optional<T> get (@Nonnull Key<T> key); @Nonnull public Set<Map.Entry<Key<?>, ?>> getEntries(); … }
So get()
now is guaranteed to return a non null
value. The method can be rewritten as
follows:
public void importMediaItemEmbeddedMetadata (final @Nonnull MediaItem mediaItem, final @Nonnull URI mediaItemUri) { log.info("importMediaItemEmbeddedMetadata({}, {})", mediaItem, mediaItemUri); final MediaItem.Metadata metadata = mediaItem.getMetadata(); final Optional<Integer> trackNumber = metadata.get(MediaItem.Metadata.TRACK); final Optional<Integer> sampleRate = metadata.get(MediaItem.Metadata.SAMPLE_RATE); final Optional<Integer> bitRate = metadata.get(MediaItem.Metadata.BIT_RATE); final Optional<Duration> duration = metadata.get(MediaItem.Metadata.DURATION); AddStatementsRequest.Builder builder = AddStatementsRequest.build(); if (sampleRate.isPresent()) { builder = builder.with(mediaItemUri, MO.SAMPLE_RATE, literalFor(sampleRate.get())); } if (bitRate.isPresent()) { builder = builder.with(mediaItemUri, MO.BITS_PER_SAMPLE, literalFor(bitRate.get())); } if (trackNumber.isPresent()) { builder = builder.with(mediaItemUri, MO.TRACK_NUMBER, literalFor(trackNumber.get())); } if (duration.isPresent()) { builder = builder.with(mediaItemUri, MO.DURATION, literalFor((float)duration.get().toMillis())); } messageBus.publish(builder.create()); }
Honestly, this is an almost neglectable improvement. Sure, isPresent()
is marginally better
readable than != null
, but there's no much beef. The if
statements are still needed,
and while there can't be NPEs any longer, I can still get it wrong: if a presence check is wrong, sampleRate.get()
will throw NoSuchElementException
. So, I've just exchanged an exception with another. While there's
plenty of code samples like this, and flame wars about whether it's good or not, to me it's quite evident that
this usage of Optional
is basically meaningless.
Optional
offers a more functionally oriented method, such as map()
to apply a Function
. It would eliminate conditionals, but wouldn't solve an issue, described
below, with the fact that builder
is not final
. That's why I did not consider this
route.
Now, a fundamental point is that, if you want quality in your code then you need testing. But a
compiler of a statically-typed language could help in avoiding some blatant errors and, in the end, save some
time. By introducing Optional
it has not been the case.
Basically, the null
-related issue is secondary. The problem here is conditioned by the major
code-smell, which is the sequence of conditional statements. It's also an approach that badly scales: what if I
need to add mapping for another dozen of metadata items?
Let's think of a Java 8-oriented refactoring that will bring the code down to 4 lines. Well, I'm lying: the fundamental lines will be 4+4, plus some other stuff. But improving the quality of code doesn't (always) mean literally shrinking lines, rather having better lines.
First, we realize that, rather than having a sequence of special-cases, we need a collection of metadata mappings, and iterating over it. In Java 8 we have lambda expressions that allow code an iteration in a single line. For instance, something such as the following pseudo-code:
AddStatementsRequest.Builder builder = AddStatementsRequest.build(); mapping.stream().forEach(() -> builder = builder.with(… /* do something smart */)); messageBus.publish(builder.create());
Unfortunately, it can't work whatever smart the thing is conceived. The problem is that a lambda is a sort of
masked inner class, and the usual rule applies: if it accesses local variables declared outside its scope, they
must be final
. But a fluent Builder can't be referenced by a final
variable (it
prevents from writing builder = builder.with(...)
). While at first sight one might think: “ok,
these are the classic devil's details that jeopardise a could-be elegant approach, and I give up”, the
problem is that we're looking at a very limited number of features offered by Java 8. Functional often requires
that you re-think some details. For instance, a fluent Builder is a sort of 'collector' of things that are
incrementally added. But Java 8 streams offer their own version of a Collector
:
an object that 'terminates' a stream, collecting all the elements in it into a single object. Perhaps we can
replace our collector with their collector, and find that their collector fits
better?
I received some feedback about this passage. It was pointed out that the problem is not only related to the
limitation about lambdas and final
references pointing to immutable objects such as Builder
.
Indeed from the design point of view a good functional approach should avoid side-effects.
So our code becomes:
Collector<Statement, List<Statement>, AddStatementsRequest> collector = ...; AddStatementRequest request = mapping.stream(). … .collect(collector); messageBus.publish(request);
Even better, taking advantage of static methods, in a single line:
messageBus.publish(mapping.stream(). … .collect(toAddStatementsRequest()));
Now, what is the cost of writing a custom Collector
? Something like that:
@Nonnull public static Collector<Statement, List<Statement>, AddStatementsRequest> toAddStatementsRequest() { return new ArrayListCollectorSupport<Statement, AddStatementsRequest>() { @Override @Nonnull public Function<List<Statement>, AddStatementsRequest> finisher() { return statements -> new AddStatementsRequest(statements); } }; } public abstract class ArrayListCollectorSupport<COLLECTED_TYPE, COLLECTING_TYPE> implements Collector<COLLECTED_TYPE, List<COLLECTED_TYPE>, COLLECTING_TYPE> { @Override @Nonnull public Supplier<List<COLLECTED_TYPE>> supplier() { return ArrayList::new; } @Override @Nonnull public BiConsumer<List<COLLECTED_TYPE>, COLLECTED_TYPE> accumulator() { return List::add; } @Override @Nonnull public BinaryOperator<List<COLLECTED_TYPE>> combiner() { return (left, right) -> { left.addAll(right); return left; }; } @Override @Nonnull public Set<Characteristics> characteristics() { // Not CONCURRENT since ArrayList is not thread-safe return Collections.emptySet(); } }
Honestly, we're adding a lot of lines of code. But:
ArrayListCollectorSupport
is a very reusable class for many kinds of programmer-definedCollectors
. Actually, it was already in my kitchen-sink library of small useful things that I reuse everywhere.toAddStatementsRequest()
is very specific toAddStatementsRequest
, but I can reuse it a few times.
I think most people working with Java 8 know Collectors
up to a certain degree, since they offer
some facility methods to easily re-pack a Stream
into a List
, or format it to a String
with controllable formatting. But I see that only a few ones understand that they could write their own Collector
for their classes, and this kind of usage of the Java 8 APIs can be useful for achieving a real benefit.
Sometimes you have complex logics in the collection strategy that don't fit with a simply iterative approach.
For instance, within a Stream
you can filter()
item by item, but you cannot take a
decision based on the whole collection of items accumulated so far. In a earlier version of the code related to
this post it was possible to override statements with the same subject and predicate, because a certain piece of
medatada was first extracted by an audio file and later superseded by a better piece of information retrieved
from the web. This was implemented in a custom Collector
.
This requirement went away with other refactorings, so now the custom Collector
was removed. I'm
leaving it in this post just for didactic purposes - let's just recall that in some more complex cases it can
help us. Ignoring that the Java 8 Streams API is made not only of Stream
s might make you give up a
refactoring and preclude an interesting way to re-shape your code.
Now, let's focus on the stream. We can start with the whole set of metadata entries:
mediaItem.getMetadata().getEntries().stream()
and then apply a filter for picking only those that I want to store to my semantic repository:
filter(e -> MAPPER.containsKey(e.getKey())
Now the trickiest part. I have to 'translate' a Metadata
Key
in the corresponding Statement
.
A Statement
, I repeat, is a triple subject-predicate-object. In my case the subject is the
id (i.e. the URI) of the mediaItem
. This is a variable part for each invocation, so can't be a
responsiblity of the MAPPER
. The predicate and object are fixed for each metadata
key,
so they can be a responsibility of the MAPPER
. First, we need to introduce a Pair
class which encapsulates predicate and object:
@Immutable @RequiredArgsConstructor @ToString static class Pair { @Nonnull private final URI predicate; @Nonnull private final Value object; }
I'm taking advantage of Lombok for saving some plumbing code (constructor
and getters). At this point I can define the MAPPER
as a Map
from Key
to
a Function
transforming an Object
(the metadata item value) to a Pair
of
predicate and object:
private static final Map<Key<?>, Function<Object, Pair>> MAPPER = new HashMap<>(); static { MAPPER.put(Metadata.TRACK, v -> new Pair(MO.P_TRACK_NUMBER, literalFor((int)v))); MAPPER.put(Metadata.SAMPLE_RATE, v -> new Pair(MO.P_SAMPLE_RATE, literalFor((int)v))); MAPPER.put(Metadata.BIT_RATE, v -> new Pair(MO.P_BITS_PER_SAMPLE, literalFor((int)v))); MAPPER.put(Metadata.DURATION, v -> new Pair(MO.P_DURATION, literalFor((float)((Duration)v).toMillis()))); }
Note that here we are already losing something. While in the original code, with the “unrolled” conditional
statements, we were taking advantage of the type information brought by Key
, here we have lost it.
For this reason, we are forced to put explicit casts in the parameter of literalFor()
. I don't see
solutions, and the cause is the limited support of type information, not reified, in Java generics.
The final touch is to add to Pair
a method to create a Statement
with a given
subject:
@Nonnull public Statement createStatementWithSubject (final @Nonnull URI subject) { return ...createStatement(subject, predicate, object); }
How to use that stuff? By applying a mapping to the stream, in which we first get the relevant Function
inside the MAPPER
, then apply it to the metadata value, retrieve the Pair
and create
the Statement
:
stream().map(e -> MAPPER.get(e.getKey()).apply(e.getValue()).createStatementWithSubject(mediaItemUri))
Putting all together:
public void importTrackMetadata (final @Nonnull MediaItem mediaItem, final @Nonnull URI mediaItemUri) { log.debug("importTrackMetadata({}, {})", mediaItem, mediaItemUri); messageBus.publish(mediaItem.getMetadata().getEntries().stream() .filter(e -> MAPPER.containsKey(e.getKey())) .map(e -> MAPPER.get(e.getKey()).apply(e.getValue()).createStatementWithSubject(mediaItemUri)) .collect(toAddStatementsRequest())); }
So, let's recap. The original method had 15 lines (excluding logging, blanks and braces). The refactored method
has got 4, but part of the information has been moved to the further 4 lines that populate MAPPER
.
In the end, we have 8 lines: we saved 7 lines. But we had to introduce Pair
(4 lines, and a few one
saved by Lombok): 4 further lines. Roughly, the number of lines didn't change. Was it worth the effort?
First, the refactored code scales much better: for each new metadata item to map, we need to add one more line
(we can't do better than this!); in the original code we needed to add a whole new if
statement. Not
only it's made by 2 lines, but it's less readable. In the end, we split the original blob into
an imperative part (the one in the method, with the stream iteration), which will stay the same
forever. Then we have a purely declarative part, which is the code populating MAPPER
.
There's not much to test in it, but the correctness of the mapping. So we have split an original blob of
two responsibilities into two separate sections, each one with a single responsibility. We can
understand each section separately, focusing only on a few lines at a time.
On the other hand, I think that many would argue that they find the new solution more “complex” (whatever the word means...) than the original.
The most controversial line is clearly this:
.map(e -> MAPPER.get(e.getKey()).apply(e.getValue()).createStatementWithSubject(mediaItemUri))
Is there any further room for improvement? It can be refactored as:
.map(e -> MAPPER.forItem(e).createStatementWithSubject(mediaItemUri))
but with a non neglectable cost, as we need to create a subclass:
static class Mapper extends HashMap<Key<?>, Function<Object, Pair>> { @Nonnull public Pair forItem (final @Nonnull Map.Entry<Key<?>, ?> entry) { return get(entry.getKey()).apply(entry.getValue()); } }
We're not saving any line here, instead we're introducing 4 more lines. And they are not reusable. In the end I applied this refactoring as well, but I'm not sure it's worth while.
What conclusion to draw? This is a specific problem, but a real case one. Undoubtedly, Java 8 was useful for refactoring my code in a way that has better responsibility separation, readability and maintainability. In terms of lines of code, there's not really a real gain, even considering some of the new lines I wrote are at least partially reusable.
So, if we limit ourselves to the mere LoC count, we could be easily displeased. When taking a broader horizon in consideration, the advantage is evident.
There are other advantages, in addition to readability. Streams are parallelizable, which means that if I had a large numbers of items to map, the refactored code would be faster, by taking advantage of multiple cores.
PS. While I was writing this post, my code evolved further. In particular, I found that the semantic schema I was applying was not correct: some metadata items have to be related to a certain entity, while others to a different one. The current code is:
static class Mapper extends HashMap<Key<?>, Function<Object, Pair>> { @Nonnull public Pair forItem (final @Nonnull Map.Entry<Key<?>, ?> entry) { return get(entry.getKey()).apply(entry.getValue()); } } private static final Mapper SIGNAL_MAPPER = new Mapper(); private static final Mapper TRACK_MAPPER = new Mapper(); static { TRACK_MAPPER. put(Metadata.TRACK, v -> new Pair(MO.P_TRACK_NUMBER, literalFor((int)v))); SIGNAL_MAPPER.put(Metadata.SAMPLE_RATE, v -> new Pair(MO.P_SAMPLE_RATE, literalFor((int)v))); SIGNAL_MAPPER.put(Metadata.BIT_RATE, v -> new Pair(MO.P_BITS_PER_SAMPLE, literalFor((int)v))); SIGNAL_MAPPER.put(Metadata.DURATION, v -> new Pair(MO.P_DURATION, literalFor((float)((Duration)v).toMillis()))); } public void importAudioFileMetadata (final @Nonnull MediaItem mediaItem, final @Nonnull URI signalUri, final @Nonnull URI trackUri) { log.debug("importAudioFileMetadata({}, {}, {})", mediaItem, signalUri, trackUri); final Set<Map.Entry<Key<?>, ?> entries = mediaItem.getMetadata().getEntries(); messageBus.publish(entries.stream() .filter(e -> SIGNAL_MAPPER.containsKey(e.getKey())) .map(e -> SIGNAL_MAPPER.forItem(e).createStatementWithSubject(signalUri)) .collect(toAddStatementsRequest())); messageBus.publish(entries.stream() .filter(e -> TRACK_MAPPER.containsKey(e.getKey())) .map(e -> TRACK_MAPPER.forItem(e).createStatementWithSubject(trackUri)) .collect(toAddStatementsRequest())); }
The readability improvement due do the introduction of MAPPER
is now even more evident: the
original sequence of conditional statements would easily induce errors in mis-matching the metadata items with the
URIs of the two entities, while now we have two MAPPER
s with their own configuration.
I have to further stress the point, anyway, that the final warranty of quality can't be given by syntax, but I need testing. Actually I have an integration test running over a few thousands audio files of my personal collection that makes sure data is properly translated; I couldn't have applied this radical refactoring without being supported by this test that assured me I was not introducing regressions at each step.
The better syntax, however, might help in making fewer errors and spending less time to run tests, detect problems and fix them.