Akka Streams Kafka 0.11

Apache Kafka is the leading distributed messaging system, and Reactive Streams is an emerging standard for asynchronous stream processing. It seems natural to combine these two; that’s why SoftwareMill started the reactive-kafka project back in December 2014 and has maintained it ever since.

Recently we’ve combined efforts with the Akka team, and now the project can leverage expert input from the Lightbend Akka maintainers themselves (Patrik Nordwall and Endre Varga), the community (Alexey Romanchuk), and the original reactive-kafka authors (Krzysiek Ciesielski from SoftwareMill).

As a result of this collaboration, we’re pleased to announce the 0.11 release of the project, which brings a lot of updates! Integrating Kafka into your reactive data processing pipeline is now even easier than before.

First of all, we have a redesigned API, which should be in line with other reactive-* connectors (see also the Alpakka initiative) and provide additional flexibility. As an example of a common task, to read data from a Kafka topic, process it, and commit offsets in batches containing at most 30 messages or whatever accumulates within 10 seconds, you can define your stream as follows:

import akka.actor.ActorSystem
import akka.kafka.ConsumerMessage.CommittableOffsetBatch
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
import scala.concurrent.duration._

implicit val system = ActorSystem("example")
implicit val materializer = ActorMaterializer()
import system.dispatcher // execution context for the onFailure callback below

val consumerSettings = ConsumerSettings(system,
  new ByteArrayDeserializer, new StringDeserializer)
  .withBootstrapServers("localhost:9092")
  .withGroupId("group1")
  .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

// processMsg is an application-specific handler; here it is assumed to return
// a Future[CommittableOffset], so that offsets can be batched and committed below
val result = Consumer
  .committableSource(consumerSettings, Subscriptions.topics("topic1"))
  .mapAsync(3)(processMsg)
  .groupedWithin(30, 10.seconds) // commit every 10 seconds or 30 msgs
  .map(group => group.foldLeft(CommittableOffsetBatch.empty) {
    (batch, elem) => batch.updated(elem) })
  .mapAsync(3)(_.commitScaladsl())
  .runWith(Sink.ignore)

// handleError is likewise application-specific, e.g. logging the failure
result.onFailure {
  case e: Throwable => handleError(e)
}

More examples in Java and Scala, with explanations, can be found in the documentation.

Secondly, there are significant performance improvements. We’ve done some benchmarks, and while there is still some overhead from the reactive-kafka wrapper (keep in mind that you also get something in return, e.g. all of the benefits of a nice API with asynchronous backpressure!), the overall numbers look very good and keep getting better.

Here are a couple of common Kafka usage scenarios, comparing the old reactive-kafka version (M4), the current version (0.11), and equivalent functionality implemented using plain Kafka Producers/Consumers (which, of course, does not take into account connecting with any other reactive components).

Plain consumer

The first scenario represents a consumer which reads messages from Kafka and pushes them through a non-blocking processing stage, without committing offsets. The reactive setup (akka-plain-consumer) gets very close (80%) to the full speed achieved by a while loop pulling elements from a consumer (plain-consumer).
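
For illustration, a minimal sketch of such a pipeline, reusing the consumerSettings defined earlier (processRecord stands for a hypothetical non-blocking handler returning a Future), could look roughly like this:

// Plain consumer sketch: read records from Kafka and run them through a
// non-blocking stage, without committing offsets.
// processRecord is a hypothetical application-specific handler.
Consumer
  .plainSource(consumerSettings, Subscriptions.topics("topic1"))
  .mapAsync(3)(record => processRecord(record))
  .runWith(Sink.ignore)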

Consumer with batched commit

Another scenario represents a very common use case: consuming messages and committing in batches in order to achieve at-least-once delivery. Performance improvements in 0.11 brought a massive throughput increase, from ~17k to ~450k msgs/s.

At most once consumer

Sometimes one needs to commit each single message before processing it, which gives an at-most-once delivery guarantee. Optimizations in 0.11 fixed the previously very slow rate of ~20 msgs/s and reached over 2,800 msgs/s. This is a great step forward, very close to the simple while-loop scenario (“at-most-once”).
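
A rough sketch of this pattern, again reusing the consumerSettings from the first example and a hypothetical processRecord handler, could look like this:

// At-most-once sketch: atMostOnceSource commits the offset of each message
// before emitting it downstream, so a failure may skip messages but never
// reprocesses them.
Consumer
  .atMostOnceSource(consumerSettings, Subscriptions.topics("topic1"))
  .mapAsync(1)(record => processRecord(record))
  .runWith(Sink.ignore)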

Producer

Our last benchmark tests a producer continuously sending messages to a Kafka topic. The current version of akka-stream-kafka writes over 85,000 messages per second. This result was achieved with producer parallelism set to 100, which determines how many parallel writes can be awaiting confirmation before the stage backpressures. The “plain-producer” test chosen for comparison writes to Kafka in a loop, bound only by its internal buffer size.
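
As a rough sketch (not the benchmark code itself), such a producer stream could be wired up like this; it assumes the same ActorSystem and materializer as in the first example, and the parallelism mentioned above is part of the producer settings/configuration:

import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer
import akka.stream.scaladsl.Source
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.{ByteArraySerializer, StringSerializer}

// Producer sketch: stream a sequence of values into a Kafka topic.
val producerSettings = ProducerSettings(system,
  new ByteArraySerializer, new StringSerializer)
  .withBootstrapServers("localhost:9092")

Source(1 to 1000000)
  .map(n => new ProducerRecord[Array[Byte], String]("topic1", n.toString))
  .runWith(Producer.plainSink(producerSettings))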

Give it a try, we’d love to hear your feedback!

-- Adam Warski & Krzysiek Ciesielski
September 10 2016

