Distributed IT systems often need to send messages between applications in an asynchronous, decoupled way. Apart from point-to-point communication, there are many situations where multicasting and broadcasting of messages come into play. This is where Message-Oriented Middleware (MOM), an application-neutral communication infrastructure, plays to its strengths. Typical building blocks in such a setup are queues, publish/subscribe mechanisms, and messaging clients (senders and receivers) that can access their messages even after having been offline.
Heiko Zeus started us off with his introduction to RabbitMQ, an implementation of the open Advanced Message Queuing Protocol (AMQP) written in Erlang. The AMQP model defines a few core components: exchanges, queues, bindings, publishers, and subscribers. Publishers post messages to exchanges, which pass them on to queues based on a set of rules called bindings. Subscribers then consume the messages from the queues.
To demonstrate RabbitMQ, Heiko created a sample application that implemented a soccer news ticker. Using the Ruby gem Bunny, he built two publishers (a reporter and a score ticker) and two subscribers (a score service and push notifications).
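The exchange/binding/queue flow behind such a setup can be illustrated with a toy in-memory model in Ruby. Note that this is not Bunny or real AMQP code, and the queue and routing-key names are made up for illustration; it is just a sketch of how bindings route published messages to queues:

```ruby
# Toy in-memory model of AMQP-style routing: publishers post to an
# exchange, bindings route messages into queues, subscribers read queues.
class Exchange
  def initialize
    @bindings = [] # pairs of [queue, routing_key]
  end

  # Bind a queue to this exchange under a routing key.
  def bind(queue, routing_key)
    @bindings << [queue, routing_key]
  end

  # Deliver a message to every queue whose binding key matches.
  def publish(message, routing_key)
    @bindings.each do |queue, key|
      queue << message if key == routing_key
    end
  end
end

exchange      = Exchange.new
scores        = [] # queue consumed by the score service
notifications = [] # queue consumed by the push-notification subscriber

exchange.bind(scores,        "soccer.score")
exchange.bind(notifications, "soccer.score")
exchange.bind(notifications, "soccer.news")

exchange.publish("1:0 for Münster", "soccer.score")
exchange.publish("Kick-off!",       "soccer.news")

scores        # => ["1:0 for Münster"]
notifications # => ["1:0 for Münster", "Kick-off!"]
```

Because routing lives in the bindings rather than in the publishers, a new subscriber (say, a statistics service) can be added by binding another queue, without touching any publisher code.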
Heiko's sample app showed how quickly RabbitMQ, given the available gems, can be integrated into a distributed scenario. Overall, RabbitMQ came across as a well-rounded tool with clear concepts and versatile application possibilities.
The open-source software Apache Kafka was the topic of our second talk, which I presented. Apache Kafka was developed in Scala at LinkedIn as a distributed messaging solution that, in contrast to RabbitMQ, was built for speed, i.e. high throughput and low latency, in order to serve real-time data streams well. This is achieved by minimizing the logic on the broker/server side and through a few special implementation details. For example, Apache Kafka doesn't buffer messages in its own application RAM; it writes data directly to the server's filesystem. Since all data is written sequentially, read and write speeds similar to those of RAM can be achieved. Kafka also uses the zero-copy method to speed up the transfer of data. Logic such as tracking which message ID a consumer (analogous to a RabbitMQ subscriber) last read, or deciding which partition to write incoming data to, is deferred entirely to the clients (producers and consumers). Apart from the concepts of producers and consumers, Kafka also has topics, partitions, and replication. A topic describes the type of a message (e.g. user-clicks). Replication provides fault tolerance, and partitioning scales topics across multiple servers.
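To make the client-side logic concrete, here is a minimal Ruby sketch of how a producer might pick a partition for a message. The modulo-hash scheme is an illustration in the spirit of Kafka's default keyed partitioning, not Kafka's exact algorithm, and the key names are made up:

```ruby
# Client-side partition selection: the broker just stores data per
# partition; it is the producer that decides where a message goes.
require "zlib"

def partition_for(key, num_partitions)
  if key
    # Keyed messages: hash the key so the same key always lands on the
    # same partition, preserving per-key ordering.
    Zlib.crc32(key) % num_partitions
  else
    # Unkeyed messages: spread them across partitions, e.g. at random.
    rand(num_partitions)
  end
end

# The same key always maps to the same partition of a six-partition topic.
p1 = partition_for("user-42", 6)
p2 = partition_for("user-42", 6)
p1 == p2 # => true
```

Keeping this decision in the client is one of the ways Kafka keeps the broker simple and fast: the server never has to inspect message contents to route them.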
Apache Kafka has multiple applications. LinkedIn, for example, uses it as a unified data pipeline for all events (user interactions, metrics, etc.), which are piped to data-analysis tools (e.g. Hadoop, data warehouses, recommendation systems). Twitter uses Kafka for real-time reporting. The Kafka homepage has an overview of which companies use Kafka and for what.
If you want to spare yourself the hassle of setting up and administering Kafka, you might want to check out Amazon's Kinesis web service, which provides guarantees and functionality similar to Apache Kafka's.
After the two talks we had some discussion on the topics and also planned our next meeting, which will take place on 2014-10-30 at 19:00 in our office at Zweitag (Am Alten Fischmarkt 12, 48143 Münster). Our next topic will be the "Go Programming Language", as well as a second topic TBD. We'd like to thank everyone for coming and participating, and we hope to see you all again next time.