The Kafka Ecosystem: Kafka Core, Kafka Streams, Kafka Connect, Kafka REST Proxy, and the Schema Registry

Kafka Core consists of brokers, topics, logs, partitions, and clusters. It also includes related tools like MirrorMaker.

The Kafka ecosystem consists of Kafka Core, Kafka Streams, Kafka Connect, Kafka REST Proxy, and the Schema Registry.

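Some of the additional pieces of the Kafka ecosystem, such as the Kafka REST Proxy and the Schema Registry, come from Confluent and are not part of Apache Kafka. Kafka Streams and Kafka Connect ship with Apache Kafka itself.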

Kafka Streams:

The Streams API transforms, aggregates, and processes records from a stream and produces derivative streams.

Kafka Connect:

The connector API for creating reusable producers and consumers (e.g., a stream of changes from DynamoDB).

The Kafka REST Proxy:

Lets producers and consumers work with Kafka over REST (HTTP).

The Schema Registry:

Manages schemas using Avro for Kafka records.

The Kafka MirrorMaker:

Replicates cluster data to another cluster.

Kafka Ecosystem: Diagram of Connect Source, Connect Sink, and Kafka Streams

Kafka Connect Sources are sources of records. Kafka Connect Sinks are destinations for records. 
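
The source and sink roles can be sketched in a few lines of plain Python. This is not the Kafka Connect API: the list used as a topic and the functions `source_connector` and `sink_connector` are hypothetical stand-ins that only show the direction in which records flow.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a Kafka topic: an append-only list of records.
Record = Dict[str, str]


def source_connector(external_rows: Iterable[Record], topic: List[Record]) -> int:
    """Copy rows from an external system into the topic; return how many were copied."""
    count = 0
    for row in external_rows:
        topic.append(dict(row))
        count += 1
    return count


def sink_connector(topic: List[Record], destination: List[Record], offset: int = 0) -> int:
    """Copy records from the topic, starting at offset, into a destination.

    Returns the next offset to read from, so a later call can resume there.
    """
    for record in topic[offset:]:
        destination.append(dict(record))
    return len(topic)


# Illustrative data: a change stream from some external table.
changes = [{"id": "1", "op": "insert"}, {"id": "1", "op": "update"}]
topic: List[Record] = []
warehouse: List[Record] = []

copied = source_connector(changes, topic)
next_offset = sink_connector(topic, warehouse)
print(copied, next_offset, warehouse)
```

The source only writes to the topic and the sink only reads from it, so either side can be swapped for another system without touching the other.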

Kafka Ecosystem: Kafka REST Proxy and Confluent Schema Registry

Kafka Streams – Kafka Streams for Stream Processing

The Kafka Streams API builds on core Kafka primitives and has a life of its own.

Kafka Streams enables real-time processing of streams.

Kafka Streams supports stream processors.

A stream processor takes continual streams of records from input topics, performs some processing, transformation, or aggregation on the input, and produces one or more output streams. For example, a video player application might take an input stream of video-watched and video-paused events and output a stream of user preferences. It could then tailor new video recommendations to recent user activity, or aggregate the activity of many users to see which new videos are hot.
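
As a rough illustration of the video player example, the sketch below models a stream processor as a Python generator. It is an in-memory sketch, not the Kafka Streams API: the `(user, action, genre)` record shape and the function name `preference_stream` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: (user, action, genre).
Event = Tuple[str, str, str]


def preference_stream(events: Iterable[Event]) -> Iterator[Tuple[str, str, int]]:
    """Consume watch/pause events and emit (user, genre, watch_count) updates."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)  # state kept by the processor
    for user, action, genre in events:
        if action != "watched":
            continue  # paused events are ignored in this sketch
        counts[(user, genre)] += 1
        yield user, genre, counts[(user, genre)]


input_stream = [
    ("ann", "watched", "jazz"),
    ("ann", "paused", "jazz"),
    ("ann", "watched", "jazz"),
    ("bob", "watched", "news"),
]
output_stream = list(preference_stream(input_stream))
print(output_stream)
```

Each input record produces at most one output record, and the output is itself a stream that another processor (for example, a recommender) could consume.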

The Kafka Streams API solves hard problems such as handling out-of-order records, aggregating across multiple streams, joining data from multiple streams, and allowing for stateful computations.
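
One way to see why out-of-order records are manageable is to group records by the timestamp they carry rather than by arrival order. The sketch below does this in plain Python with fixed-size windows. It is not the Kafka Streams API, and the 60-second window size and the `(event_time, key)` record shape are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 60  # assumed window size for illustration


def windowed_counts(records: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """Count records per (window_start, key) using each record's own event timestamp."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key in records:
        window_start = event_time - (event_time % WINDOW_SECONDS)
        counts[(window_start, key)] += 1
    return dict(counts)


# The record with timestamp 59 arrives after the one with timestamp 61 (out of order).
arrivals = [(5, "video-1"), (61, "video-1"), (59, "video-1"), (130, "video-2")]
result = windowed_counts(arrivals)
print(result)
```

Because the window is derived from the event timestamp, the late record with timestamp 59 still lands in the window starting at 0, next to the record with timestamp 5.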

Kafka Ecosystem: Kafka Streams and Kafka Connect

Kafka Ecosystem Review

What is Kafka Streams?

Kafka Streams enables real-time processing of streams. It can aggregate across multiple streams, join data from multiple streams, perform stateful computations, and more.

What is Kafka Connect?

Kafka Connect is the connector API for creating reusable producers and consumers (e.g., a stream of changes from DynamoDB). Kafka Connect Sources are sources of records. Kafka Connect Sinks are destinations for records.

What is the Schema Registry?

The Schema Registry manages schemas using Avro for Kafka records.
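
To make "managing schemas" concrete, the sketch below shows two versions of an Avro record schema and a toy in-memory registry that versions them per subject. Everything here is illustrative: `InMemoryRegistry` is a hypothetical class, not the Confluent implementation, and the `User` schema and the `users-value` subject name are assumptions.

```python
import json
from typing import Dict, List

# Hypothetical Avro record schemas, written as the JSON documents Avro uses.
USER_SCHEMA_V1 = {
    "type": "record",
    "name": "User",
    "namespace": "com.example",
    "fields": [{"name": "name", "type": "string"}],
}
USER_SCHEMA_V2 = {
    "type": "record",
    "name": "User",
    "namespace": "com.example",
    "fields": [
        {"name": "name", "type": "string"},
        # Optional field with a default, so records written with V1 can still be read.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}


class InMemoryRegistry:
    """Toy registry that stores schema versions per subject."""

    def __init__(self) -> None:
        self._subjects: Dict[str, List[str]] = {}

    def register(self, subject: str, schema: dict) -> int:
        """Store the schema under subject and return its 1-based version.

        Registering an identical schema again returns the existing version.
        """
        canonical = json.dumps(schema, sort_keys=True)
        versions = self._subjects.setdefault(subject, [])
        if canonical not in versions:
            versions.append(canonical)
        return versions.index(canonical) + 1

    def latest(self, subject: str) -> dict:
        """Return the most recently registered schema for the subject."""
        return json.loads(self._subjects[subject][-1])


registry = InMemoryRegistry()
v1 = registry.register("users-value", USER_SCHEMA_V1)
v2 = registry.register("users-value", USER_SCHEMA_V2)
again = registry.register("users-value", USER_SCHEMA_V1)
print(v1, v2, again, len(registry.latest("users-value")["fields"]))
```

A real registry would also check that each new version is compatible with earlier ones before accepting it, which this sketch leaves out.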

What is Kafka MirrorMaker?

The Kafka MirrorMaker is used to replicate cluster data to another cluster.

When might you use Kafka REST Proxy?

The Kafka REST Proxy is used by producers and consumers over REST (HTTP). You could use it for easy integration of existing code bases.
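
As a sketch of what producing over HTTP can look like, the snippet below builds (but does not send) a request using only Python's standard library. The host and port `localhost:8082`, the topic name `video-events`, and the helper `build_produce_request` are assumptions for this example. The path and content type follow the REST Proxy v2 API, so check them against the version you actually run.

```python
import json
import urllib.request

# Assumptions: a REST Proxy listening on localhost:8082 and a topic named "video-events".
REST_PROXY_URL = "http://localhost:8082"
TOPIC = "video-events"


def build_produce_request(topic: str, values: list) -> urllib.request.Request:
    """Build (but do not send) a POST asking the REST Proxy to produce JSON records."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}/topics/{topic}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )


request = build_produce_request(TOPIC, [{"user": "ann", "action": "watched"}])
print(request.get_method(), request.full_url)
# Sending needs a running proxy, e.g.: urllib.request.urlopen(request)
```

Any language with an HTTP client can produce records this way, which is what makes the proxy convenient for existing code bases that have no native Kafka client.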

Amir Masoud Sefidian
Machine Learning Engineer