
Kafka Comprehensive Tutorial – Part 2

Kafka Architecture and Its Fundamental Concepts

2. Kafka Architecture – Apache Kafka APIs

The Apache Kafka architecture has four core APIs: the Producer API, Consumer API, Streams API, and Connector API. Let’s discuss them one by one:

Kafka Architecture – Apache Kafka APIs

a. Producer API

The Producer API allows an application to publish a stream of records to one or more Kafka topics.

b. Consumer API

The Consumer API permits an application to subscribe to one or more topics and process the stream of records produced to them.

c. Streams API

The Streams API permits an application to act as a stream processor: it consumes an input stream from one or more topics and produces an output stream to one or more output topics, effectively transforming input streams into output streams.

d. Connector API

When it comes to building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems, we use the Connector API. For example, a connector to a relational database might capture every change to a table.

3. Apache Kafka Architecture – Cluster

The diagram below shows an Apache Kafka cluster:

Kafka Architecture – Kafka Cluster

a. Kafka Broker

A Kafka cluster typically consists of multiple brokers to maintain load balance. Brokers are stateless, so they use ZooKeeper to maintain the cluster state. A single Kafka broker instance can handle hundreds of thousands of reads and writes per second, and each broker can handle terabytes of messages without performance impact. ZooKeeper also performs Kafka broker leader election.

b. Kafka – ZooKeeper

The Kafka broker uses ZooKeeper for managing and coordinating the cluster. ZooKeeper also notifies producers and consumers about the presence of any new broker in the Kafka system, or about the failure of a broker. As soon as ZooKeeper sends such a notification, producers and consumers take the appropriate decision and start coordinating their tasks with another broker.

c. Kafka Producers

Producers in Kafka push data to brokers. When a new broker starts, all the producers search for it and automatically begin sending messages to it. Keep in mind that a Kafka producer can send messages as fast as the broker can handle them, without waiting for acknowledgments from the broker (depending on its acknowledgment configuration).

d. Kafka Consumers

Because Kafka brokers are stateless, the Kafka consumer uses the partition offset to keep track of how many messages have been consumed. Once the consumer acknowledges a particular message offset, it is assured that all prior messages have been consumed. To have a buffer of bytes ready to consume, the consumer issues an asynchronous pull request to the broker. Then, simply by supplying an offset value, consumers can rewind or skip to any point in a partition. The consumer offset value is notified by ZooKeeper.
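As a minimal sketch of this offset mechanism, the consumer's position in a partition can be modeled as a single integer that advances on each pull and can be reset at will. This is a plain-Python illustration of the idea, not a real Kafka client; the class and method names are invented for this sketch:

```python
# Minimal model of offset-based consumption within a single partition.
# Plain-Python illustration only -- not an actual Kafka consumer.

partition = ["m0", "m1", "m2", "m3", "m4"]  # messages at offsets 0..4

class SimpleConsumer:
    def __init__(self):
        self.offset = 0  # next offset to read

    def poll(self, max_records=2):
        """Pull a batch starting at the current offset, then advance it."""
        batch = partition[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Rewind or skip to any point by supplying an offset value."""
        self.offset = offset

c = SimpleConsumer()
print(c.poll())    # ['m0', 'm1']
print(c.poll())    # ['m2', 'm3']
c.seek(1)          # rewind by supplying an offset
print(c.poll(3))   # ['m1', 'm2', 'm3']
```

Because the position is just an offset into an append-only log, rewinding is as cheap as assigning an integer, which is exactly what makes replaying past messages practical in Kafka.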

4. Kafka Architecture – Fundamental Concepts

Here, we are listing some of the fundamental concepts of Kafka Architecture that you must know:

a. Kafka Topics

A topic is a logical channel to which producers publish messages and from which consumers receive them.

  1. In Kafka, a topic defines the stream of a particular type/classification of data.
  2. Messages in a topic are structured and organized: a particular type of message is published to a particular topic.
  3. A producer first writes its messages to topics; consumers then read those messages from the topics.
  4. In a Kafka cluster, a topic is identified by its name, which must be unique.
  5. There can be any number of topics; there is no limitation.
  6. Data cannot be changed or updated once it is published.

The image below shows the relationship between Kafka topics and partitions:

Kafka Architecture – Relation between Kafka Topics and Partitions

b. Partitions in Kafka

In a Kafka cluster, topics are split into partitions and replicated across brokers.

  1. There is no guarantee as to which partition a published message will be written.
  2. However, we can add a key to a message. If a producer publishes messages with a key, all messages with the same key are guaranteed to end up in the same partition. This feature is what lets Kafka offer a message-ordering guarantee. Without a key, data is written to partitions randomly.
  3. Within one partition, messages are stored in sequence.
  4. Each message in a partition is assigned an incremental id, called an offset.
  5. These offsets are meaningful only within a partition; they have no value across the partitions of a topic.
  6. There can be any number of partitions; there is no limitation.
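The keyed-partitioning rule above can be sketched in a few lines of plain Python. Kafka's default partitioner actually uses a murmur2 hash of the key; here `zlib.crc32` stands in for it, which is an assumption of this sketch:

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes) -> int:
    # Kafka's default partitioner hashes the key (murmur2) modulo the
    # partition count; crc32 stands in for the real hash here.
    return zlib.crc32(key) % NUM_PARTITIONS

# Messages with the same key always land in the same partition,
# which is what gives Kafka its per-key ordering guarantee.
assert partition_for(b"user-42") == partition_for(b"user-42")

# Different keys may or may not map to different partitions.
print(partition_for(b"user-42"), partition_for(b"user-7"))
```

The exact hash function does not matter for the guarantee; what matters is that it is deterministic, so a key always maps to the same partition as long as the partition count does not change.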

c. Topic Replication Factor in Kafka

While designing a Kafka system, it is always wise to factor in topic replication: if a broker goes down, replicas of its topics on another broker can take over. For example, suppose we have 3 brokers and 3 topics. Broker 1 holds Topic 1, Partition 0, and its replica is on Broker 2, and so on. With a replication factor of 2, each partition has one additional copy besides the primary one. Below is the image of the topic replication factor:

Kafka Architecture – Topic Replication Factor

Some key points –

  1. Replication takes place at the partition level only.
  2. For a given partition, only one broker can be the leader at a time, while the other brokers hold in-sync replicas (ISR).
  3. The replication factor cannot be greater than the number of available brokers.
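These key points can be condensed into a small sketch. The round-robin placement below is a plain-Python illustration of the leader/ISR idea, not Kafka's actual replica-assignment algorithm:

```python
def assign_replicas(num_brokers, num_partitions, replication_factor):
    """Round-robin a leader plus followers onto brokers, per partition.
    A sketch of the rule, not Kafka's real assignment code."""
    if replication_factor > num_brokers:
        # Key point 3: replication factor cannot exceed the broker count.
        raise ValueError("replication factor cannot exceed broker count")
    assignment = {}
    for p in range(num_partitions):
        # The first broker is the partition's leader (key point 2);
        # the remaining brokers hold its in-sync replicas (ISR).
        replicas = [(p + i) % num_brokers for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "isr": replicas[1:]}
    return assignment

print(assign_replicas(3, 3, 2))
# partition 0 -> leader broker 0, ISR on broker 1, and so on
```

Note that replication is assigned per partition (key point 1): each partition independently gets its own leader and replica set.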

d. Consumer Group

  1. A consumer group can have multiple consumer processes/instances running.
  2. Each consumer group has one unique group id.
  3. Within a consumer group, exactly one consumer instance reads the data from a given partition at a time.
  4. If there is more than one consumer group, one instance from each group can read from the same single partition.
  5. However, if the number of consumers exceeds the number of partitions, some consumers will be inactive. For example, if there are 8 consumers and 6 partitions in a single consumer group, 2 consumers will be inactive.
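The 8-consumers/6-partitions example can be checked with a simple round-robin assignment. This is a plain-Python sketch of the idea, not Kafka's actual rebalance protocol:

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions over group members round-robin; members
    left with no partition are inactive. Sketch only -- not Kafka's
    real group-rebalance protocol."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

consumers = [f"c{i}" for i in range(8)]
partitions = list(range(6))
a = assign_partitions(partitions, consumers)
idle = [c for c, ps in a.items() if not ps]
print(idle)  # ['c6', 'c7'] -- 2 consumers get no partition, so they sit idle
```

With 6 partitions and 8 consumers, only 6 consumers can each own one partition; the remaining 2 receive nothing, matching point 5 above.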

Apache Kafka Workflow | Kafka Pub-Sub Messaging

2. What is Kafka Workflow?

In the Kafka workflow, Kafka is a collection of topics separated into one or more partitions, where a partition is a sequence of messages and an index (which we call an offset) identifies each message. In a Kafka cluster, all the data is the disjoint union of partitions. Incoming messages are appended at the end of a partition, from where consumers can read them. Durability is maintained by replicating messages to different brokers.
Kafka offers both a pub-sub and a queue-based messaging system in a fast, reliable, persistent, fault-tolerant, zero-downtime manner. Producers send messages to a topic, and consumers can choose whichever of the two messaging systems suits them.

3. Workflow of Pub-Sub Messaging

In Apache Kafka, the stepwise workflow of the Pub-Sub Messaging is:

  • At regular intervals, Kafka producers send messages to a topic.
  • Kafka brokers store all messages in the partitions configured for that particular topic, ensuring equal distribution of messages among partitions. For example, if the producer sends two messages and there are two partitions, Kafka stores one message in the first partition and the second message in the second partition.
  • A Kafka consumer subscribes to a specific topic.
  • Once the consumer subscribes to a topic, Kafka offers the current offset of the topic to the consumer and saves the offset in the ZooKeeper ensemble.
  • The consumer then requests new messages from Kafka at regular intervals (e.g., every 100 ms).
  • Kafka forwards the messages to the consumers as soon as it receives them from producers.
  • The consumer receives the message and processes it.
  • The Kafka broker then receives an acknowledgment that the message was processed.
  • As soon as Kafka receives the acknowledgment, the offset is updated to the new value. Because ZooKeeper maintains the offsets, the consumer can read the next message correctly even during server outages.
  • This flow repeats until the consumer stops requesting.
  • As a benefit, the consumer can rewind or skip to any offset of a topic at any time and read all subsequent messages, as desired.
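The steps above can be condensed into a small in-memory simulation. The class below is a toy stand-in for a broker plus the ZooKeeper-stored offset, written only to make the produce/poll/acknowledge/commit loop concrete:

```python
class MiniBroker:
    """Toy single-partition broker: an append-only log plus a committed
    offset, standing in for Kafka + ZooKeeper in the workflow above."""
    def __init__(self):
        self.log = []
        self.committed = 0  # offset stored on the consumer's behalf

    def produce(self, msg):
        # Step 1: the producer sends a message to the topic.
        self.log.append(msg)

    def poll(self, max_records=10):
        # Steps 5-6: the consumer requests new messages; the broker
        # serves everything from the committed offset onward.
        return self.log[self.committed:self.committed + max_records]

    def ack(self, n):
        # Steps 8-9: on acknowledgment, the saved offset advances.
        self.committed += n

b = MiniBroker()
for m in ("a", "b", "c"):
    b.produce(m)

batch = b.poll()     # consumer receives and processes the batch
b.ack(len(batch))    # broker records the acknowledgment
print(batch, b.committed)  # ['a', 'b', 'c'] 3

# Even after a consumer restart, reading resumes from the saved offset:
b.produce("d")
print(b.poll())      # ['d']
```

Because the committed offset survives independently of the consumer, a restarted consumer resumes exactly where it left off, which is the durability property the workflow relies on.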

4. Workflow of Kafka Queue Messaging/Consumer Group

In a queue messaging system, a group of Kafka consumers sharing the same group id can subscribe to a topic, instead of a single consumer. All consumers with the same group id that subscribe to a topic are considered a single group and share the messages. This system’s workflow is:

  • At regular intervals, Kafka producers send messages to a Kafka topic.
  • As in the earlier scenario, Kafka stores all messages in the partitions configured for that particular topic.
  • A single consumer subscribes to a specific topic.
  • Kafka interacts with this consumer in the same way as in pub-sub messaging, until a new consumer subscribes to the same topic.
  • When new consumers arrive, share mode begins and the data is shared between the Kafka consumers. The sharing repeats until the number of consumers equals the number of partitions configured for that particular topic.
  • Once the number of consumers exceeds the number of partitions, a new consumer will not receive any further messages until one of the existing consumers unsubscribes. This happens because each Kafka consumer must be assigned at least one partition; once no partition remains free, new consumers have to wait.
  • This arrangement is also called a Kafka consumer group. In this way, Apache Kafka offers the best of both systems in a simple and efficient manner.
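The "share mode" progression above can be sketched by re-running a simple assignment each time a consumer joins. This is a plain-Python illustration of the sharing rule, not Kafka's actual group-coordination protocol:

```python
def rebalance(partitions, consumers):
    """Share partitions among group members round-robin; a sketch of
    'share mode', not Kafka's real group-coordination protocol."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1]
group = ["c1"]
print(rebalance(partitions, group))  # one consumer owns both partitions

group.append("c2")                   # a new consumer joins the group
print(rebalance(partitions, group))  # the partitions are now shared

group.append("c3")                   # more consumers than partitions
print(rebalance(partitions, group))  # c3 gets nothing and must wait
```

Each join triggers a fresh assignment; once every partition is taken, additional members receive nothing until an existing member leaves, exactly as described in the workflow above.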

5. Role of ZooKeeper in Apache Kafka

Apache ZooKeeper serves as the coordination interface between the Kafka brokers and consumers. It can also be described as a distributed configuration and synchronization service. The ZooKeeper cluster shares information with the Kafka servers, and Kafka stores basic metadata in ZooKeeper, such as topics, brokers, consumer offsets (queue readers), and so on.
In addition, the failure of a ZooKeeper node or broker does not affect the Kafka cluster, because the critical information stored in ZooKeeper is replicated across its ensemble. Kafka restores its state as ZooKeeper restarts, leading to zero downtime for Kafka. ZooKeeper also performs leader election among the Kafka brokers in case of leadership failure.
Hence, this was all about the Apache Kafka workflow. We hope you liked our explanation.

Amir Masoud Sefidian
Data Scientist, Machine Learning Engineer, Researcher, Software Developer