Kafka Zookeeper UNRESPONSIVE_FOLLOWER

A follower node is not responding in the Zookeeper ensemble.

Understanding Kafka Zookeeper

Apache Kafka is a distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. It is a critical component of Kafka, ensuring that the Kafka brokers are aware of each other and maintaining the state of the Kafka cluster.

Identifying the Symptom: Unresponsive Follower

A follower node in the Zookeeper ensemble can stop responding without producing an explicit error message. The usual signs are indirect: increased latency on requests that go through Zookeeper, failed or timed-out client requests, and, if enough nodes are affected, a complete halt in the Kafka cluster's operations.

Details About the Unresponsive Follower Issue

The 'UNRESPONSIVE_FOLLOWER' issue occurs when one of the follower nodes in the Zookeeper ensemble stops responding. This can happen due to network issues, resource exhaustion, or hardware failures. Zookeeper operates as a quorum-based system, meaning that a majority of nodes must be operational for the system to function correctly. A three-node ensemble, for example, needs two healthy nodes: it keeps working with one unresponsive follower, but it has no tolerance left for a second failure. If further nodes fail and the majority is lost, the ensemble stops serving requests, which leads to downtime for the Kafka cluster that depends on it.
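
The majority rule can be worked out with a few lines of shell arithmetic. This is only a sketch of the standard majority calculation (half the nodes, rounded down, plus one); the function names are made up for illustration and are not part of Zookeeper or Kafka.

```shell
#!/bin/sh
# Majority (quorum) size for an ensemble of N voting nodes: floor(N/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can be down while a majority is still available.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print one summary line for each of a few common ensemble sizes.
quorum_table() {
    for n in 3 4 5 7; do
        echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
    done
}
```

Calling quorum_table shows why odd ensemble sizes are the usual choice: a four-node ensemble tolerates one failure, the same as a three-node ensemble.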

Common Causes

  • Network connectivity issues between the follower and other nodes.
  • Insufficient resources (CPU, memory, disk) on the follower node.
  • Hardware failures or misconfigurations.

Steps to Resolve the Unresponsive Follower Issue

To resolve the issue of an unresponsive follower in Kafka Zookeeper, follow these steps:

Step 1: Verify Network Connectivity

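Ensure that the follower node can communicate with the leader and the other nodes in the ensemble. Run ping and telnet from another ensemble member against the follower's address; the Zookeeper client port is 2181 unless clientPort in zoo.cfg says otherwise: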

ping [follower-node-ip]
telnet [follower-node-ip] [zookeeper-port]
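
Where telnet is not installed, the same reachability check can be scripted. The sketch below assumes bash and the GNU coreutils timeout command are available; check_port and report_ports are illustrative names, not standard tools.

```shell
#!/usr/bin/env bash
# check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# Relies on bash's /dev/tcp redirection, so neither telnet nor nc is needed.
check_port() {
    local host="$1" port="$2" wait="${3:-3}"
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# report_ports HOST PORT...
# Prints one "open" or "unreachable" line per port.
report_ports() {
    local host="$1"; shift
    local port
    for port in "$@"; do
        if check_port "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port unreachable"
        fi
    done
}
```

A typical call would be report_ports [follower-node-ip] [zookeeper-port], run from the leader or another ensemble member. The quorum and election ports from the server.N lines in zoo.cfg can be passed as extra arguments if you want to test them too.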

Step 2: Check Resource Utilization

Log into the follower node and check CPU, memory, and disk usage. Use commands like top, free -m, and df -h to diagnose resource issues. If resources are exhausted, consider scaling up the node or redistributing the load.
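
Disk space is worth checking explicitly, because Zookeeper writes snapshots and transaction logs under the dataDir set in zoo.cfg (and dataLogDir, if configured). The following is a minimal sketch built on df -P; the function names and the threshold are assumptions to adapt to your environment.

```shell
#!/bin/sh
# disk_usage_pct PATH
# Prints the used-space percentage (digits only) of the filesystem holding PATH.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH THRESHOLD
# Prints a warning and returns 1 when usage is at or above THRESHOLD percent.
warn_if_full() {
    used=$(disk_usage_pct "$1")
    if [ "$used" -ge "$2" ]; then
        echo "WARNING: $1 is ${used}% full (threshold ${2}%)"
        return 1
    fi
    echo "OK: $1 is ${used}% full"
}
```

For example, warn_if_full /var/lib/zookeeper 85 flags a data directory that is at least 85% full. The path here is only a placeholder; use the dataDir value from your own configuration.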

Step 3: Inspect Zookeeper Logs

Review the Zookeeper logs located in the /var/log/zookeeper directory (or your configured log directory) for any error messages or warnings that might indicate the cause of the unresponsiveness.
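
Filtering the log by level narrows a long file down to the lines that matter. This sketch assumes the log lines contain the level names WARN and ERROR as plain text, which depends on your logging configuration; the function names are illustrative.

```shell
#!/bin/sh
# recent_problems LOGFILE [COUNT]
# Prints the last COUNT (default 20) lines of LOGFILE that contain WARN or ERROR.
recent_problems() {
    grep -E 'WARN|ERROR' "$1" | tail -n "${2:-20}"
}

# count_by_level LOGFILE
# Prints how many lines mention each level, one "LEVEL COUNT" pair per line.
count_by_level() {
    for level in WARN ERROR; do
        echo "$level $(grep -c "$level" "$1")"
    done
}
```

Point both functions at a log file inside /var/log/zookeeper or your configured log directory. The file name itself varies between installations.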

Step 4: Restart the Follower Node

If the checks above turn up nothing you can fix directly, or once the underlying cause has been corrected, restart the Zookeeper service on the follower node:

sudo systemctl restart zookeeper

Or, on systems without systemd, through the service wrapper:

sudo service zookeeper restart
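
After the restart, it helps to confirm that the node has rejoined the ensemble. One way is the srvr four-letter command, whose output includes a "Mode:" line. The sketch below assumes nc is installed, that srvr is permitted by 4lw.commands.whitelist, and that the client port is 2181 unless you pass another; parse_zk_mode and zk_mode are illustrative names.

```shell
#!/bin/sh
# parse_zk_mode
# Reads "srvr" output on stdin and prints the value of its "Mode:" line.
parse_zk_mode() {
    awk -F': ' '$1 == "Mode" { print $2 }'
}

# zk_mode HOST [PORT]
# Sends the "srvr" four-letter command with nc and prints the reported mode.
zk_mode() {
    echo srvr | nc -w 3 "$1" "${2:-2181}" | parse_zk_mode
}
```

Running zk_mode [follower-node-ip] should print follower once the node is back in the ensemble. Empty output means the node did not answer or the command is not whitelisted. Running zkServer.sh status on the node itself reports the same mode.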

