Quotas and Throttling
Kafka is often used as a central nervous system shared by hundreds of teams. A single misconfigured application (the “Noisy Neighbor”) could accidentally flood the cluster with gigabytes of data, starving critical production apps of network bandwidth or CPU.
To prevent this, Kafka provides Quotas.
1. Types of Quotas
1.1 Network Bandwidth Quotas
Limits the byte rate (bytes/sec) a client can produce or consume.
producer_byte_rate: Limits produce traffic.
consumer_byte_rate: Limits fetch traffic.
1.2 Request Rate Quotas
Limits the percentage of time the broker’s request handler (I/O) threads and network threads spend processing requests from a client.
request_percentage: Prevents CPU exhaustion.
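Because request_percentage is expressed relative to a single thread, the total capacity of a broker is larger than 100%. The sketch below works through that arithmetic; the class and method names are hypothetical, and the thread counts (8 I/O threads, 3 network threads) are assumed to match the broker defaults for num.io.threads and num.network.threads.

```java
final class RequestQuotaCapacity {
    /** Total request capacity of one broker, in percent of a single thread. */
    static int totalCapacityPercent(int numIoThreads, int numNetworkThreads) {
        return (numIoThreads + numNetworkThreads) * 100;
    }

    /** Fraction of the broker's total thread time that a given quota allows. */
    static double shareOfBroker(double requestPercentage, int numIoThreads, int numNetworkThreads) {
        return requestPercentage / totalCapacityPercent(numIoThreads, numNetworkThreads);
    }

    public static void main(String[] args) {
        int io = 8;      // assumed value of num.io.threads
        int network = 3; // assumed value of num.network.threads
        System.out.println("Total capacity: " + totalCapacityPercent(io, network) + "%");
        // A quota of request_percentage=200 allows the equivalent of two full threads.
        System.out.println("Share of broker: " + shareOfBroker(200.0, io, network));
    }
}
```

Under these assumptions, a client with request_percentage=200 can use the equivalent of two threads out of eleven.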
2. Interactive: The Throttler
Simulate a producer sending data. Adjust the quota to see how Kafka throttles the connection by inserting artificial delays (latency) into the response.
3. How Throttling Works
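Unlike a firewall that drops packets, Kafka delays the client rather than rejecting its requests.
If you produce at 10 MB/s but your quota is 5 MB/s, the broker calculates how much delay is needed to bring your average rate back down to 5 MB/s. Older brokers hold the response (ProduceResponse) for that duration before sending it back. Since Kafka 2.0 (KIP-219), the broker instead returns the response immediately with the throttle time set and mutes the connection for that duration, and the client is expected to hold back further requests until the time has elapsed.
[!TIP] This is a polite way to throttle. Nothing is dropped or rejected; the client simply slows down because it has to wait before it can continue.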
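The delay calculation can be sketched as follows. It assumes the broker solves O * W / (W + X) = T for the extra delay X, where O is the observed rate, T is the quota, and W is the measurement window, which gives X = (O - T) / T * W. The class name, method name, and the 10-second window are assumptions for illustration; the real window depends on the broker's quota.window.num and quota.window.size.seconds settings.

```java
final class ThrottleDelaySketch {
    /**
     * Delay in milliseconds needed so that the average rate over
     * (window + delay) falls back to the quota. Returns 0 if within quota.
     */
    static long throttleTimeMs(double observedRate, double quotaRate, long windowMs) {
        if (observedRate <= quotaRate) {
            return 0L;
        }
        return Math.round((observedRate - quotaRate) / quotaRate * windowMs);
    }

    public static void main(String[] args) {
        double observed = 10.0 * 1024 * 1024; // client produces 10 MB/s
        double quota = 5.0 * 1024 * 1024;     // quota is 5 MB/s
        long windowMs = 10_000L;              // assumed measurement window
        System.out.println("Throttle for " + throttleTimeMs(observed, quota, windowMs) + " ms");
    }
}
```

With these numbers the client is at twice its quota, so the delay equals the window length: the same bytes spread over twice the time average out to 5 MB/s.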
4. Configuring Quotas
Quotas can be defined for a user, a client ID, or a (user, client ID) pair, with optional defaults at each level.
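When several quota definitions could apply to one connection, the most specific one wins. The sketch below models that lookup with an in-memory map; the candidate order follows the precedence listed in the Kafka documentation, while the class, the method names, and the map-based store are hypothetical and only illustrate the idea.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class QuotaResolutionSketch {
    /** Candidate quota entries for a connection, most specific first. */
    static List<String> candidates(String user, String clientId) {
        return List.of(
            "/config/users/" + user + "/clients/" + clientId,
            "/config/users/" + user + "/clients/<default>",
            "/config/users/" + user,
            "/config/users/<default>/clients/" + clientId,
            "/config/users/<default>/clients/<default>",
            "/config/users/<default>",
            "/config/clients/" + clientId,
            "/config/clients/<default>");
    }

    /** Returns the first configured quota in precedence order, if any. */
    static Optional<Long> resolve(Map<String, Long> quotas, String user, String clientId) {
        return candidates(user, clientId).stream()
            .filter(quotas::containsKey)
            .map(quotas::get)
            .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical quota store: one user-level quota and one client default.
        Map<String, Long> quotas = Map.of(
            "/config/users/app-1", 5_242_880L,
            "/config/clients/<default>", 1_048_576L);
        System.out.println(resolve(quotas, "app-1", "ingest"));
        System.out.println(resolve(quotas, "app-2", "ingest"));
    }
}
```

Here app-1 gets its own user-level quota, while any other user falls through to the client default.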
4.1 CLI Command
The easiest way to set quotas is via the kafka-configs CLI.
    # Limit user 'app-1' to a 5 MB/s (5242880 bytes/s) produce rate
    kafka-configs.sh --bootstrap-server localhost:9092 \
      --alter --add-config 'producer_byte_rate=5242880' \
      --entity-type users --entity-name app-1
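The value 5242880 is 5 × 1024 × 1024 bytes. As a rough sketch, the helper below derives such values and joins several quota settings into one comma-separated string of the form that --add-config accepts; the class and method names are hypothetical, and the 10 MB/s consumer quota is an invented example value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class QuotaConfigSketch {
    /** Converts a rate in MB/s (1 MB = 1024 * 1024 bytes here) to bytes per second. */
    static long megabytesPerSecond(long megabytes) {
        return megabytes * 1024L * 1024L;
    }

    /** Joins quota settings into a key=value,key=value string. */
    static String addConfigValue(Map<String, Long> quotas) {
        return quotas.entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the settings in insertion order.
        Map<String, Long> quotas = new LinkedHashMap<>();
        quotas.put("producer_byte_rate", megabytesPerSecond(5));
        quotas.put("consumer_byte_rate", megabytesPerSecond(10));
        System.out.println(addConfigValue(quotas));
    }
}
```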
4.2 Detecting Throttling in Code (Java)
When a client is throttled, the broker reports the applied delay in the throttle_time_ms field of the response.
    // In a send callback
    producer.send(record, (metadata, exception) -> {
        if (exception == null) {
            // RecordMetadata does not expose the throttle time in the standard API.
            // Monitoring tools read it from the producer's JMX metrics instead:
            // MBean kafka.producer:type=producer-metrics,client-id=...
            // attributes produce-throttle-time-avg and produce-throttle-time-max
        }
    });
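The Consumer API does not expose the throttle time directly either:
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    // poll() gives no indication of whether the fetch was throttled.
    // In production, read the fetch-throttle-time-avg attribute of the MBean
    // kafka.consumer:type=consumer-fetch-manager-metrics,client-id=... via JMX.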
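Since JMX is the practical way to observe throttling, here is a minimal sketch that reads the throttle metrics from the platform MBean server using only the JDK. It assumes the code runs in the same JVM as the Kafka client and that the client registers its metrics under the MBean names and attributes shown; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class ThrottleMetricsSketch {
    /**
     * Reads one numeric attribute from every MBean matching the pattern.
     * Returns a map of client-id to value; empty if no client is registered.
     */
    static Map<String, Double> readAttribute(String mbeanPattern, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Double> result = new TreeMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(mbeanPattern), null)) {
                Object value = server.getAttribute(name, attribute);
                if (value instanceof Number) {
                    result.put(name.getKeyProperty("client-id"), ((Number) value).doubleValue());
                }
            }
        } catch (JMException e) {
            throw new IllegalStateException("Could not read " + attribute, e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Average produce throttle time (ms) per producer in this JVM.
        System.out.println(readAttribute(
            "kafka.producer:type=producer-metrics,client-id=*",
            "produce-throttle-time-avg"));
        // Average fetch throttle time (ms) per consumer in this JVM.
        System.out.println(readAttribute(
            "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*",
            "fetch-throttle-time-avg"));
    }
}
```

A value that stays above zero suggests the client is regularly hitting its quota. For a client in another process, the same MBeans would have to be read over a remote JMX connection instead.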
5. Summary
- Quotas protect the cluster from overload.
- Throttling is implemented by delaying responses, forcing clients to slow down.
- Configure quotas using kafka-configs.sh for users or client IDs.