Kafka in Production: A Practical Guide for Trading Systems
Introduction
Kafka is the backbone of most modern trading data pipelines. This guide covers everything you need to know to design, configure, and operate Kafka reliably at scale — from producer idempotency to consumer group rebalancing to scaling strategies.
1. Cluster Setup
Never run a single broker in production. The minimum viable setup is 3 brokers with a KRaft quorum.
replication.factor = 3 # each partition stored on 3 brokers
min.insync.replicas = 2 # at least 2 must confirm writes
This combination means you can lose 1 broker and still accept writes and serve reads. With min.insync.replicas=2 and replication.factor=3, producers using acks=all will still get confirmation as long as 2 of 3 brokers are alive.
KRaft (no ZooKeeper) requires an odd number of controller nodes for majority voting — 3, 5, or 7. In large clusters, separate controller and broker roles to avoid resource contention.
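The majority rule can be made concrete with a little arithmetic — a minimal sketch, not a Kafka API:

```java
// Quorum arithmetic for KRaft controllers: a cluster of n voters needs a
// majority of floor(n/2) + 1 to elect a leader, so it tolerates
// floor((n - 1) / 2) failures. Even counts add cost without adding tolerance.
public class Quorum {
    static int majority(int controllers) {
        return controllers / 2 + 1;
    }

    static int tolerableFailures(int controllers) {
        return (controllers - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[]{3, 4, 5}) {
            System.out.println(n + " controllers: majority=" + majority(n)
                    + ", tolerates " + tolerableFailures(n) + " failure(s)");
        }
        // 3 and 4 controllers both tolerate exactly 1 failure,
        // which is why odd counts are the convention.
    }
}
```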
2. Topic Design — Plan Partitions Upfront
The most important rule in Kafka operations: over-partition at creation. You can increase partitions later, but it changes key routing (more on this in the scaling section) and you can never reduce partitions. Design for 10-20x your current expected volume.
partitions = max_expected_consumers × 2-3
Partition key strategy
The partition key determines which partition a message lands on:
hash(key) % num_partitions = partition number
Same key → always same partition → ordering guaranteed per key. For trading systems:
- Trades: key by symbol — all BTCUSD trades go to the same partition, preserving time ordering per symbol
- Orders: key by order_id — all events for the same order are ordered
- Orderbook: key by symbol — latest snapshot per symbol stays ordered
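The routing mechanics can be sketched in a few lines. Note that Kafka's default partitioner actually uses murmur2 over the serialized key bytes; String.hashCode below is a stand-in to show the mechanics, not the real hash:

```java
import java.util.List;

// Sketch of key-based partition routing: hash(key) % num_partitions.
// String.hashCode stands in for Kafka's murmur2 hash for illustration.
public class PartitionRouting {
    static int partitionFor(String key, int numPartitions) {
        // mask the sign bit so the modulo result is always non-negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 12;
        for (String symbol : List.of("BTCUSD", "ETHUSD", "BTCUSD")) {
            System.out.println(symbol + " -> partition " + partitionFor(symbol, partitions));
        }
        // Both BTCUSD messages land on the same partition,
        // so per-symbol ordering is preserved.
    }
}
```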
Topic design by data type
| Topic | Partitions | Retention | Cleanup Policy |
|---|---|---|---|
| orders | 12-24 | 7 days+ | delete |
| market-trades | 24-48 | 24 hours | delete |
| orderbook | 12-24 | 1 hour | compact |
| audit | 6-12 | Years | delete |
Log compaction (cleanup.policy=compact) is useful for orderbook snapshots — the broker keeps only the latest message per key, discarding older snapshots for the same symbol.
3. Producer Configuration
Core reliability settings
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=5
acks=all — the broker leader waits for all in-sync replicas to confirm before ACKing the producer. Combined with min.insync.replicas=2, this ensures the message is on at least 2 brokers before you get a success response.
enable.idempotence=true — the broker assigns each producer a PID (Producer ID) and tracks a sequence number per partition. If a retry arrives with the same sequence number, the broker discards it. This prevents duplicates caused by retried sends within a single producer session.
Note: idempotence is scoped to one producer instance. If the producer process restarts, it gets a new PID and the broker treats it as a new producer.
Throughput settings
linger.ms=5
batch.size=16384
compression.type=lz4
linger.ms controls how long the producer waits to accumulate messages into a batch before sending. The tradeoff: lower values give lower latency; higher values give larger batches and higher throughput. For market data, 5-20ms is typical.
compression.type=lz4 — compresses batches before sending. LZ4 is fast and reduces network and disk I/O significantly for JSON payloads. Worth enabling for any high-volume topic.
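To see why compression pays off on JSON market data — repeated field names and symbols compress heavily — here is a small illustration. The JDK's Deflater stands in because LZ4 isn't in the standard library; LZ4's ratios differ, but the point holds:

```java
import java.util.zip.Deflater;

// Repetitive JSON (same keys, same symbols) compresses dramatically.
// Deflater is a stand-in for LZ4, which is not in the JDK.
public class CompressionDemo {
    public static void main(String[] args) {
        String tick = "{\"symbol\":\"BTCUSD\",\"price\":64250.5,\"qty\":0.25}";
        byte[] input = tick.repeat(100).getBytes();

        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length];
        int compressed = deflater.deflate(buffer);
        deflater.end();

        System.out.println(input.length + " bytes -> " + compressed + " bytes");
    }
}
```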
Transactions — exactly-once Kafka-to-Kafka
props.put(TRANSACTIONAL_ID_CONFIG, "producer-1"); // stable across restarts
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>("output-topic", key, value));
producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata()); // input offsets commit atomically with the transaction
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
}
Transactions wrap producing to an output topic and committing the input offset atomically. Either both happen or neither does. The TRANSACTIONAL_ID_CONFIG is stable across producer restarts — the broker uses it to fence zombie producers and deduplicate replayed transactions.
Important: Kafka transactions only work when the output is also a Kafka topic. They do not extend to external systems like PostgreSQL.
4. Consumer Configuration
Offset management
The most critical decision is when to commit offsets.
props.put("enable.auto.commit", "false"); // always disable for production
At-most-once (commit before processing — avoid for financial data):
consumer.commitSync();
processRecord(record);
// if crash here: message skipped → data loss
At-least-once (commit after processing — standard):
processRecord(record);
consumer.commitSync();
// if crash here: message replayed → duplicate (handled by idempotent sink)
Shutdown pattern — always flush on shutdown:
try {
while (running) {
var records = consumer.poll(Duration.ofMillis(100));
for (var record : records) {
processRecord(record);
}
consumer.commitAsync(); // fast path during normal operation
}
} finally {
consumer.commitSync(); // ensure final commit before exit
consumer.close();
}
Rebalance listener — commit before losing partition ownership:
consumer.subscribe(topics, new ConsumerRebalanceListener() {
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
consumer.commitSync(); // commit before partition is reassigned
}
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
// resume from last committed offset on newly assigned partitions
}
});
Performance settings
max.poll.records=500
max.poll.interval.ms=300000
session.timeout.ms=45000
heartbeat.interval.ms=3000
fetch.min.bytes=1
fetch.max.wait.ms=500
max.poll.interval.ms is a common production pitfall. If your processing takes longer than this interval between poll() calls, Kafka assumes the consumer is dead and triggers a rebalance. The fix: reduce max.poll.records, increase the interval, or move slow processing to a separate thread.
max.poll.records=500
processing_time_per_record=500ms
total_batch_time=250s → must be < max.poll.interval.ms (300s)
# If processing slows to 700ms/record:
total_batch_time=350s → exceeds 300s → kicked out of group → rebalance
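The arithmetic above can be turned into a back-of-envelope sizing check. The helper names and the 50% headroom factor are illustrative, not Kafka settings:

```java
// Budget check: does a full poll batch fit inside max.poll.interval.ms,
// given a measured per-record processing time? The headroom factor leaves
// room for slowdowns so a slow batch doesn't trigger a rebalance.
public class PollBudget {
    static long batchTimeMs(int maxPollRecords, long perRecordMs) {
        return maxPollRecords * perRecordMs;
    }

    static int safeMaxPollRecords(long maxPollIntervalMs, long perRecordMs, double headroom) {
        return (int) ((maxPollIntervalMs * headroom) / perRecordMs);
    }

    public static void main(String[] args) {
        long interval = 300_000; // max.poll.interval.ms
        System.out.println(batchTimeMs(500, 500)); // 250000 ms: fits within 300s
        System.out.println(batchTimeMs(500, 700)); // 350000 ms: exceeds 300s -> rebalance
        System.out.println(safeMaxPollRecords(interval, 700, 0.5)); // 214 records is safe at 700 ms/record
    }
}
```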
Consumer group assignment strategy
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
The default RangeAssignor stops all consumers during rebalance (stop-the-world). CooperativeStickyAssignor only moves partitions that actually need to move — existing assignments continue processing uninterrupted. For trading systems where any pause causes lag spikes, this is essential.
5. Delivery Guarantees
The three guarantees
| Guarantee | Risk | Mechanism |
|---|---|---|
| At-most-once | Data loss | Commit before processing |
| At-least-once | Duplicates | Commit after processing |
| Exactly-once | Neither | Kafka transactions (Kafka→Kafka only) |
The production standard: effectively exactly-once
True exactly-once using Kafka transactions adds significant complexity and overhead, and only works when the output stays within Kafka. Most production systems use a simpler approach:
At-least-once delivery + Idempotent sink = Effectively exactly-once
The sink handles deduplication using a natural unique key:
-- Trades: deduplicate by exchange-assigned trade ID
INSERT INTO trades (trade_id, symbol, source, ...)
VALUES (?, ?, ?, ...)
ON CONFLICT (symbol, source, trade_id) DO NOTHING;
-- OHLCV bars: deduplicate by window
INSERT INTO ohlcv (symbol, source, window_start, ...)
VALUES (?, ?, ?, ...)
ON CONFLICT (symbol, source, window_start) DO UPDATE SET ...;
-- Orders: deduplicate by client-generated order ID
INSERT INTO orders (order_id, ...)
VALUES (?, ...)
ON CONFLICT (order_id) DO NOTHING;
This approach works with any sink (PostgreSQL, TimescaleDB, S3, Cassandra) and has no performance overhead on the Kafka layer.
Unique ID strategy
For the idempotent sink pattern to work, every message needs a stable unique ID that survives retries:
- Exchange trades: use the exchange-assigned trade ID (e.g. Binance t field)
- OHLCV windows: use (symbol, source, window_start) — deterministic from the window definition
- Client orders: generate a UUID before the first network call, persist it locally, then send
The critical rule: generate the ID before the Kafka send, not after. A retry must carry the same ID as the original attempt.
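The whole pattern can be sketched in memory: a replayed message carrying the same trade ID is silently dropped, mirroring what ON CONFLICT DO NOTHING does in the database. Class and method names here are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory sketch of the idempotent-sink pattern. putIfAbsent is the
// map equivalent of INSERT ... ON CONFLICT (trade_id) DO NOTHING.
public class IdempotentSink {
    private final Map<String, String> store = new LinkedHashMap<>();

    // returns true if the row was inserted, false if it was a duplicate
    boolean insert(String tradeId, String payload) {
        return store.putIfAbsent(tradeId, payload) == null;
    }

    int size() {
        return store.size();
    }

    public static void main(String[] args) {
        IdempotentSink sink = new IdempotentSink();
        System.out.println(sink.insert("t-1001", "BTCUSD buy 0.5")); // true: first delivery
        System.out.println(sink.insert("t-1001", "BTCUSD buy 0.5")); // false: at-least-once replay, dropped
        System.out.println(sink.size()); // 1: effectively exactly-once in the sink
    }
}
```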
6. Scaling Strategy
Adding brokers
Kafka does not automatically rebalance data to new brokers. After adding brokers, use the partition reassignment tool:
# Step 1: Generate reassignment plan
kafka-reassign-partitions --generate \
--topics-to-move-json-file topics.json \
--broker-list "1,2,3,4,5,6" \
--bootstrap-server kafka:9092 > reassignment.json
# Step 2: Execute (runs in background, no downtime)
kafka-reassign-partitions --execute \
--reassignment-json-file reassignment.json \
--bootstrap-server kafka:9092
# Step 3: Verify
kafka-reassign-partitions --verify \
--reassignment-json-file reassignment.json \
--bootstrap-server kafka:9092
More brokers = distributed disk I/O = real throughput increase. More partitions on the same broker = more consumer parallelism but no producer throughput gain (same disk).
Increasing partition count
kafka-topics --alter --topic market-trades \
--partitions 24 \
--bootstrap-server kafka:9092
No downtime required. However, key routing changes:
Before: hash("BTCUSD") % 12 = partition 3
After: hash("BTCUSD") % 24 = partition 15 # different partition
Old messages for BTCUSD remain in partition 3. New messages go to partition 15. Ordering across the boundary is lost until old messages are consumed.
Production preference: blue/green topic migration
For critical topics, prefer creating a new topic over altering:
1. Create "market-trades-v2" with 24 partitions
2. Producer writes to both "market-trades" and "market-trades-v2"
3. Consumer drains "market-trades" to completion
4. Consumer switches fully to "market-trades-v2"
5. Decommission "market-trades"
No key routing issues, no ordering concerns, clean cutover.
7. Message Size Configuration
All three sides must be configured consistently — a mismatch causes silent failures or stuck consumers.
# Broker
message.max.bytes=10485760 # max size of a single message
replica.fetch.max.bytes=10485760 # brokers exchanging replicas (most forgotten)
# Producer
max.request.size=10485760 # max size of a produce request (batch of messages)
# Consumer
fetch.max.bytes=52428800 # max bytes in a fetch response (across all partitions)
max.partition.fetch.bytes=10485760 # max bytes per partition per fetch
The most common mistake is forgetting replica.fetch.max.bytes. The message writes successfully but followers can't replicate it — the ISR shrinks silently until min.insync.replicas is violated and producers start failing.
8. Monitoring — What to Watch
Critical alerts (page immediately)
UnderReplicatedPartitions > 0 → broker down or lagging
OfflinePartitionsCount > 0 → data unavailable, producers will fail
ActiveControllerCount != 1 → controller election failure
ISR shrinking below replication factor → losing fault tolerance; below min.insync.replicas, acks=all producers fail
Warning alerts (investigate within hours)
Consumer lag growing steadily → consumer can't keep up with producer
Consumer lag sudden spike → likely rebalance or consumer restart
Producer request rate dropping → batching issue or broker throttling
Fetch request rate dropping → consumer stalled
Operational metrics
# Check consumer lag
kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--describe --group my-consumer-group
# Check ISR status
kafka-topics \
--bootstrap-server kafka:9092 \
--describe --topic market-trades
# List running consumer groups
kafka-consumer-groups \
--bootstrap-server kafka:9092 \
--list
Tools: Prometheus + kafka_exporter for metrics, Burrow (LinkedIn) for consumer lag tracking, Grafana for dashboards and alerting.
Summary
The production Kafka philosophy for trading systems:
- Design topics for peak load upfront — partition count is effectively immutable on hot topics
- Use acks=all + enable.idempotence=true on every producer — non-negotiable for financial data
- Commit offsets manually, after processing — at-least-once as your baseline guarantee
- Make every sink idempotent with a natural unique key — ON CONFLICT DO NOTHING/UPDATE
- Use CooperativeStickyAssignor — eliminate stop-the-world rebalances
- Add brokers before adding partitions — real throughput comes from distributed I/O
- Monitor ISR and consumer lag — these are your early warning signals for everything that goes wrong