Critical
New topics default to a single copy
`default.replication.factor`: Topics created without an override will not have enough replicas for durable production use.
Recommended change: Raise `default.replication.factor` to at least `3` for production clusters with enough brokers.
Why: Topic creation paths are rarely perfectly controlled. Safe defaults stop one missing flag from creating a fragile topic.
Critical
ISR requirement is too weak
`min.insync.replicas`: `acks=all` will still succeed with only one in-sync copy available.
Recommended change: Use `min.insync.replicas=2` for replicated production topics when replication factor is at least 3.
Why: This is the guardrail that turns producer acknowledgements into real redundancy instead of a false sense of safety.
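The arithmetic behind this guardrail is simple: with `acks=all`, a write is acknowledged only once `min.insync.replicas` copies exist, so the gap between the two settings is the number of broker failures a topic can absorb while staying writable. A minimal sketch (the helper name is illustrative, not a Kafka API):

```python
# Failure tolerance for an acks=all topic: a write is acknowledged only
# after min_insync_replicas copies exist, so the gap below is how many
# brokers can fail while the topic still accepts writes.
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Brokers that can fail while the topic stays writable under acks=all."""
    return replication_factor - min_insync_replicas

# Recommended production baseline: RF=3, min.insync.replicas=2
print(tolerated_broker_failures(3, 2))  # -> 1: one broker can fail, writes continue

# The flagged configuration: RF=3 but min.insync.replicas=1. Two brokers can
# fail and writes still succeed -- but acknowledged data may then exist on
# only a single broker, which is the false sense of safety described above.
print(tolerated_broker_failures(3, 1))  # -> 2
```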
Critical
Unclean leader election can lose data
`unclean.leader.election.enable`: Out-of-sync replicas are allowed to become leader.
Recommended change: Set `unclean.leader.election.enable=false` unless you have explicitly chosen availability over data safety.
Why: An out-of-sync leader can come online without all acknowledged writes, which makes earlier success responses impossible to trust.
Critical
Consumer offsets are under-replicated
`offsets.topic.replication.factor`: The internal offsets topic is not replicated strongly enough.
Recommended change: Raise `offsets.topic.replication.factor` to `3` where cluster size allows it.
Why: Consumer group stability depends on the offsets topic. Weak replication here translates directly into operational instability.
Critical
Transaction state is under-replicated
`transaction.state.log.replication.factor`: Kafka transactions rely on an internal topic that is configured with weak redundancy.
Recommended change: Raise `transaction.state.log.replication.factor` to `3` if transactions are part of your platform contract.
Why: Exactly-once features are only as resilient as the internal log storing their transaction state.
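Taken together, the five critical findings above map to a small block of broker configuration. A hedged sketch of the relevant `server.properties` entries, assuming a cluster of at least three brokers (values are the recommendations above, not universal defaults; note that a broker-level `min.insync.replicas` also applies to any single-replica topics you may have):

```properties
# Durability baseline for a >=3-broker production cluster (sketch, not a full config)
default.replication.factor=3                # new topics get three copies by default
min.insync.replicas=2                       # acks=all requires two in-sync copies
unclean.leader.election.enable=false        # never elect an out-of-sync leader
offsets.topic.replication.factor=3          # protect consumer group offsets
transaction.state.log.replication.factor=3  # protect transaction state
```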
Warning
Broker I/O thread pool looks undersized
`num.io.threads`: The broker has fewer I/O workers than a typical production starting point.
Recommended change: Review disk count, replica traffic, and CPU availability, then raise `num.io.threads` if the broker is busy.
Why: I/O threads gate log append and replica fetch work, so undersizing them can cause queueing long before the disks are truly full.
Warning
Network thread pool looks undersized
`num.network.threads`: The broker may struggle to keep up with concurrent produce and fetch traffic.
Recommended change: Increase `num.network.threads` after checking connection counts and request handler idle time.
Why: Network threads often become the first bottleneck on busy brokers because every client and replica request passes through them.
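Both thread pools are plain broker properties, so the review described above usually ends in a small `server.properties` change. The values below are illustrative only; the right numbers depend on disk count, CPU headroom, and the measured request-handler and network-processor idle-percent metrics:

```properties
# Sketch: sized for a broker with several disks and spare CPU (tune to your metrics)
num.io.threads=16       # workers for log appends and replica fetch I/O; Kafka's default is 8
num.network.threads=8   # workers moving client and replica requests; Kafka's default is 3
```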
Warning
Automatic topic creation is enabled
`auto.create.topics.enable`: Typos and one-off experiments can create real topics with whatever defaults happen to be configured.
Recommended change: Disable automatic creation in production and provision topics intentionally.
Why: Unexpected topics are easy to miss and often inherit unsafe partition and replication defaults.
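Disabling auto-creation means topics must be provisioned deliberately. A sketch of both sides of that change, assuming the standard Kafka CLI tooling and a broker at `localhost:9092` (topic name and partition count are placeholders):

```properties
# server.properties: stop typos and one-off clients from materializing topics
auto.create.topics.enable=false
```

```shell
# Provision the topic intentionally, with the durability settings chosen above
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --partitions 12 \
  --replication-factor 3 \
  --config min.insync.replicas=2
```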
Warning
Retention window is short
`log.retention.hours`: Data may age out before downstream consumers or replay workflows can catch up.
Recommended change: Validate that your slowest consumers and recovery procedures comfortably fit inside the retention window.
Why: Short retention is not wrong by itself, but it should be a deliberate cost decision rather than an unnoticed default.
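One way to make the retention window a deliberate decision is to size it from measured behavior: take the worst-case consumer lag, add the time a full replay or rebuild is allowed to take, apply a safety multiplier, and compare the result against `log.retention.hours`. A minimal sketch with made-up numbers (the helper name and inputs are illustrative, not part of any Kafka API):

```python
import math

def required_retention_hours(max_consumer_lag_h: float,
                             replay_window_h: float,
                             headroom: float = 1.5) -> int:
    """Smallest log.retention.hours that covers the slowest consumer plus
    a full replay/recovery, with a safety multiplier for incidents."""
    return math.ceil((max_consumer_lag_h + replay_window_h) * headroom)

# Example: slowest consumer runs up to 6h behind; disaster-recovery replay
# is budgeted at 24h.
needed = required_retention_hours(6, 24)
print(needed)         # -> 45
print(needed <= 168)  # -> True: fits inside Kafka's default 7-day window
```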