Doug Whitfield

For the curious, here's an answer I got from a colleague:

#1 For a topic with 3 partitions, there is a broker leader for each partition, and a single publisher (round-robin) will publish to all three broker leaders. Correct?
 For a topic with 3 partitions, there is a broker leader for each partition. If a single publisher configures its ProducerRecord (https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html) with neither a key nor a partition present, then a partition will be assigned in a round-robin fashion. Because the topic partition is the unit of replication (https://kafka.apache.org/documentation/#replication), and each partition in Kafka has a single broker leader through which all reads and writes go, it is correct to say that a single publisher will publish to all three broker leaders.
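As a rough illustration, here is a minimal producer sketch (the topic name "events", the broker address, and the loop are placeholders, not anything from the original question) showing records created with neither a key nor a partition, so the client-side partitioner spreads them across all partitions and therefore across all three broker leaders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RoundRobinPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 9; i++) {
                // No partition and no key: the producer's partitioner assigns the
                // partition, so over time every partition (and broker leader) is used.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("events", "payload-" + i);
                producer.send(record, (metadata, exception) -> {
                    if (exception == null) {
                        System.out.printf("sent to partition %d at offset %d%n",
                                metadata.partition(), metadata.offset());
                    }
                });
            }
        }
    }
}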
#2 For a topic with 3 partitions and three publisher instances running, each publisher would use all three broker leaders. Correct? The three publisher instances wouldn't be aligned specifically to one of the three broker leaders. Correct?
 If each publisher were configured to create a ProducerRecord<K,V> that targeted a specific partition, then a publisher could end up aligned to a single broker leader. However, if the three publishers, operating totally independently and unaware of each other, each targeted a specific partition, that neat alignment is not guaranteed (two of them could pick the same partition, for example). Ultimately, it is up to each producer to decide which partition it writes to, or to rely on the hashing or round-robin distribution performed by the producer's default partitioner.
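To make the distinction concrete, here is a small sketch (topic name and key are hypothetical) of the three ways a ProducerRecord can be built, pinned to a partition, keyed, or neither:

import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionChoices {
    public static void main(String[] args) {
        // Explicit partition: this publisher always writes to partition 0,
        // so it only ever talks to that partition's broker leader.
        ProducerRecord<String, String> pinned =
                new ProducerRecord<>("events", 0, "order-42", "payload");

        // Key but no partition: the default partitioner hashes the key,
        // so all records with the same key land on the same partition.
        ProducerRecord<String, String> keyed =
                new ProducerRecord<>("events", "order-42", "payload");

        // Neither key nor partition: the producer's partitioner spreads records
        // across partitions, reaching all broker leaders over time.
        ProducerRecord<String, String> unkeyed =
                new ProducerRecord<>("events", "payload");
    }
}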
#3 Rather than developing a model for dividing the work among publishers, wouldn't the internal publisher-offset topic manage that? It is supposed to help the publisher understand where it is in publishing events and where to start looking for the next event to publish. This would only work for a three-publisher model if the publishers shared the internal publisher-offset topic. I'm assuming that isn't the case, and that each publisher instance would have its own internal offset topic?
If you're referring to the idea that a publisher can store its current offset in a Kafka topic, and coordinate with other publishers working in tandem with it by relying on the committed topic offset, I'd like to clarify some terms first. Producers themselves don't track offsets; if a producer is also a consumer, it is the consumer facet of that worker/thread/app/route that holds the offset. If you were consuming files from a network share, for example, you might move them to a hidden folder or delete them once you had consumed them for publication onto a Kafka topic, a simple but effective way of marking which files have been consumed and published and which are still waiting. In Kafka, an offset identifies a record's position within a partition, and a consumer's committed offset records how far it has read. So here, I'm assuming your producer is also a Kafka consumer, and is doing some work like enrichment before publishing the completed work to another topic for further processing.
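Under that assumption, a consume-enrich-produce worker might look roughly like the sketch below (topic names, group id, and the enrichment step are placeholders). The consumer side commits its offsets, and Kafka stores them per consumer group, so a restarted worker resumes where the group left off:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EnrichingWorker {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "enrichers"); // shared group id for all workers
        consumerProps.put("enable.auto.commit", "false");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("raw-events"));
            while (true) {
                ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : batch) {
                    String enriched = record.value() + " [enriched]"; // placeholder enrichment
                    producer.send(new ProducerRecord<>("enriched-events", record.key(), enriched));
                }
                // Commit the consumer's progress; a restarted worker in the same
                // group resumes from the last committed offset.
                consumer.commitSync();
            }
        }
    }
}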
If you want to coordinate between consumer-producers in this way, I would create a single offset topic with a single message type, but have each consumer-producer publish its messages with a header stating which worker it is. Then, on startup, a consumer-producer can first consume from the offset topic, filter on the header to find the offset message that belongs to it, and then start its stream at the offset it acquires, as sketched below.
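A rough sketch of that pattern (the topic name "worker-offsets", the header key, and the worker id are all hypothetical, and the main-stream consumption is omitted): each worker publishes its progress to a shared offset topic tagged with an identifying header, and on startup scans that topic, keeping only the messages that carry its own header:

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.header.Header;

public class OffsetCoordination {
    static final String WORKER_ID = "worker-1"; // hypothetical identity for this instance

    // Record this worker's progress on the shared offset topic, tagged with a header.
    static void publishProgress(KafkaProducer<String, String> producer, long offset) {
        ProducerRecord<String, String> record =
                new ProducerRecord<>("worker-offsets", Long.toString(offset));
        record.headers().add("worker-id", WORKER_ID.getBytes(StandardCharsets.UTF_8));
        producer.send(record);
    }

    // On startup, scan the offset topic from the beginning, keep only messages
    // carrying our header, and return the last offset this worker recorded.
    static long findStartingOffset(KafkaConsumer<String, String> offsetConsumer) {
        TopicPartition tp = new TopicPartition("worker-offsets", 0);
        offsetConsumer.assign(List.of(tp));
        offsetConsumer.seekToBeginning(List.of(tp));
        long end = offsetConsumer.endOffsets(List.of(tp)).get(tp);
        long latest = 0L;
        while (offsetConsumer.position(tp) < end) {
            for (ConsumerRecord<String, String> r : offsetConsumer.poll(Duration.ofMillis(200))) {
                Header h = r.headers().lastHeader("worker-id");
                if (h != null && WORKER_ID.equals(new String(h.value(), StandardCharsets.UTF_8))) {
                    latest = Long.parseLong(r.value());
                }
            }
        }
        return latest; // the worker then starts its main stream from this offset
    }
}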