What does ‘batching’ mean when we are talking about Apache Kafka?

Today I learned that when you hear the word ‘batch’ in the context of Apache Kafka, it can mean one of two things:

  1. A reference to batch-only data processing systems. A batch-only system processes data in a bounded way. This means that each job has a start time and an end time. Whether the work is done in large batches or micro-batches, each batch is processed in one go. This is in contrast to the continuous data streaming enabled by Apache Kafka, in which data is processed continuously, in event-sized chunks.

  2. In the context of data streaming, there is something called producer batching. The name can be misleading because it is not really related to batch-only data processing systems. A Kafka producer, the client that publishes records to a Kafka cluster, groups records into batches before sending them in order to increase throughput (compression, when enabled, is applied to those batches). The data still flows continuously, in event-sized chunks. Batching here is only a way of sending those events more efficiently, so it is not the same thing as batch data processing.
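To make producer batching a little more concrete, here is a toy sketch in plain Python. It is not the Kafka client and does not use any Kafka API: the function name `batch_records` and the parameter `max_batch_bytes` are made up for illustration. The sketch also leaves out things a real producer does, such as grouping records per partition and waiting a short time for more records to arrive (the real producer settings `batch.size` and `linger.ms` control the size and the wait). The only point is that several small records end up in fewer, larger requests.

```python
from typing import Iterable, Iterator, List


def batch_records(records: Iterable[bytes], max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group records into batches whose total size stays within max_batch_bytes.

    Hypothetical helper for illustration only. A record larger than the
    limit is emitted alone in its own batch.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        # Flush the current batch if adding this record would exceed the limit.
        if batch and batch_bytes + len(record) > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += len(record)
    # Emit whatever is left once the input is exhausted.
    if batch:
        yield batch


events = [f"event-{i}".encode() for i in range(10)]  # 10 records, 7 bytes each
batches = list(batch_records(events, max_batch_bytes=32))
print(len(events), "records sent in", len(batches), "requests")
# prints: 10 records sent in 3 requests
```

Every event is still delivered, and in the same order. The events are just packed together for the trip, which is why this has nothing to do with a batch job that starts, processes a bounded data set, and ends.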

In short, ‘batching’ means, in a very general way, ‘to group things together’. But ‘producer batching’ and ‘batch-only data processing systems’ share the term only in that general sense, as they refer to completely different tasks, as described above.
