Neil is looking for a way to push data to an Iceberg sink in 1-minute batches from a real-time streaming job, in order to reduce the number of metadata files. He asks whether TUMBLE windows or another method can achieve this, and whether any examples are available. He also asks whether the checkpoint_frequency setting can be tuned per sink instead of being a cluster-wide setting.
Neil
Asked on Mar 20, 2024
To push data to an Iceberg sink in 1-minute batches, you can use TUMBLE windows or another time-based windowing function in your streaming query, so that rows are grouped into fixed, non-overlapping 1-minute intervals before they reach the sink.
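To make the grouping concrete, here is a minimal Python sketch of what a tumbling window does to a stream of timestamped rows. It is only an illustration of the idea, not the syntax of any particular streaming engine; the names `window_start` and `tumble` and the 1-minute `WINDOW` constant are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=1)  # assumed batch size for this sketch
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, size=WINDOW):
    """Return the start of the tumbling window that contains ts."""
    # Subtract the offset of ts inside its window (timezone-aware datetimes only).
    return ts - (ts - EPOCH) % size


def tumble(events, size=WINDOW):
    """Group (timestamp, payload) pairs by the start of their window."""
    batches = defaultdict(list)
    for ts, payload in events:
        batches[window_start(ts, size)].append(payload)
    return dict(batches)
```

Each key in the result is one window, and its list of payloads is what would be written to the sink as a single batch.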
Adjusting the checkpoint_frequency parameter can also produce this batching behavior. However, checkpoint_frequency is a cluster-wide setting and cannot be tuned per sink, so any change to it applies to every sink.
To have specific sinks run in 1-minute batches while others run at different intervals, you may need to implement custom logic in your streaming job that controls the batch interval for each sink separately.
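One possible shape for such custom logic is a small buffer per sink, each with its own flush interval. The Python sketch below is an assumption about how this could look, not an API of the streaming system or of Iceberg; `IntervalBatcher` and the `write` callback are hypothetical names, and `write` stands in for whatever actually commits a batch to the sink.

```python
import time


class IntervalBatcher:
    """Buffer rows and pass them to `write` at most once per `interval_s` seconds."""

    def __init__(self, write, interval_s, clock=time.monotonic):
        self._write = write            # hypothetical callback that commits one batch
        self._interval_s = interval_s  # per-sink batch interval in seconds
        self._clock = clock            # injectable clock, useful for testing
        self._buffer = []
        self._last_flush = clock()

    def add(self, row):
        """Buffer a row; flush if the interval has elapsed since the last flush."""
        self._buffer.append(row)
        if self._clock() - self._last_flush >= self._interval_s:
            self.flush()

    def flush(self):
        """Write the buffered rows as one batch (if any) and reset the timer."""
        if self._buffer:
            self._write(self._buffer)
            self._buffer = []
        self._last_flush = self._clock()
```

With one `IntervalBatcher(write, 60)` for the Iceberg sink and another instance with a different interval for each other sink, every sink gets its own cadence. Note that this sketch only checks the interval when a row arrives, so a quiet stream would need a periodic call to `flush()` as well.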