Does RisingWave Object Store Retry Streaming Uploads on Failure?
I'm working with RisingWave and encountered an error during a streaming upload to an object store. The error message indicates an IncompleteBody issue, suggesting that the number of bytes provided did not match the Content-Length HTTP header. I'm curious whether RisingWave has a mechanism to automatically retry such failed upload operations, and what the implications are for overall system behavior when a compaction task fails due to this error. Here's an example of the error message I'm seeing:
I've observed multiple such failures in a short period and want to understand the expected behavior of the system in these scenarios.
Ben Ellis
Asked on Apr 10, 2024
Currently, RisingWave retries read operations, but it does not automatically retry streaming uploads for write operations, because write failures are considered rare. If a compaction task fails, the task as a whole is retried later, so as long as the object store error is transient, RisingWave will still progress towards a correct state. The IncompleteBody error could indicate that the connection broke before the upload completed, possibly due to a poor connection between the RisingWave cluster and the MinIO server, or to issues with the MinIO server itself. It's recommended to set up Prometheus and Grafana to monitor metrics and investigate the issue further.
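
To illustrate the "retried later" behavior described above, here is a minimal Python sketch of re-running a whole task after a transient failure. It is not RisingWave's actual implementation. The names `TransientStoreError` and `run_with_retry`, the attempt limit, and the backoff delays are all assumptions made for illustration. The point is that a partially sent streaming body cannot simply be resumed, so the task is restarted from scratch rather than retried mid-upload.

```python
import time


class TransientStoreError(Exception):
    """Hypothetical stand-in for a transient object store failure
    (for example, an upload cut off by a broken connection)."""


def run_with_retry(task, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run `task` from scratch until it succeeds or attempts run out.

    `task` is any zero-argument callable. Each retry calls it again in
    full, which mirrors rescheduling a failed task rather than resuming
    a half-finished upload. All parameters are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientStoreError:
            if attempt == max_attempts:
                # The error was not transient enough: surface it.
                raise
            # Exponential backoff: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

If the task keeps failing after the last attempt, the error propagates to the caller, which corresponds to the case where the object store problem is not transient and needs investigation through monitoring.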