
Why do my compactor nodes keep crashing when using S3 storage?

My compactor nodes keep crashing and I'm not sure why. I'm using S3 for storage. First I get a series of errors like "ERROR risingwave_common_service::observer_manager: Stream of notification terminated.", and then the observer manager crashes with a panic in observer_manager.rs. Here's a snippet of the error log:

thread 'risingwave-main' panicked at src/storage/compactor/src/compactor_observer/observer_manager.rs:67:17:
error type notification
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/249624b5043013d18c00f0401ca431c1a6baa8cd/library/std/src/panicking.rs:597:5
...

Rick Otten

Asked on Nov 22, 2023
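The panic message "error type notification" suggests the observer reached a catch-all branch after receiving a notification type it does not expect. The following is a minimal, hypothetical Rust model of that pattern, not RisingWave's actual code: the `Info` enum, its variants, and both `handle_notification_*` functions are illustrative names. It contrasts a catch-all that panics with one that returns the unexpected variant as an error the caller can log.

```rust
// Hypothetical model: these types and functions are illustrative only,
// not RisingWave's real observer API.
#[allow(dead_code)]
#[derive(Debug)]
enum Info {
    Table(u32),
    SystemParams(String),
    // A variant this observer was never written to handle.
    Unexpected(String),
}

/// Panicking style: any variant not listed aborts the calling thread,
/// which matches the kind of failure shown in the panic log.
fn handle_notification_panicking(info: &Info) -> &'static str {
    match info {
        Info::Table(_) => "table",
        Info::SystemParams(_) => "system params",
        _ => panic!("error type notification"),
    }
}

/// Non-panicking style: the unexpected variant becomes an error value,
/// so the caller can log it and keep the process alive.
fn handle_notification_checked(info: &Info) -> Result<&'static str, String> {
    match info {
        Info::Table(_) => Ok("table"),
        Info::SystemParams(_) => Ok("system params"),
        other => Err(format!("unexpected notification: {other:?}")),
    }
}

fn main() {
    let infos = [
        Info::Table(1),
        Info::SystemParams("example".to_string()),
        Info::Unexpected("unknown".to_string()),
    ];
    for info in &infos {
        match handle_notification_checked(info) {
            Ok(kind) => println!("handled {kind}"),
            Err(e) => eprintln!("skipped: {e}"),
        }
    }
    // The panicking version fails on the same input; catch_unwind is used
    // here only so this demo can continue after the panic.
    let result = std::panic::catch_unwind(|| handle_notification_panicking(&infos[2]));
    assert!(result.is_err());
}
```

In this model the crash depends only on which variants the match lists, which is why a panic like this usually points at the sender (here, meta) delivering something the receiver does not recognize, or at the stream ending in a way the receiver misinterprets.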

It looks like the compactor nodes are stuck in a boot loop: they crash shortly after each restart. The h2 protocol error in the meta logs points to network problems or abrupt disconnections. This may be related to autoscaling: if network connections are dropped before the container is properly terminated, that would produce connectivity errors like these. I've tried disabling autoscaling to see whether the issue persists. I've also looked at the gRPC errors to the compute node in the logs; they might be related, but they don't seem to cause the crashes. I'll keep investigating and will try to provide full backtraces for more detail.

Nov 29, 2023
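To check whether the crashes line up with the disconnections, it can help to locate the relevant messages in a saved log. Below is a small, hypothetical Rust helper that reports the line numbers where each marker appears. "Stream of notification terminated" and "panicked at" are taken from the logs quoted in this thread; the exact wording "h2 protocol error" is an assumption, as are the binary name `log-scan` and the Kubernetes `kubectl logs --previous` usage in the comment.

```rust
use std::io::{self, Read};

/// Substrings to search for. The first and third appear in the logs quoted
/// in this thread; the h2 wording is an assumption and may need adjusting.
const MARKERS: [&str; 3] = [
    "Stream of notification terminated",
    "h2 protocol error",
    "panicked at",
];

/// Returns, for each marker, the 1-based line numbers where it occurs.
fn find_markers(log: &str) -> Vec<(&'static str, Vec<usize>)> {
    MARKERS
        .iter()
        .map(|&marker| {
            let hits: Vec<usize> = log
                .lines()
                .enumerate()
                .filter(|(_, line)| line.contains(marker))
                .map(|(i, _)| i + 1)
                .collect();
            (marker, hits)
        })
        .collect()
}

fn main() -> io::Result<()> {
    // Hypothetical usage: kubectl logs <compactor-pod> --previous | log-scan
    let mut log = String::new();
    io::stdin().read_to_string(&mut log)?;
    for (marker, hits) in find_markers(&log) {
        println!("{marker}: {} hit(s) at lines {:?}", hits.len(), hits);
    }
    Ok(())
}
```

Running it on the compactor log from before a restart, and separately on the meta log, shows whether each "panicked at" line closely follows a terminated notification stream.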