troubleshooting

How to Resolve Meta Node CrashLoopBackOff in a RisingWave Cluster?

The Meta nodes of my RisingWave (RW) cluster, which is dedicated to writing to Iceberg, are stuck in CrashLoopBackOff. Writes to Iceberg succeed, but the Meta nodes keep failing, which prevents me from creating, modifying, or dropping sources or sinks. The error messages point to problems with sink coordination and metadata collection. I've tried restarting all pods, but the problem persists. Here's an example of the error output:

risingwave-iceberg-meta-1 2024-04-11T20:19:51.315159859Z ERROR risingwave_meta::manager::sink_coordination::coordinator_worker: failed to wait for all writers error=one sink writer stream reaches the end before initialize
risingwave-iceberg-meta-1 2024-04-11T20:19:51.315181761Z ERROR risingwave_meta::manager::sink_coordination::coordinator_worker: unable to send msg
risingwave-iceberg-meta-1 2024-04-11T20:19:51.315186109Z  INFO risingwave_connector::sink::iceberg: Iceberg commit coordinator inited.
risingwave-iceberg-meta-1 2024-04-11T20:19:51.315205478Z ERROR risingwave_meta::manager::sink_coordination::coordinator_worker: failed to collect all metadata error=sink writer input reaches the end while collecting metadata

I also got the following warning when I tried to set pause_on_next_bootstrap to true:

WARN risingwave_meta::manager::system_param: The initializing value of "pause_on_next_bootstrap" (true) differ from persisted (false), using persisted value

How can I resolve this issue?

Neil

Asked on Apr 11, 2024

The issue appears to be related to the recovery process, during which the Iceberg sink coordinators are cleaned up. As suggested by Yuhao Su and Rick Otten, try setting pause_on_next_bootstrap to true in the risingwave.toml file under the [system] block. This should pause on the next bootstrap and give you a chance to drop the problematic sink. If you cannot run SQL commands because the Meta nodes are not working properly, editing the configuration file directly is a viable alternative.

Note, however, that the warning you saw means the new value was not applied: the Meta node found a persisted value (false) that differs from the initializing value in the configuration (true) and kept the persisted one. You may need to investigate further to get the new setting applied. If the problem persists, consider reaching out to the RisingWave community or support for more specialized assistance.
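The configuration change suggested in this answer might look like the fragment below. This is a minimal sketch: only the [system] block and the pause_on_next_bootstrap key come from this thread, and the rest of your risingwave.toml is assumed to stay unchanged. As the warning quoted in the question shows, a value already persisted by the Meta node takes precedence over this initializing value, so the fragment alone may not take effect on an existing cluster.

```toml
# risingwave.toml (fragment; other sections omitted)
[system]
# Initializing value only. If the Meta node has already persisted a
# different value, it logs a warning and keeps the persisted one.
pause_on_next_bootstrap = true
```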

Apr 12, 2024