troubleshooting

How to Resolve RisingWave Meta Node Bootstrapping Issue?

I'm experiencing an issue where my RisingWave meta node isn't finishing the bootstrapping process. It seems to be stuck with a warning message indicating it's waiting for new workers to join. Here's the log output:

WARN bootstrap_recovery{prev_epoch=6010333938843648}:recovery_attempt: risingwave_meta::barrier::recovery: waiting for new workers to join, elapsed: 352s

I suspect this might be due to remnants of past scaling operations. Could it also be related to an incorrect parallelism setting? We're running a self-hosted RisingWave deployment in a Kubernetes cluster, with a single node each for meta, frontend, compactor, and compute, and etcd as the meta store. The compute node is allocated 22 CPU cores and 96Gi of RAM, and the RW_PARALLELISM environment variable is set to 22. Previously, we were using 28 CPUs. How can I resolve this bootstrapping issue?


Victor Müller

Asked on Feb 26, 2024

It seems the issue stems from the change in CPU count together with the RW_PARALLELISM environment variable. The cluster previously ran with 28 CPUs and now has 22, so the compute node likely registers with less parallelism than the existing streaming jobs were scheduled with, and recovery keeps waiting for new workers to make up the difference. To resolve this, forcefully set RW_PARALLELISM to 28, even though you're only requesting 22 CPUs from Kubernetes. This should restore the cluster. You can make the change by modifying the StatefulSet of the compute node. The upcoming RisingWave version 1.7 will address this issue by automatically adjusting the parallelism of streaming jobs when the cluster's parallelism changes.
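As a rough illustration of the StatefulSet change, here is a minimal Python sketch that builds a Kubernetes strategic merge patch setting RW_PARALLELISM to 28 on the compute container. The container name `compute` and the helper `build_parallelism_patch` are hypothetical placeholders, not names taken from this thread; check your own StatefulSet for the real container name.

```python
import json


def build_parallelism_patch(container_name: str, parallelism: int) -> dict:
    """Build a strategic merge patch that sets RW_PARALLELISM on one container.

    Containers and env entries are matched by their "name" field, so only the
    named variable on the named container is added or overwritten.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            # Hypothetical container name; adjust to your deployment.
                            "name": container_name,
                            "env": [
                                # Kubernetes env values must be strings.
                                {"name": "RW_PARALLELISM", "value": str(parallelism)}
                            ],
                        }
                    ]
                }
            }
        }
    }


# 28 is the previous CPU count mentioned in the question.
patch = build_parallelism_patch("compute", 28)
patch_json = json.dumps(patch)
print(patch_json)
```

The printed JSON could then be passed to `kubectl patch statefulset <your-compute-statefulset> -p '<json>'`, where the StatefulSet name is whatever your deployment uses. Editing the manifest by hand achieves the same result; the patch form just keeps the change limited to the one environment variable.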

Feb 26, 2024