How to resolve the CrashLoopBackOff error on a RisingWave cluster caused by the data directory being used by another cluster?
I'm encountering a CrashLoopBackOff error on the meta node of our RisingWave (RW) cluster. The error message says that the data directory is already used by another cluster with a specific ID. I've tried deleting the file containing the cluster ID, but the problem persists, and after I redeployed the StatefulSet of compute nodes, they also went into CrashLoopBackOff. Here's the error from the logs:
thread 'risingwave-main' panicked at 'called `Result::unwrap()` on an `Err` value: ObjectStore(Internal error: data directory is already used by another cluster with id "14a92dee-7219-431b-9e30-af4c7ff6a5e1"
Dat Nguyen
Asked on Jun 19, 2023
The error you're experiencing is due to a safeguard in RisingWave that prevents different clusters from using the same data directory to avoid misconfiguration. Here's what you can do to resolve the issue:
- Ensure that there are no leftover states in your etcd and S3. If there are, you need to clean them up.
- You can set a new value for the "--data-directory" option other than the one currently being used.
- Alternatively, you can clean up the specific path in your S3 bucket (e.g., dev-int-risingwave-s3/hummock001) and also clean up the etcd.
- If you're using Kubernetes, make sure to check if there are any persistent volumes that need to be cleaned using kubectl get pv and delete them if necessary.
- After cleaning up, you may need to restart the frontend node to clear any obsolete state in the cache.
- If the issue persists, you may need to shut down the cluster, clean up etcd and the S3 bucket thoroughly, and recreate the cluster.
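The cleanup steps above could be sketched roughly as follows. This is only an illustration, not an official procedure: it assumes the AWS CLI and etcdctl (v3 API) are installed, that the etcd instance is used by RisingWave alone (the delete wipes every key), and that the cluster has already been shut down. The etcd endpoint is a hypothetical placeholder; the S3 path is the example path from this thread. The script runs in dry-run mode by default and only prints the commands.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 only after reviewing every command below.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical endpoint; replace with your own etcd service address.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://etcd:2379}"
# Example path from this thread. The trailing slash avoids matching
# sibling prefixes such as hummock0010/.
S3_PATH="${S3_PATH:-s3://dev-int-risingwave-s3/hummock001/}"

# Print the command with each argument quoted (dry run) or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+'
    for arg in "$@"; do printf " '%s'" "$arg"; done
    printf '\n'
  else
    "$@"
  fi
}

cleanup() {
  # Remove all objects under the data directory in S3.
  run aws s3 rm "$S3_PATH" --recursive
  # Delete ALL keys in etcd (assumes etcd is dedicated to RisingWave).
  run etcdctl --endpoints="$ETCD_ENDPOINT" del "" --prefix
  # List persistent volumes so leftovers can be reviewed and removed
  # manually with: kubectl delete pv <pv-name>
  run kubectl get pv
}

cleanup
```

Running it as-is prints three commands for review; deleting persistent volumes is left as a manual step because the volume names depend on your deployment.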
Remember to follow the official documentation for deploying a RisingWave instance on Kubernetes with S3 and etcd to avoid such issues in the future. If you encounter a similar error again, it might be a sign that something is wrong with the cluster setup, and you should check the full logs and pod status using kubectl get pods.
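As a small illustration of that last check, the helper below picks out crash-looping pods from the default kubectl get pods table. The function name is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Reads the default `kubectl get pods` table from stdin.
crashlooping_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage against a live cluster (pod names differ per deployment):
#   kubectl get pods | crashlooping_pods
#   kubectl logs <pod-name> --previous   # logs from the crashed container
```

The --previous flag shows the logs of the last terminated container, which is usually where the panic message appears for a pod that keeps restarting.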