I'm encountering an error in our internal cluster with RisingWave that states:
ExecuteError: internal error: Storage error: Hummock error: Barrier read is unavailable for now. Likely the cluster is recovering.
This error has occurred before, and restarting the pods fixed it, but now the meta node is stuck in CrashLoopBackOff. I've attached the logs in the thread. Can anyone help me understand and fix this issue?
Bahador Nooraei
Asked on Jul 24, 2023
The error you're seeing suggests a data corruption issue, most likely because the Hummock version checkpoint is missing from the /checkpoint directory in the object store. To resolve this, clean both the metadata stored in etcd and the data in the object store so that the two start again from a consistent state. If you're deploying a new cluster, make sure you're not reusing the etcd from a previous cluster together with an empty object store: etcd would then reference a checkpoint that doesn't exist, which leads to exactly this kind of inconsistency. If you're using MinIO, also check for any DELETE operations that might have removed the /checkpoint directory. Finally, S3 is recommended over MinIO for better performance and consistency, and etcd should have a persistent volume attached so that its data is not lost when the pod restarts.
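The two checks suggested above (is the checkpoint still in the object store, and did something DELETE it?) can be sketched as a pair of small helpers. This is a minimal, hypothetical sketch, not a RisingWave or MinIO tool: it assumes you have already exported the bucket's object keys (for example with boto3's list_objects_v2 paginator or a recursive bucket listing) and the MinIO request log as plain text lines. The data directory name hummock_001 and the log-line shapes in the example are assumptions; substitute whatever your cluster is actually configured with.

```python
"""Hypothetical diagnostics for a missing Hummock version checkpoint.

Assumptions (not RisingWave or MinIO APIs):
- object keys and log lines are plain strings you have already exported;
- DATA_DIR is the data directory your cluster writes into (adjust it);
- a relevant log line contains the HTTP method and object path as text.
"""
import re
from typing import Iterable, List

DATA_DIR = "hummock_001"  # assumed data directory name; adjust to your deployment

_DELETE = re.compile(r"\bDELETE\b")  # matches "DELETE ..." and "method=DELETE"


def checkpoint_keys(keys: Iterable[str], data_dir: str = DATA_DIR) -> List[str]:
    """Return the object keys located under <data_dir>/checkpoint/.

    An empty result suggests the checkpoint is gone from the object store.
    """
    prefix = f"{data_dir.strip('/')}/checkpoint/"
    return [k for k in keys if k.lstrip("/").startswith(prefix)]


def suspicious_deletes(log_lines: Iterable[str], data_dir: str = DATA_DIR) -> List[str]:
    """Return log lines that look like DELETE requests touching the checkpoint directory.

    This is a plain substring match on the path, so it is deliberately loose.
    """
    marker = f"{data_dir.strip('/')}/checkpoint"
    return [line for line in log_lines if _DELETE.search(line) and marker in line]


if __name__ == "__main__":
    # Made-up sample data, only to show the call shapes.
    listing = ["hummock_001/1.data", "hummock_001/checkpoint/0"]
    print(checkpoint_keys(listing))
    log = [
        "GET /bucket/hummock_001/checkpoint/0",
        "DELETE /bucket/hummock_001/checkpoint/0",
    ]
    print(suspicious_deletes(log))
```

Keeping the helpers free of any client library makes them easy to run against whatever export you already have; if checkpoint_keys returns nothing while etcd still holds metadata, that matches the inconsistency described above.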