How can I improve the stability of etcd in my Kubernetes cluster?

I have trouble keeping etcd stable: one of my etcd pods gets stuck in CrashLoopBackOff every day. To recover, I currently have to remove the member with etcdctl from another etcd member, delete the PVC, and restart the pod. Are there any recommendations for improving etcd stability in this situation?



Asked on Apr 04, 2024

  • Consider allocating more CPU and memory to etcd, or giving it a faster SSD
  • Remove etcd and allow users to bring their own database as a metadata service
  • Migrate metastore to a SQL backend
  • Keep up with the latest releases and their release notes for improvements and fixes
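
For reference, the manual reset described in the question (remove the member, delete the PVC, restart the pod) could be scripted roughly as follows. This is only a sketch under assumptions: the namespace, pod names, PVC name, and member ID are hypothetical placeholders, and the etcdctl call may need extra endpoint or TLS flags depending on how the cluster is set up. The helper only builds the commands, so they can be reviewed before anything is run.

```python
import shlex
from typing import List


def reset_commands(namespace: str, healthy_pod: str, broken_pod: str,
                   member_id: str, pvc: str) -> List[List[str]]:
    """Return the kubectl/etcdctl invocations for the manual reset, in order.

    All arguments are placeholders supplied by the caller; nothing is executed.
    """
    exec_prefix = ["kubectl", "exec", "-n", namespace, healthy_pod, "--"]
    return [
        # 1. Remove the broken member, issued from a healthy member.
        exec_prefix + ["etcdctl", "member", "remove", member_id],
        # 2. Delete its PVC. --wait=false returns immediately, since the
        #    claim stays in Terminating while the pod still uses it.
        ["kubectl", "delete", "pvc", pvc, "-n", namespace, "--wait=false"],
        # 3. Delete the pod so its controller recreates it with a fresh volume.
        ["kubectl", "delete", "pod", broken_pod, "-n", namespace],
    ]


if __name__ == "__main__":
    # Hypothetical values; take the real member ID from `etcdctl member list`.
    for cmd in reset_commands("default", "etcd-0", "etcd-1",
                              "<member-id>", "data-etcd-1"):
        print(shlex.join(cmd))
```

Depending on how etcd is deployed, the recreated pod may also need to be registered again with `etcdctl member add` before it can rejoin; the question does not say whether that is handled automatically.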
