How to fix etcd leadership loss in RisingWave when the ingestion rate increases?

I'm trying to use RisingWave to perform the equivalent of pandas.merge_asof. When the source ingestion rate increases, etcd loses its leadership, and the meta node then exits/crashes. Any ideas or pointers on how to fix this?


John Garland

Asked on Oct 11, 2023

  • Consider raising the meta_leader_lease_secs setting in risingwave.toml so that the meta node tolerates higher latency from etcd instead of crashing.

  • Increasing memory may help: with more data/state cached, less has to be fetched from the object store, and those fetches can compete with etcd.

  • Evaluate disk performance: non-SSD disks can cause high latency for etcd requests and lead to failures.

  • In the future, consider replacing etcd with a SQL database to avoid similar issues.
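
For the first suggestion, the change might look like the fragment below. This is only a sketch: the thread names the option and the file, but the placement under a [meta] section and the value 60 are assumptions, so check them against the example config shipped with your RisingWave version.

```toml
# risingwave.toml (fragment)
# Assumption: the option lives in the [meta] section.
[meta]
# Illustrative value only; a longer lease lets the meta node ride out
# slower etcd responses before it gives up leadership.
meta_leader_lease_secs = 60
```

A longer lease also means a real leader failure takes longer to detect, so it is probably worth raising the value gradually rather than setting it very high.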

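For the disk-performance suggestion, a rough way to see how slow synchronous writes are on a given disk is to time fsync calls. The script below is a minimal sketch, not an etcd or RisingWave tool: the function names, sample count, and payload size are made up for illustration, and it uses os.fsync (etcd itself relies on fdatasync) to stay portable. A commonly cited etcd guideline is a 99th percentile sync latency under roughly 10 ms.

```python
import math
import os
import tempfile
import time


def measure_fsync_latency_ms(directory, samples=200, payload_size=2048):
    """Append small payloads to a scratch file in `directory`, timing each fsync (ms)."""
    payload = b"\0" * payload_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the write down to the storage device
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Assumption: point this at a directory on the same disk as etcd's data dir.
    target_dir = tempfile.gettempdir()
    lat = measure_fsync_latency_ms(target_dir)
    print(f"p50={percentile(lat, 50):.2f} ms  p99={percentile(lat, 99):.2f} ms")
```

Run it while ingestion is at the rate that triggers the crash; if the p99 climbs sharply under load, the disk is a likely suspect.
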