I'm experiencing issues with running multiple Meta nodes in RisingWave. A single node works fine, but when I try to run two, the second one fails after a few minutes with an election error related to etcd. Here's the error message I'm seeing:
2023-12-07T18:21:06.197451894Z ERROR risingwave_meta_node::server: election error happened, Election failed: grpc request error: status: Unknown, message: "etcdserver: invalid auth token", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
I've confirmed that the etcd credentials and advertise addresses are correct and unique for each node. The new nodes connect to etcd correctly and create new leases, but they still fail after a few minutes. Is there any additional debugging I can enable, or specific things to look for in etcd to resolve this issue?
Rick Otten
Asked on Dec 07, 2023
The error points to an etcd authentication problem. Make sure the etcd username and password are configured correctly on every meta node and that etcd_auth is enabled. Each node should have a unique --advertise-addr. To verify the election client when multiple meta nodes are connected to etcd, run etcdctl get --prefix "_meta_election" and etcdctl lease list. If you're running the cluster in AWS ECS, make sure that autoscaling is disabled for the meta nodes while troubleshooting. If the problem persists or the issue is urgent, consider reverting to a single meta node configuration until a solution is found.
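
The two etcdctl checks mentioned above could be wrapped in a small script so they are easy to repeat while the second meta node is running. This is only a sketch: the ETCD_ENDPOINTS, ETCD_USER, ETCDCTL, and DRY_RUN variables and the etcd_cmd and inspect_election helpers are hypothetical names introduced here, and the default endpoint and user are placeholders, not values from the original setup. With DRY_RUN=1 (the default here) the commands are only printed; set DRY_RUN=0 to run them against etcd.

```shell
#!/bin/sh
# Hypothetical defaults; replace with the values used by your meta nodes.
ETCD_ENDPOINTS="${ETCD_ENDPOINTS:-http://127.0.0.1:2379}"
ETCD_USER="${ETCD_USER:-root}"   # etcdctl accepts username[:password] and prompts if the password is omitted
ETCDCTL="${ETCDCTL:-etcdctl}"
DRY_RUN="${DRY_RUN:-1}"

# Run an etcdctl command with the shared endpoint and user flags,
# or only print it when DRY_RUN=1.
etcd_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$ETCDCTL --endpoints=$ETCD_ENDPOINTS --user=$ETCD_USER $*"
  else
    "$ETCDCTL" --endpoints="$ETCD_ENDPOINTS" --user="$ETCD_USER" "$@"
  fi
}

# The two checks suggested in the answer: election keys and active leases.
inspect_election() {
  etcd_cmd get --prefix "_meta_election"
  etcd_cmd lease list
}

inspect_election
```

Running this a few times between the moment the second meta node starts and the moment it fails shows whether its election key and lease disappear before the "invalid auth token" error appears. For any lease ID printed by the second command, etcdctl lease timetolive <lease-id> --keys lists the remaining TTL and the keys attached to that lease.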