Does RisingWave lock up under heavy load with large LEFT OUTER JOIN queries?
I'm experiencing an issue where RisingWave (RW) seems to lock up when I run a very large LEFT OUTER JOIN query. I use this query to determine updates and push them to Kafka, which then triggers RW to update a materialized view (MV). After the process starts, RW processes at a high rate, but eventually both my external program and RW stop updating, without any errors. Even trivial queries like CREATE TABLE test(a int); are significantly delayed. Is this a known issue, and how can I diagnose it?
Kai
Asked on Jan 19, 2024
The issue you're experiencing could be related to a sync barrier that is stalled or to back pressure, either of which would explain the delays and apparent lockups. Zheng Wang pointed out that an epoch (barrier) has been in the graph for an extended period, which might be causing the issue. He also noted that the Grafana metrics were missing from your diagnostic report, which could be because --prometheus-endpoint is not set correctly. It's recommended to use a stable version of RisingWave, such as v1.5.4, instead of the latest build from the dev branch, which may contain bugs. Additionally, make sure --prometheus-endpoint is correctly configured in your Docker Compose file so that metrics are included in the report; these can help diagnose back pressure or other performance issues.
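As a quick sanity check for the flag mentioned above, here is a minimal Python sketch that inspects a service's command arguments (as you might copy them from the command list in your Docker Compose file) and reports whether --prometheus-endpoint is present and looks like a URL. The helper names find_flag_value and check_endpoint, the example argument list, and the example endpoint value are all illustrative assumptions, not part of RisingWave; the script only checks the arguments you give it and does not contact Prometheus.

```python
from typing import Optional, Sequence

FLAG = "--prometheus-endpoint"  # flag name taken from the answer above


def find_flag_value(args: Sequence[str], flag: str = FLAG) -> Optional[str]:
    """Return the value given to `flag`, or None if it is absent or empty.

    Handles both `--flag value` and `--flag=value` spellings.
    """
    for i, arg in enumerate(args):
        if arg == flag:
            # `--flag value`: the next argument must exist and not be another flag.
            if i + 1 < len(args) and not args[i + 1].startswith("--"):
                return args[i + 1]
            return None
        if arg.startswith(flag + "="):
            # `--flag=value`: an empty value counts as missing.
            return arg[len(flag) + 1:] or None
    return None


def check_endpoint(args: Sequence[str]) -> str:
    """Summarize whether the endpoint flag looks usable (hypothetical helper)."""
    value = find_flag_value(args)
    if value is None:
        return f"missing: {FLAG} is not set"
    if not value.startswith(("http://", "https://")):
        return f"suspicious: {value!r} does not look like an HTTP URL"
    return f"ok: {value}"


if __name__ == "__main__":
    # Hypothetical argument list; replace it with the one from your Compose file.
    example_args = [
        "meta-node",
        "--prometheus-endpoint",
        "http://prometheus-0:9500",
    ]
    print(check_endpoint(example_args))
```

If the check reports the flag as missing or suspicious, fix the Compose entry and regenerate the diagnostic report before looking further into back pressure.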