troubleshooting

How to remove a stuck job when the meta-node fails in RisingWave?

I'm facing an issue with RisingWave where the meta-node died and now I cannot remove a job. I've been following the instructions provided, but the job seems to be stuck. Here's what I've done so far:

  1. Set the system parameter in psql: ALTER SYSTEM SET pause_on_next_bootstrap TO true;
  2. Restarted the meta node.
  3. Attempted to drop the relevant materialized views, but they don't exist.
  4. Restarted the meta node again to resume.
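
As a rough sketch, the SQL side of the steps above would look like the following in psql. The materialized view name my_mv is a hypothetical placeholder, not a name from this cluster. The meta-node restarts happen outside SQL, so they appear only as comments.

```sql
-- Step 1: ask the meta node to start paused on its next bootstrap.
ALTER SYSTEM SET pause_on_next_bootstrap TO true;

-- Step 2: restart the meta node (done outside psql).

-- Step 3: while the cluster is paused, try to drop the affected view.
-- my_mv is a placeholder; IF EXISTS avoids an error when the view is
-- already gone, which is the situation described in step 3.
DROP MATERIALIZED VIEW IF EXISTS my_mv;

-- Step 4: restart the meta node again to resume (done outside psql).
```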

However, when I try to cancel the job with CANCEL JOBS 62001, it returns (0 rows), and the job still appears in SHOW JOBS at 0.00% progress. I also get an error when I try to delete the job's row from rw_catalog.rw_ddl_progress.
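
For reference, a minimal sketch of the statements involved in this check, using the job ID 62001 from the post. The final query only reads the progress catalog. The failed DELETE is not reproduced here because its exact form and error text are not given above.

```sql
-- List in-progress streaming jobs; the stuck job shows up here at 0.00%.
SHOW JOBS;

-- Try to cancel the job by ID; in this case it returns (0 rows).
CANCEL JOBS 62001;

-- Inspect the DDL progress catalog without modifying it.
SELECT * FROM rw_catalog.rw_ddl_progress;
```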

The meta-node log shows an error related to failing to cancel a recovered streaming job. Any tips on how to proceed?


André Falk

Asked on Nov 30, 2023

I've been following the steps to remove a stuck job after a meta-node failure, but the job seems to be stuck at 0.00% progress and I can't cancel it or delete its progress from rw_catalog.rw_ddl_progress. The meta-node log indicates an error with canceling a recovered streaming job. What should I do next to resolve this issue?

Dec 04, 2023