I have a job in RisingWave that I can't seem to cancel. I've tried various methods, including CANCEL JOBS <job id>, restarting the cluster, and allowing the job to run over the weekend, but it's stuck at 0.0% progress. I also attempted FLUSH but received an error: QueryError: internal error: Service unavailable: cluster is under recovering. Are there any other steps I can take to remove this job?
Dominic Lindsay
Asked on Dec 11, 2023
To address the issue of a stuck job in RisingWave, you can try the following steps:
1. Set the pause_on_next_bootstrap system parameter to true using the command alter system set pause_on_next_bootstrap to true;
2. Restart the cluster with the --total-memory-bytes flag set to prevent OOM (Out of Memory) issues.
3. Verify that the pause_on_next_bootstrap parameter is effective by checking the meta node logs for confirmation that the streaming jobs are paused.

If these steps do not resolve the issue, you may need to investigate further by checking the logs for any errors or unusual activity that could be causing the job to remain stuck.
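
As a rough sketch, the SQL side of the steps above might look like the following. The job ID 1010 is a placeholder, and retrying CANCEL JOBS once the jobs are paused is an assumption on my part rather than something stated in the answer. The restart itself, and the --total-memory-bytes flag, happen outside SQL and are not shown.

```sql
-- List in-progress streaming jobs to find the ID of the stuck one.
SHOW JOBS;

-- Step 1: ask the cluster to pause streaming jobs the next time it bootstraps.
ALTER SYSTEM SET pause_on_next_bootstrap TO true;

-- Optional check: the parameter should now show as true.
SHOW PARAMETERS;

-- (Restart the cluster here, then check the meta node logs as in step 3.)

-- Assumption: with the streaming jobs paused, retry the cancel.
-- 1010 is a hypothetical job ID; use the one reported by SHOW JOBS.
CANCEL JOBS 1010;
```

Whether ALTER SYSTEM is accepted while the cluster still reports "cluster is under recovering" is not covered by the answer, so you may need to wait for a moment when the frontend accepts statements.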