RisingWave Community - How Can I Force Cancel a Stuck Job in RisingWave?

To force cancel a stuck job in RisingWave, try the following steps:

Set the pause_on_next_bootstrap system parameter to true by running the command: alter system set pause_on_next_bootstrap to true;
Restart the meta node so that the parameter takes effect on bootstrap and the cluster stops entering a recovery loop.
If you have any problematic sinks, try dropping them after the meta node is back up.
If you're running a version prior to RisingWave 1.5, consider upgrading to 1.5, which has better memory control and includes bug fixes for job cancellation failures.
Set the compute node's memory limit to about 75% of the total available memory with the --total-memory-bytes flag to prevent out-of-memory (OOM) failures.
Confirm that pause_on_next_bootstrap took effect by checking the meta node logs for a message that the streaming jobs are paused.
If the pause is effective, try canceling the job again and then restart the meta node.
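
The 75% memory guideline in the steps above can be turned into a concrete value for --total-memory-bytes. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, and it assumes you already know the total memory available to the compute node in bytes.

```python
from typing import List


def compute_node_memory_bytes(total_available_bytes: int, percent: int = 75) -> int:
    """Return roughly `percent` of the available memory, in bytes.

    Integer arithmetic is used so the result is exact and rounds down.
    """
    if total_available_bytes <= 0:
        raise ValueError("total_available_bytes must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    return total_available_bytes * percent // 100


def memory_flag_args(total_available_bytes: int, percent: int = 75) -> List[str]:
    """Build the argument pair for the compute node.

    The flag name comes from the steps above; passing it as two separate
    arguments is an assumption about how you launch the node.
    """
    limit = compute_node_memory_bytes(total_available_bytes, percent)
    return ["--total-memory-bytes", str(limit)]


if __name__ == "__main__":
    # Example: a machine or container with 16 GiB available.
    sixteen_gib = 16 * 1024**3
    print(" ".join(memory_flag_args(sixteen_gib)))
```

For a 16 GiB host this yields a limit of 12 GiB. Leaving the remaining quarter unallocated gives the operating system and other processes headroom, which is the point of the 75% suggestion.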

If these steps do not resolve the issue, you may need to investigate further by checking the logs for any errors or unusual activity that could be causing the job to remain stuck.
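
When checking the logs, a small filter can help narrow down the lines worth reading. The Python sketch below is only an illustration: the marker strings are generic guesses rather than a documented RisingWave log format, and the function name and the log file path in the comment are hypothetical.

```python
from typing import Iterable, List, Sequence

# Assumption: generic substrings that often indicate trouble in service logs.
# Adjust these to whatever your meta and compute node logs actually contain.
DEFAULT_MARKERS = ("error", "panic", "out of memory")


def find_suspicious_lines(
    lines: Iterable[str], markers: Sequence[str] = DEFAULT_MARKERS
) -> List[str]:
    """Return the lines that contain any marker, matched case-insensitively."""
    lowered = tuple(marker.lower() for marker in markers)
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line.lower() for marker in lowered)
    ]


# Usage (hypothetical path):
#     with open("meta-node.log", encoding="utf-8") as log_file:
#         for hit in find_suspicious_lines(log_file):
#             print(hit)
```

Plain substring matching is deliberately crude. It will produce false positives, but it is a quick way to find where a stuck job first reported a problem before reading the surrounding context.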