I'm encountering exceptions and internal errors when calling my UDF with large input data in RisingWave. The UDF server raises an exception indicating a possible client disconnect, and the postgres client reports an internal error with a channel closed message. Additionally, the meta-node logs show a warning about failing to reset compute nodes due to an RPC error. I've provided the function call, function signature, and the actual function code for context. Can anyone help me understand and debug these errors?
Dominic Lindsay
Asked on Sep 14, 2023
It appears that the compute node crashed and could not recover. After checking the compute node logs, I found a stack trace showing a failure to build a record batch because of an incorrect row count. The error is produced by the RisingWave Rust runtime when it creates the inputs for the Python UDF. Further investigation revealed that the cardinality method, which was expected to describe the number of rows in a batch, was misleadingly named and used incorrectly. Runji Wang, one of the authors of the affected modules, confirmed that this is a bug and suggested changing input.cardinality() to input.capacity() in the code. A patch has been submitted and merged to address this issue, so the next step is to try again with the latest nightly image of RisingWave.
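To illustrate why swapping input.cardinality() for input.capacity() matters, here is a minimal Python sketch of the row-count mismatch. It is not RisingWave's code: ToyChunk and build_record_batch are hypothetical names, and the sketch assumes that cardinality counts only visible rows while capacity counts every row stored in the columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyChunk:
    """Hypothetical stand-in for a columnar chunk; not RisingWave's actual type."""
    columns: List[List[int]]
    visibility: List[bool]  # assumed: one flag per stored row

    def capacity(self) -> int:
        # Assumed meaning: all rows physically stored in each column.
        return len(self.visibility)

    def cardinality(self) -> int:
        # Assumed meaning: only the rows currently marked visible.
        return sum(self.visibility)


def build_record_batch(columns: List[List[int]], row_count: int) -> dict:
    """Toy builder that rejects a row count that disagrees with the column lengths."""
    for i, col in enumerate(columns):
        if len(col) != row_count:
            raise ValueError(f"column {i} has {len(col)} rows, expected {row_count}")
    return {"num_rows": row_count, "columns": columns}


# Four stored rows, one of them hidden.
chunk = ToyChunk(columns=[[1, 2, 3, 4]], visibility=[True, False, True, True])

# Using the visible-row count fails, because the columns still hold every row.
try:
    build_record_batch(chunk.columns, chunk.cardinality())
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Using the stored-row count matches the column lengths and succeeds.
batch = build_record_batch(chunk.columns, chunk.capacity())
```

Under these assumptions the two counts agree whenever every row is visible, which would explain a failure that only shows up on some inputs.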