all-things-risingwave

Can Rising Wave handle a left outer join between a Kafka stream and a large history table?

I'm trying to perform a left outer join between a Kafka low-latency stream and a substantially large history table to identify new records, whether processed or unprocessed. The new records need to be written to a target Kafka topic and appended to the history table. The history table then needs to be refreshed to include the new data for further lookups. Can Rising Wave effectively manage this scenario?

sr

sri hari kali charan Tummala

Asked on Mar 01, 2024

Generally, Rising Wave can handle such a scenario, which is similar to an anti semi join in SQL. However, it's important to check if the necessary source and sink connectors are supported. Currently, only Kafka is supported as both source and sink. When implementing this, be mindful of the limitations of Rising Wave in incrementally calculating left outer joins and the potential for high join amplification. It's advisable to limit the number of elements processed at once to avoid overwhelming the system. Additionally, using temporal filters can help manage the timing of join operations, and considering strategies like bloom filters might mitigate some issues, although it's not a complete solution.

Mar 06, 2024Edited by