How can I optimize my RisingWave queries to prevent state growth in materialized views?
I'm using RisingWave as a continuous ETL tool and I've noticed that the materialized view sizes for my sinks are growing over time, even though I expect them to be stateless or to clean up expired states. I've set up my queries to parse data from one Kafka topic and sink it to another, and to join data from two topics with a time constraint. However, the materialized_view_total_size
metric in Grafana shows a steady increase in storage usage, which doesn't match the actual bucket size in MinIO. Here are the queries I'm using:
-- Query for parsing bytes and sinking to another topic
CREATE SINK events_sink AS
...
-- Query for joining data from two topics
CREATE SINK mtapoczta_events_interval_join_sink AS
...
I've followed the instructions from the RisingWave Docker Compose documentation and I'm currently using version 1.6.1. Can I optimize these queries to prevent retaining any data? I thought that interval joins should clean up expired states as per the documentation. Could 'emit on window close' be helpful? Why is the parsed_events_sink, which only parses bytes, not stateless?
Krzysztof Sota
Asked on Mar 27, 2024
The queries you've written seem to be correct, and according to the explained plan, the operators are expected to have no state or clean outdated state. However, the state cleaning in the join operator might not be working as expected, which could be causing the growth in materialized view sizes. It's also possible that the sink write speed is slower than the source read speed, leading to a growing buffered state. To diagnose the issue further, you can run the following query to check which internal table's key and value sizes are increasing over time:
SELECT rw_internal_tables.id, rw_internal_tables.name, total_key_count, total_key_size/1024.0/1024.0/1024.0 as total_key_size_gb, total_value_size /1024.0/1024.0/1024.0 as total_value_size_gb FROM rw_internal_tables JOIN rw_table_stats ON rw_internal_tables.id = rw_table_stats.id AND rw_internal_tables.name LIKE '__internal_mtapoczta_events_interval_join_sink%';
If you find that certain internal tables are steadily increasing in size, it may indicate a problem with state cleaning. The RisingWave team may need to investigate this issue further to provide a resolution.