I'm trying to understand the scalability of a materialized view aggregate in RisingWave, specifically when counting unique events from a Kafka stream based on a hash field in the JSON payload. Here's an example query I'm considering:
create materialized view mv2 as select field1, count(field1) from s1 group by field1
I want to know how much throughput an individual RisingWave deployment can handle and when it's necessary to distribute data streams to separate deployments.
Bat Milkyway
Asked on Oct 23, 2023
The scalability of a materialized view aggregate in RisingWave depends on several factors:
field1
is small and can be cached in memory and disk, the query is very scalable. Throughput likely scales linearly with the number of CPUs.TUMBLE
or HOP
can confine the state to a specified interval, affecting how state is managed and potentially improving performance.It's recommended to run a workload on RisingWave to get a clearer idea of performance, especially since a small cache might still be effective if there's a small number of hot keys. Benchmarking results can also provide a general idea of throughput capabilities.