all-things-risingwave
What kind of cluster sizing is necessary to handle aggregation queries over a Kafka topic with historical data?
I have a Kafka topic that receives ~300 JSON events per second (around 1.3 MB/s) and I want to flatten the structure and create a materialized view from it. The topic contains historical data that I would like to include in the query result. Running a simple count takes a long time. What kind of cluster sizing would be necessary to handle this amount of data?
Can Yavuz
Asked on Sep 12, 2023
- Recommend trying at least 16 CPUs for the compute node, especially during the stage of reading historical data to build the materialized view.
- Use as many CPUs as possible to shorten the period before the MV catches up with the latest data.
- After the MV catches up with new data, scale in the resources to accommodate the throughput of new data.
- Put as much work as possible into the materialized view to simplify queries for tools like Superset.
- Going vertical (increasing resources on a single node) is preferred because it avoids network overhead, but consider mixing vertical and horizontal scaling for cost-effectiveness in cloud environments like EC2.
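To illustrate the "put as much work as possible into the materialized view" advice, here is a minimal RisingWave sketch. The topic name, broker address, and JSON field names (`event_type`, `payload`, etc.) are hypothetical placeholders, not taken from the question; adjust them to your actual schema.

```sql
-- Hypothetical topic, broker, and field names; adapt to your setup.
CREATE SOURCE kafka_events (
  user_id VARCHAR,
  event_type VARCHAR,
  payload STRUCT<amount DOUBLE, country VARCHAR>
) WITH (
  connector = 'kafka',
  topic = 'events',
  properties.bootstrap.server = 'broker:9092',
  -- Read the topic from the beginning so historical data is included.
  scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;

-- Flatten the nested struct and pre-aggregate inside the MV, so a
-- dashboard tool like Superset only reads small precomputed rows
-- instead of re-counting the whole topic on every query.
CREATE MATERIALIZED VIEW events_by_type AS
SELECT
  event_type,
  (payload).country AS country,
  COUNT(*) AS event_count,
  SUM((payload).amount) AS total_amount
FROM kafka_events
GROUP BY event_type, (payload).country;
```

With this shape, the expensive scan of historical data happens once while the MV backfills (the phase where the extra CPUs help); afterwards the view is maintained incrementally and the cluster can be scaled in.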
Answered on Sep 14, 2023