Why are queries on Rising Wave running slow and how can I improve performance?
I'm experimenting with Rising Wave as an alternative for Apache Flink and have set it up locally using Docker. After ingesting a large amount of data and creating a source and materialized view, I'm experiencing slow query performance. Queries like select count(*) from the_view_name;
and select * from the_view_name where name_of_a_column = 'Completed';
are taking around 7 minutes to complete, which seems excessive. It appears that these queries are doing a full row scan of all the data. I haven't used primary keys or indexes in my setup. Can you provide guidance on what might be wrong and how to optimize the performance?
Jerome van den Heuvel
Asked on May 20, 2023
It's true that your queries are performing a full row scan because there's no primary key or indexes to accelerate lookups. Here are some steps you can take to improve performance:
- Use
CREATE TABLE
instead ofCREATE SOURCE
and specify primary keys if possible. This will allow queries with primary key filters to be faster. - Create indexes on columns that are frequently used as filters to speed up queries.
- Consider deploying Rising Wave with cloud storage like AWS S3 for better performance, as local MinIO might be slower.
- Ensure Rising Wave is running in
distributed mode
by settingquery_mode to distributed
inpsql
. - Run the same query twice to see if caching improves performance.
- Monitor CPU usage during query execution to check if it's a bottleneck.
Additionally, since you're interested in Azure Blob Storage, you can set it up as the object storage backend, which would eliminate the dependency on MinIO. The Rising Wave operator supports Azure Blob Storage, and you can find guidance on setting it up in the documentation. Remember to use the latest image tag for now, and the upcoming 0.19 version will be stable and production-ready with Azure Blob support.