What to do when the size of intermediate state and materialized views exceeds the memory of the compute nodes, given continuous incoming mutation streams?
I am reading a streaming-engine design document in which the intermediate state of each executor and the materialized views are persisted on S3. What should be done when the combined size of the intermediate state and materialized views exceeds the total memory of all compute nodes, while continuous incoming mutation streams keep updating the materialized views?
Gopher
Asked on Dec 30, 2022
When the intermediate state and materialized views no longer fit in the combined memory of the compute nodes, and mutation streams keep arriving, a single technique is rarely enough; you typically combine data partitioning, eviction (spilling cold state back to S3), distributed processing, and incremental updates. Here are some approaches to handle this situation:
- Data Partitioning: Split the state into smaller partitions (e.g. by hashing the state key) and distribute them across the compute nodes, so each node holds only its own shard rather than the full working set.
- Eviction Policies: Keep only hot state in memory and evict cold entries under an LRU or LFU policy. Because the design already persists state to S3, eviction can simply drop an entry from memory and reload it from S3 on the next access, rather than losing it.
- Distributed Processing: Use a distributed stream-processing framework such as Apache Flink or Apache Spark, which already supports partitioned operator state and state backends that spill to disk or object storage (e.g. Flink's RocksDB state backend).
- Incremental Processing: Update the materialized views incrementally, applying mutations in small batches instead of recomputing views from scratch, which bounds the memory needed per update.
By combining these strategies, you can effectively manage the memory constraints and handle continuous incoming mutation streams while maintaining the integrity of the materialized views.
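The eviction and partitioning ideas above can be sketched together. This is a minimal, hypothetical in-process illustration, not the design document's actual mechanism: operator state lives in a bounded LRU cache, entries evicted on overflow are spilled to a backing store (standing in for S3, modeled here as a plain dict), and a hash function decides which node owns each state key.

```python
import hashlib
from collections import OrderedDict


class SpillingStateCache:
    """Bounded LRU cache that spills evicted state to a backing store.

    `backing_store` stands in for S3; here it is any dict-like object.
    """

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store
        self.cache = OrderedDict()  # key -> state, in LRU order

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as recently used
            return self.cache[key]
        value = self.backing.get(key)    # cache miss: reload spilled state
        if value is not None:
            self.put(key, value)
        return value

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            # Evict the least-recently-used entry and spill it,
            # instead of discarding it outright.
            old_key, old_value = self.cache.popitem(last=False)
            self.backing[old_key] = old_value


def owner_node(key, num_nodes):
    """Hash-partition a state key so each node holds only its shard."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % num_nodes
```

In a real system the backing store would be an S3 client, eviction would be triggered by memory pressure rather than entry count, and repartitioning on cluster resize would need consistent hashing; this sketch only shows the control flow.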