If I have an unbounded stream of data (clickstream) and want to deduplicate it with a DISTINCT ON clause, will I run into memory issues if I load all of the data (e.g. 3 years of clickstream data)?
Dan Reverri
Asked on Feb 08, 2024
RisingWave does not limit deduplication memory per operator. Instead, it applies a global memory threshold and evicts cached data when that threshold is reached, which prevents memory issues when dealing with large amounts of data. In theory, as long as you are not using in-memory mode, you should not run into memory issues even with a very large amount of data. The eviction mechanism moves cold data out of the cache to a persistent layer, so memory usage stays bounded while the full deduplication state is kept in storage.
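To make the idea concrete, here is a minimal toy sketch in Python of deduplication with a bounded cache backed by a persistent store. It is not RisingWave's implementation: the class `SpillingDedup`, the function `dedup`, the per-operator `cache_capacity` (RisingWave uses a global threshold, as described above), and the in-process `set` standing in for the persistent layer are all assumptions made for illustration.

```python
from collections import OrderedDict


class SpillingDedup:
    """Toy dedup operator (hypothetical): hot keys live in a bounded
    LRU cache, and every key is also recorded in a backing store."""

    def __init__(self, cache_capacity, store=None):
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()  # hot keys, least recently used first
        # Stand-in for a persistent layer; a real system would use disk
        # or object storage here, not an in-process set.
        self.store = store if store is not None else set()

    def is_new(self, key):
        """Return True only the first time a key is seen."""
        if key in self.cache:
            # Cache hit: the key is a duplicate; mark it as recently used.
            self.cache.move_to_end(key)
            return False
        # Cache miss: fall back to the backing store.
        seen_before = key in self.store
        self.store.add(key)      # state is always durable in the store
        self.cache[key] = True   # bring the key into the cache
        # Evict the coldest keys so the cache never exceeds its capacity.
        while len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)
        return not seen_before


def dedup(events, cache_capacity=2):
    """Return the distinct events in arrival order and the final cache size."""
    op = SpillingDedup(cache_capacity)
    distinct = [e for e in events if op.is_new(e)]
    return distinct, len(op.cache)
```

In this sketch, memory used by the cache depends on `cache_capacity` rather than on the number of distinct keys ever seen. A key that was evicted is still recognized as a duplicate, at the cost of a lookup in the slower backing store.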