Nikhil Srivastava asks which design choices govern the latency between source data updates and materialised view updates in RisingWave (RW), and how the Hummock service keeps queries on materialised views consistent.
Nikhil Srivastava
Asked on Jun 06, 2023
In RisingWave, the latency between source data updates in the persistent store and materialised view updates is governed by barrier-based checkpointing, which follows the Chandy-Lamport algorithm. Barriers are injected periodically into the sources and flow along the streaming graph. Each barrier carries a monotonically increasing epoch, similar to a timestamp, and the updates between two barriers are tagged with the epoch of the barrier. When a node in the streaming graph receives a barrier, it flushes its updates to storage, and those updates are then uploaded to S3 asynchronously. Once the barrier has passed through all nodes and the corresponding uploads to S3 have finished, the data of that barrier's epoch is considered committed and becomes queryable.
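To make the commit rule concrete, here is a minimal toy sketch in Python. It is not RisingWave code: `Operator`, `Coordinator`, `inject_barrier`, and `finish_upload` are hypothetical names, epochs are assumed to be consecutive integers, and each update is assumed to be tagged with the epoch of the barrier that follows it. The only point it illustrates is that an epoch becomes committed once every node has flushed for that barrier and every upload has finished.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """Hypothetical streaming node that buffers updates between barriers."""
    name: str
    pending: list = field(default_factory=list)   # updates since the last barrier
    flushed: dict = field(default_factory=dict)   # epoch -> updates handed to storage

    def on_update(self, update):
        self.pending.append(update)

    def on_barrier(self, epoch):
        # Flush everything buffered since the previous barrier under this epoch.
        self.flushed[epoch] = self.pending
        self.pending = []


@dataclass
class Coordinator:
    """Hypothetical tracker of which epochs are fully flushed and uploaded."""
    operators: list
    uploaded: set = field(default_factory=set)    # (operator name, epoch) pairs
    committed_epoch: int = 0

    def inject_barrier(self, epoch):
        # Toy simplification: the barrier reaches every node synchronously.
        for op in self.operators:
            op.on_barrier(epoch)

    def finish_upload(self, op_name, epoch):
        # Called when the asynchronous upload for one node and epoch completes.
        self.uploaded.add((op_name, epoch))
        self._try_commit()

    def _try_commit(self):
        # Advance in epoch order, so a later epoch never commits before an earlier one.
        nxt = self.committed_epoch + 1
        while self.operators and all(
            (op.name, nxt) in self.uploaded for op in self.operators
        ):
            self.committed_epoch = nxt
            nxt += 1


src, agg = Operator("source"), Operator("agg")
coord = Coordinator([src, agg])

src.on_update("row-1")
agg.on_update("count=1")
coord.inject_barrier(1)

coord.finish_upload("source", 1)
print(coord.committed_epoch)  # 0: "agg" has not finished uploading epoch 1

coord.finish_upload("agg", 1)
print(coord.committed_epoch)  # 1: all nodes flushed and uploaded epoch 1
```

In this model the freshness of a materialised view is bounded below by how often barriers are injected plus the time for the barrier to traverse the graph and for uploads to complete.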
To ensure consistency when querying materialised views, each query pins the latest committed epoch when it starts. All of its reads on storage are based on this epoch, so updates later than the epoch are invisible to the query. Non-checkpointed data can also be queried by configuring a session variable, but consistency is not guaranteed in that case.
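The epoch-pinning idea can be sketched as a small multi-version store. This is an illustrative assumption, not Hummock's actual API: `VersionedStore`, `Snapshot`, `pin`, and `read_at` are hypothetical names, and writes are assumed to arrive in epoch order.

```python
class VersionedStore:
    """Hypothetical multi-version key-value store keyed by epoch."""

    def __init__(self):
        self.versions = {}        # key -> list of (epoch, value), in epoch order
        self.committed_epoch = 0  # latest epoch considered committed

    def write(self, key, value, epoch):
        self.versions.setdefault(key, []).append((epoch, value))

    def commit(self, epoch):
        self.committed_epoch = max(self.committed_epoch, epoch)

    def pin(self):
        # A query pins the latest committed epoch once, at its start.
        return Snapshot(self, self.committed_epoch)

    def read_at(self, key, epoch):
        # Return the newest version whose epoch is not later than `epoch`.
        for version_epoch, value in reversed(self.versions.get(key, [])):
            if version_epoch <= epoch:
                return value
        return None


class Snapshot:
    """All reads through a snapshot use the same pinned epoch."""

    def __init__(self, store, epoch):
        self.store = store
        self.epoch = epoch

    def get(self, key):
        return self.store.read_at(key, self.epoch)


store = VersionedStore()
store.write("count", 10, epoch=1)
store.commit(1)

snap = store.pin()                 # query starts, pins epoch 1
store.write("count", 11, epoch=2)
store.commit(2)                    # a later epoch commits mid-query

print(snap.get("count"))           # 10: the later update is invisible
print(store.pin().get("count"))    # 11: a new query sees epoch 2

store.write("count", 12, epoch=3)  # written but not yet committed
print(store.pin().get("count"))    # 11: uncommitted data stays hidden
```

Reading with an epoch beyond `committed_epoch`, for example `store.read_at("count", 3)`, corresponds loosely to querying non-checkpointed data: it returns fresher values, but nothing guarantees that every key reflects the same complete epoch.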