all-things-risingwave

How can I create an internal sink from a materialized view to a table in RisingWave?

I'm working with RisingWave and want to create an internal sink from a materialized view to a table, so that I can maintain uniqueness on a certain key without having to send data out to Kafka and back in. Here's an example of what I'm trying to achieve:

INSERT INTO contacts_raw (id, timestamp, payload) VALUES
  (1, '2023-06-13T12:00:00Z', '{"name": "Bob", "sourceid": 4, "hobbie": null}');

CREATE MATERIALIZED VIEW contacts AS
SELECT
  (payload ->> 'name') AS name,
  (payload ->> 'hobbie') AS hobbie,
  (payload -> 'sourceid')::bigint AS sourceid
FROM contacts_raw;
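For context, contacts_raw is just a plain table with a jsonb payload column, roughly along these lines (a sketch inferred from the INSERT above, not my exact definition):

-- Raw events land here, with the JSON document in payload
CREATE TABLE contacts_raw (
  id BIGINT,
  timestamp TIMESTAMPTZ,
  payload JSONB
);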

I want to ensure that the materialized view maintains the latest record for each sourceid. Is there a direct way to do this in RisingWave?


Colin Bankier

Asked on Dec 21, 2023

Hi Martin,

Thank you for the information. It's great to hear that an internal sink feature is in progress and may be available in the next couple of versions. I understand that this will make it possible to enforce a primary key constraint on the target table, so that only the latest record for each key is kept.
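Just so I'm picturing it correctly, I imagine it will end up looking roughly like this once it's released (purely my sketch of the idea; the final syntax may differ, and contacts_latest and contacts_dedup_sink are just names I made up):

-- Target table whose primary key enforces one row per sourceid
CREATE TABLE contacts_latest (
  sourceid BIGINT PRIMARY KEY,
  name VARCHAR,
  hobbie VARCHAR
);

-- Internal sink: stream the materialized view straight into the table,
-- so the primary key keeps only the latest row per sourceid
CREATE SINK contacts_dedup_sink INTO contacts_latest
FROM contacts;

With the primary key on sourceid, a new row for an existing key would then replace the previous one rather than accumulate.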

Also, thanks to Tianxiao Shen for suggesting the use of SELECT DISTINCT ON. I've confirmed that it works for my case, and it's good to know that RisingWave supports this feature just as PostgreSQL does. I appreciate the clarification on the documentation oversight.
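For anyone else who finds this thread, the query that worked for me is roughly along these lines, replacing my earlier contacts definition (a sketch based on the example above, not the exact production query):

-- One row per sourceid, deduplicated directly in the materialized view
CREATE MATERIALIZED VIEW contacts AS
SELECT DISTINCT ON ((payload ->> 'sourceid')::bigint)
  (payload ->> 'name') AS name,
  (payload ->> 'hobbie') AS hobbie,
  (payload ->> 'sourceid')::bigint AS sourceid
FROM contacts_raw;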

Regarding the retention of contacts_raw data, I see the value in being able to update or redefine the materialized view later with the original data. I'm also interested in the upcoming support for INCLUDE partition, offset in Kafka sources, which would allow sorting by the original Kafka offset.
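If that lands as described, I'd guess the source definition will end up looking something like this (only my guess at the shape; the INCLUDE clause placement and the kafka_partition / kafka_offset column names are assumptions on my part):

-- Expose Kafka metadata as extra columns alongside the event payload
CREATE SOURCE contacts_raw_kafka (
  id BIGINT,
  timestamp TIMESTAMPTZ,
  payload JSONB
)
INCLUDE partition AS kafka_partition
INCLUDE offset AS kafka_offset
WITH (
  connector = 'kafka',
  topic = 'contacts',
  properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

Being able to order by kafka_offset would then give deterministic latest-record-wins behavior.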

Lastly, it's reassuring to know that RisingWave has optimized the deduplication process and won't maintain the full historical state, as explained by tianshuo shi. Since the StreamAppendOnlyDedup executor keeps state proportional only to the cardinality of the distinct key, this approach fits my requirements well.

Thank you all for your help!

Dec 21, 2023