I'm exploring RisingWave to streamline our data migration process. We're using CDC (pulsar) on multiple tables, which becomes complex as the number of tables grows. We want to use RisingWave to join tables via a materialized view and sink the combined data back into Pulsar. Our goal is to subscribe to a single change feed with all relevant data.
The challenge is defining every column with a unique alias from each table in the view, which is cumbersome. We're also concerned about the overhead when adding new columns upstream.
Is it possible to store an entire table in a jsonb
column and extract only the necessary fields? For example:
Table1: `primary_key`, `data` (jsonb of entire record)
Table2: `primary_key`, `table1_pkey`, `data` (jsonb of record)
View: `SELECT primary_key, t1.data as t1_data, t2.data as t2_data from Table1 t1 LEFT JOIN Table2 ON t1.primary_key = t2.table1_pkey`
If not, are there other patterns to achieve something similar?
Grant
Asked on Sep 11, 2023
Yes, it is possible to store an entire table in a jsonb
column and extract only the necessary fields. You can create a table with a jsonb
field that includes all the fields you might want to extract in the future. For example:
create table table1(
id int,
data jsonb,
primary key (id)
) with ( connector = 'pulsar' ...... );
However, raw CDC data in JSON format from Debezium might need some transformation to fit this structure. If you're dealing with Debezium's JSON format, where current values are under after
, you might need to perform a transformation into another topic and then use that as the source for RisingWave.