I'm trying to use RisingWave (RW) with the MongoDB Kafka Connector. In my existing infrastructure, the MongoDB Kafka Connector streams data to Kafka, which is then consumed by Pinot to create and maintain materialized views (MVs). I want to leverage this setup with RW to create and maintain some MVs that will either live only in RW or be streamed to Pinot to meet high-QPS and strict-latency requirements. I'm particularly interested in whether RW can ingest the connector's Kafka streams directly, and whether this capability is tested and available out of the box. I'm also concerned about whether RW ingestion can handle MongoDB CDC streams whose Kafka topics are not partitioned by the primary key.
Nizar Hejazi
Asked on May 11, 2023
Yes, RisingWave recently added support for the `DEBEZIUM_MONGO_JSON` row format, which allows you to ingest MongoDB CDC data through Kafka. This feature is being tested and will be included in the next release. You can define a table in RisingWave with this row format to consume messages from Kafka. For example:
```sql
CREATE TABLE customers (  -- table name is illustrative
    _id BIGINT PRIMARY KEY,
    payload jsonb
)
WITH (
    connector = 'kafka',
    properties.bootstrap.server = '127.0.0.1:29092',
    topic = 'debezium_mongo_json_customers'
)
ROW FORMAT DEBEZIUM_MONGO_JSON;
```
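Once the table is defined, you can build materialized views directly on top of the ingested data by extracting fields from the `payload` jsonb column. Here is a minimal sketch, assuming the table name `customers` from the example above and hypothetical document fields `name` and `age` (adjust these to your actual document schema):

```sql
-- Extract fields from the Debezium Mongo payload into a materialized view.
-- The field names 'name' and 'age' are placeholders for your document schema.
CREATE MATERIALIZED VIEW customer_profiles AS
SELECT
    _id,
    payload ->> 'name'        AS name,
    (payload ->> 'age')::int  AS age
FROM customers;
```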
For Kafka topics that are partitioned by something other than the primary key, RisingWave should handle ingestion smoothly: it runs parallel source actors to fetch data from the partitions, and the partitioning scheme does not affect the internal streaming computation. If you encounter any specific issues, it's recommended to run tests against your own workload to confirm compatibility.
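If you later want to push an MV's results back out to Kafka for Pinot to consume, RisingWave's Kafka sink can serve that path. A rough sketch follows, reusing the hypothetical `customer_profiles` view above; the topic name is illustrative and sink option names (e.g. `type`, `primary_key`) may vary between RisingWave versions, so check the `CREATE SINK` documentation for your release:

```sql
-- Stream the materialized view back into a Kafka topic (e.g. for Pinot to ingest).
-- Topic name and sink options are illustrative.
CREATE SINK customer_profiles_sink
FROM customer_profiles
WITH (
    connector = 'kafka',
    properties.bootstrap.server = '127.0.0.1:29092',
    topic = 'customer_profiles_mv',
    type = 'upsert',
    primary_key = '_id'
);
```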