I'm trying to create a materialized view (MV) in RisingWave which depends on two other views, but I keep encountering a timeout error with the following message from the metadata server:
thread 'risingwave-main' panicked at src/meta/src/barrier/mod.rs:303:17:
duplicated table in concurrent checkpoint
This happens after the first attempt times out. The MV I'm trying to create is as follows:
CREATE MATERIALIZED VIEW port_snapshot AS
SELECT
    timestamp, nd.source, nd.iface, up
FROM
    net_device_ifaces AS nd
RIGHT JOIN
    port_history AS ph ON
        nd.source = ph.source AND
        nd.iface = ph.iface
WHERE
    timestamp > NOW() - INTERVAL '15 days'
    AND
    timestamp = (
        SELECT
            MAX(timestamp)
        FROM
            port_history
        WHERE
            source = nd.source AND
            iface = nd.iface
    )
GROUP BY timestamp, source, iface, up;
The row count for port_snapshot in that time period is approximately 2.8 million. Both input views are materialized views. The metadata server has 2 cores and 2 GB of RAM. I'm running a 1.5.0 cluster on Kubernetes (k8s) with etcd, deployed using the RisingWave operator. The job stalls at around 3.45% and eventually fails with an error indicating that the metadata server crashed and restarted. I've also noticed warnings about using an in-memory remote object store for Meta Backup, which is not recommended for a production environment. I'm currently using Google Cloud Storage (GCS) as the backing store for Hummock.
What could be causing this timeout issue, and how can I resolve it to successfully create the materialized view?
Josh Toft
Asked on Dec 21, 2023
It seems that the root cause is GCS read operations taking an excessive amount of time, which stalls barrier handling. As a workaround, you can lower the streaming read timeout in the RisingWave configuration by setting object_store_streaming_read_timeout_ms to a lower value, such as 60000 milliseconds. This should help until the issue is fixed in a future release, such as version 1.6. Additionally, ensure that backup_storage_url and backup_storage_directory are correctly configured in the risingwave.toml file or through the ALTER SYSTEM SET command.
Here's an example of how to set the streaming read timeout:
[storage.object_store]
object_store_streaming_read_timeout_ms = 60000
And for setting the backup_storage_url:
ALTER SYSTEM SET backup_storage_url TO "gcs://your-bucket-name/your-subdirectory";
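The answer also mentions backup_storage_directory, so here is a minimal sketch of setting it and then checking both values. It assumes the same ALTER SYSTEM SET syntax and quoting shown above also apply to this parameter. The directory name is a placeholder, not a value from the original thread.

```sql
-- Assumption: same syntax and quoting as the backup_storage_url example above.
-- "your-backup-directory" is a placeholder; replace it with your own path.
ALTER SYSTEM SET backup_storage_directory TO "your-backup-directory";

-- List the current system parameters and check that backup_storage_url
-- and backup_storage_directory show the values you set.
SHOW PARAMETERS;
```

If the reported values still point at the in-memory store, the Meta Backup warning is likely to persist.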