What does force_append_only='true' mean when creating a Pulsar sink, and what are the risks of append-only?
Atiqul Islam is trying to create a Pulsar sink from a materialized view and is confused by a warning related to force_append_only='true'. Eric explains what force_append_only='true' means and the risks of forcing a sink to be append-only in this context.
Atiqul Islam
Asked on Apr 04, 2024
- `force_append_only='true'` means that the sink drops delete messages, because it cannot handle delete operations. An update event is treated as a delete followed by an insert, so only the insert half (the new version of the row) is emitted.
- The main risk of append-only is that deleted data is not removed downstream, so it is crucial to handle the output data correctly. For example, if the output is ingested into a downstream database with a proper primary key for deduplication, updates cause no problem, because each new row overwrites the earlier row with the same key. However, if the output is stored in S3, stale and deleted rows persist, potentially leading to data redundancy.
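To make the behavior above concrete, here is a minimal Python sketch that simulates it outside of any real system. Everything in it is hypothetical: the `Event` class, the op names (`insert`, `delete`, `update_delete`, `update_insert`), and the helper functions are illustrative assumptions, not the sink's actual API. It filters a small changelog the way a forced append-only sink would, then feeds the result into two kinds of downstream storage: a table keyed by primary key, and an append-only log standing in for files in S3.

```python
from dataclasses import dataclass


# Hypothetical changelog event; the op names are an assumption that models
# an update as a delete half ("update_delete") plus an insert half ("update_insert").
@dataclass(frozen=True)
class Event:
    op: str
    key: int
    value: str


def force_append_only(events):
    """Keep only insert-like events: plain deletes and the delete half of updates are dropped."""
    return [e for e in events if e.op in ("insert", "update_insert")]


def upsert_by_key(events):
    """Downstream table with a primary key: a later row overwrites an earlier one with the same key."""
    table = {}
    for e in events:
        table[e.key] = e.value
    return table


def append_to_log(events):
    """Append-only storage (standing in for files in S3): every emitted row is kept."""
    return [(e.key, e.value) for e in events]


changelog = [
    Event("insert", 1, "a"),
    Event("insert", 2, "b"),
    Event("update_delete", 1, "a"),   # update of key 1: delete half
    Event("update_insert", 1, "a2"),  # update of key 1: insert half
    Event("delete", 2, "b"),          # key 2 is deleted upstream
]

sunk = force_append_only(changelog)
print(upsert_by_key(sunk))  # {1: 'a2', 2: 'b'}
print(append_to_log(sunk))  # [(1, 'a'), (2, 'b'), (1, 'a2')]
```

In this simulation the primary-key table collapses the update of key 1 to its latest value, but key 2, which was deleted upstream, is still present because its delete message was dropped. The append-only log keeps both the stale row `(1, 'a')` and the deleted row `(2, 'b')`, which is the redundancy described for S3.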