I have a source which might include duplicate events identified by the same id. I want to materialize the last day of events from the source and ignore duplicate ids if they arrive and already exist. Will the provided SQL query work for this purpose?
Matan Perelmuter
Asked on Sep 06, 2023
EXCEPT (rnk)
instead of EXCEPT rnk
.interval '1 day'
.ROW_NUMBER()
function is used to assign a unique sequential integer to each row within a partition of a result set. In this case, it is used to partition by the id
column and order by proc_time
in ascending order.WHERE rnk = 1
condition filters out all rows except the first row within each partition, effectively selecting the latest event for each unique id
within the last day.events
will store the latest events for each id
from the source data within the last day, ignoring duplicates based on the id
column.