I have a table with 1.2 billion rows.
This table tracks each user's transactions and which device was used to complete each transaction.
Not all transactions give us the device id, so we are trying to fill in the gaps with the correct device id by building a table that has the date ranges during which each device was used per user.
Originally it was written like this:
Code:
SELECT userid, deviceid, min(time_stamp)
FROM t_transaction
GROUP BY userid, deviceid;
This gave us 80% correct data, because 20% of the time users went back to a former device.
I know the following would work:
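To make the failure mode concrete, here is a minimal sketch of why one row per (userid, deviceid) loses a return to a former device. It uses Python's sqlite3 with toy data; the table and column names (t_transaction, userid, deviceid, time_stamp) follow the post, everything else is illustrative.

```python
import sqlite3

# Toy data: user 1 starts on device A, switches to B, then returns to A.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t_transaction (userid, deviceid, time_stamp)")
con.executemany(
    "INSERT INTO t_transaction VALUES (?, ?, ?)",
    [
        (1, "A", "2023-01-01"),
        (1, "B", "2023-01-02"),
        (1, "A", "2023-01-03"),  # return to former device A
    ],
)

# The min(time_stamp) approach: one row per (userid, deviceid).
rows = con.execute("""
    SELECT userid, deviceid, min(time_stamp)
    FROM t_transaction
    GROUP BY userid, deviceid
    ORDER BY min(time_stamp)
""").fetchall()

# The second stint on device A never appears: its start date is
# collapsed into the first one, so the later date range is wrong.
print(rows)  # [(1, 'A', '2023-01-01'), (1, 'B', '2023-01-02')]
```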
Code:
SELECT userid, deviceid, time_stamp
FROM (SELECT userid,
             deviceid,
             time_stamp,
             lag(deviceid) over (partition by userid order by time_stamp) p_deviceid
      FROM t_transaction) x
WHERE deviceid <> p_deviceid
   OR p_deviceid IS NULL  -- lag() is NULL on each user's first row; keep it
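The lag() approach can be checked on a small sample before worrying about scale. Below is a sketch in Python's sqlite3 (window functions need SQLite >= 3.25); the names follow the post, the data and the outer ORDER BY (added only for deterministic output) are illustrative.

```python
import sqlite3

# Toy data: user 1 uses A, A again, switches to B, then returns to A.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t_transaction (userid, deviceid, time_stamp)")
con.executemany(
    "INSERT INTO t_transaction VALUES (?, ?, ?)",
    [
        (1, "A", "2023-01-01"),
        (1, "A", "2023-01-02"),  # same device, no switch
        (1, "B", "2023-01-03"),
        (1, "A", "2023-01-04"),  # return to former device A
    ],
)

# Keep only the rows where the device differs from the previous row
# for that user (plus each user's first row, where lag() is NULL).
rows = con.execute("""
    SELECT userid, deviceid, time_stamp
    FROM (SELECT userid,
                 deviceid,
                 time_stamp,
                 lag(deviceid) OVER (PARTITION BY userid
                                     ORDER BY time_stamp) p_deviceid
          FROM t_transaction) x
    WHERE deviceid <> p_deviceid
       OR p_deviceid IS NULL
    ORDER BY time_stamp
""").fetchall()

# Every switch is captured, including the return to A, so each row
# starts a new date range for that (userid, deviceid) stint.
print(rows)
```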
I am concerned that this will be too large to run in production, and our dev environment is not in a position to test. Are there any suggestions?
The table is partitioned by month.
Thanks in advance!