You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
snapshot is implemented with the ora_hash() function.
This gives a lot of collissions in a large data set and duplicate ids, causing errors in the merge statement for data volumes of millions of rows.
dbt snapshot will fail once a hash collission appears, and the snapshot tables becomes impossible to update.
Expected Behavior
No collissions,
Steps To Reproduce
run dbt snapshot command on data with a few million updates.
First dbt snapshot command succeeds as there are only inserts, but duplicates in dbt_scd_id exists causing trouble on subsequest runs.
Relevant log output using --debug flag enabled
06:08:52 Database Error in snapshot table_name (snapshots\table_name.sql)
ORA-30926: unable to get a stable set of rows in the source tables
Help: https://docs.oracle.com/error-help/db/ora-30926/
compiled Code at target\run\dbt_project\snapshots\table_name.sql
Environment
- OS: Windows
- Python: 3.11
- dbt: 1.8.1
What Oracle database version are you using dbt with?
19c
Additional Context
No response
The text was updated successfully, but these errors were encountered:
And yes, we decided not to update ORA_HASH function because it would break existing snapshots, Users who are starting fresh snapshots should include the following macro in their dbt project. This uses a much stronger hash function - SHA256
{% macro oracle__snapshot_hash_arguments(args) -%}
STANDARD_HASH({%- for arg in args -%}
coalesce(cast({{ arg }} asvarchar(4000) ), '')
{% if not loop.last %} ||'|'|| {% endif %}
{%- endfor -%}, 'SHA256')
{%- endmacro %}
For existing snapshots, to migrate, there is an option.
dbt snapshot uses MERGE INTO statement.
MERGE INTO target t
USING staging s
ON (s.dbt_scd_id = d.dbt_scd_id)
So, before using the new hash function you need to UPDATE the column dbt_scd_id in the snapshot table using the STANDARD_HASH function.
Is there an existing issue for this?
Current Behavior
snapshot is implemented with the ora_hash() function.
This gives a lot of collissions in a large data set and duplicate ids, causing errors in the merge statement for data volumes of millions of rows.
dbt snapshot will fail once a hash collission appears, and the snapshot tables becomes impossible to update.
Expected Behavior
No collissions,
Steps To Reproduce
run dbt snapshot command on data with a few million updates.
First dbt snapshot command succeeds as there are only inserts, but duplicates in dbt_scd_id exists causing trouble on subsequest runs.
Relevant log output using
--debug
flag enabledEnvironment
What Oracle database version are you using dbt with?
19c
Additional Context
No response
The text was updated successfully, but these errors were encountered: