You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Improving Materialized View (MV) Implementation in dbt-clickhouse
Materialized Views (MVs) in ClickHouse are a powerful feature that allows users to shift computation costs from query time to insert time, resulting in significantly faster SELECT queries. This capability makes MVs an essential tool for many ClickHouse users.
Current State of MV Materialization in dbt-clickhouse
The usage of MV materialization in dbt-clickhouse is extensive, as evidenced by the numerous issues and feature requests related to it. The current implementation has evolved and matured over time, primarily thanks to contributions from the community.
Current Implementation Details
As it stands today:
Both the MV relation and the target table are created within the same materialization.
This approach ensures an automatic link between the MV and its target table and guarantees that the target table's schema matches the MV query.
The implementation supports multiple MVs targeting the same target table. This is achieved by parsing the MV query and identifying the MVs through a predetermined comment, such as -- mv1.
While this design simplifies certain aspects, it also introduces limitations in flexibility and control for users.
Proposed Changes to MV Materialization
We propose splitting the creation process of the MV relation and the target table into two separate materializations:
Target Table Materialization: Users will create the target table as a standalone table materialization model.
MV Materialization: Users will define an MV materialization that references the target table.
Advantages of the Proposed Approach
By decoupling the creation of the target table and MV, users will have more control over the schema and backfilling processes.
This approach allows for easier management of multiple MVs targeting the same table without the need to parse comments in the MV queries.
Users can manage the target table independently, simplifying more complex use cases.
Challenges to Address
Reference Handling: A mechanism will be required to pass a ref() instance to the MV configuration (to link the MV to its target table within dbt) - a solution might be implementing a msgpack serialization in order to pass the ref as a config, for instance:
{{
config(
materialized='materialized_view',
target=ref("<your target table model ref>")
)
}}
(would love to hear of other solutions, maybe something that is curretnly exists within the adapter?) This feature will also benefit refreshable MV implementations by resolving dependency creation issues, ensuring dependencies are created before the MV.
Users will need to explicitly include the schema query in the target table materialization.
There needs to be a way to manage backfilling, such as providing an option to create an empty table initially.
This proposal is a starting point, and some details may need refinement. We would greatly appreciate feedback from the community, particularly from long-time contributors and experienced users who have been involved since the project's inception. Your insights will help ensure that these changes meet the needs of a broad range of use cases and improve the overall experience for everyone.
The text was updated successfully, but these errors were encountered:
Hey @BentsiLeviav, apologies for the delayed response. It's been a while since I've looked at the implementation of this, and I haven't tested my assumptions with actual code, so please excuse any mismatch between my comments and the current state of this extension and dbt generally.
Overall, I think this plan makes sense and provides the flexibility for a proven use case (multiple MVs targeting the same target table). I don't know what the failure scenarios look like in dbt or ClickHouse, but one reason for bundling the target table and MV creation into a single step was to prevent (or minimize) schema drift between the target table and MV query. With these as separate steps, I don't know what mechanisms are in place to prevent such drift. It could be that dbt/ClickHouse will alert the user before any significant changes happen, but I just want to highlight that that might be something to guard against during implementation.
As for referencing the target table in configuration, I think the config could be a raw string with the adapter checking to see if that's a relation that exists in the DB. I'm pretty sure get_relation could handle this? Apologies if this isn't what you're asking about and you're already aware of what I'm saying.
Otherwise, I would also hope that there's a way to implement this in a backwards-compatible way (maybe if target is defined in the config then the MV creation skips creating the underlying target table, otherwise the two steps are combined?).
Again, I apologize if this is something you've already thought of or is off-base. Thanks for continuing to evolve this feature!
Improving Materialized View (MV) Implementation in dbt-clickhouse
Materialized Views (MVs) in ClickHouse are a powerful feature that allows users to shift computation costs from query time to insert time, resulting in significantly faster SELECT queries. This capability makes MVs an essential tool for many ClickHouse users.
Current State of MV Materialization in dbt-clickhouse
The usage of MV materialization in dbt-clickhouse is extensive, as evidenced by the numerous issues and feature requests related to it. The current implementation has evolved and matured over time, primarily thanks to contributions from the community.
Current Implementation Details
As it stands today:
This approach ensures an automatic link between the MV and its target table and guarantees that the target table's schema matches the MV query.
While this design simplifies certain aspects, it also introduces limitations in flexibility and control for users.
Proposed Changes to MV Materialization
We propose splitting the creation process of the MV relation and the target table into two separate materializations:
Advantages of the Proposed Approach
Challenges to Address
msgpack
serialization in order to pass the ref as a config, for instance:(would love to hear of other solutions, maybe something that is curretnly exists within the adapter?) This feature will also benefit refreshable MV implementations by resolving dependency creation issues, ensuring dependencies are created before the MV.
This proposal is a starting point, and some details may need refinement. We would greatly appreciate feedback from the community, particularly from long-time contributors and experienced users who have been involved since the project's inception. Your insights will help ensure that these changes meet the needs of a broad range of use cases and improve the overall experience for everyone.
The text was updated successfully, but these errors were encountered: