Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New MV implementation - Community Discussion #405

Open
BentsiLeviav opened this issue Jan 8, 2025 · 2 comments
Open

New MV implementation - Community Discussion #405

BentsiLeviav opened this issue Jan 8, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@BentsiLeviav
Copy link
Contributor

BentsiLeviav commented Jan 8, 2025

Improving Materialized View (MV) Implementation in dbt-clickhouse

Materialized Views (MVs) in ClickHouse are a powerful feature that allows users to shift computation costs from query time to insert time, resulting in significantly faster SELECT queries. This capability makes MVs an essential tool for many ClickHouse users.

Current State of MV Materialization in dbt-clickhouse

The usage of MV materialization in dbt-clickhouse is extensive, as evidenced by the numerous issues and feature requests related to it. The current implementation has evolved and matured over time, primarily thanks to contributions from the community.

Current Implementation Details

As it stands today:

  • Both the MV relation and the target table are created within the same materialization.
    This approach ensures an automatic link between the MV and its target table and guarantees that the target table's schema matches the MV query.
  • The implementation supports multiple MVs targeting the same target table. This is achieved by parsing the MV query and identifying the MVs through a predetermined comment, such as -- mv1.

While this design simplifies certain aspects, it also introduces limitations in flexibility and control for users.

Proposed Changes to MV Materialization

We propose splitting the creation process of the MV relation and the target table into two separate materializations:

  1. Target Table Materialization: Users will create the target table as a standalone table materialization model.
  2. MV Materialization: Users will define an MV materialization that references the target table.

Advantages of the Proposed Approach

  • By decoupling the creation of the target table and MV, users will have more control over the schema and backfilling processes.
  • This approach allows for easier management of multiple MVs targeting the same table without the need to parse comments in the MV queries.
  • Users can manage the target table independently, simplifying more complex use cases.

Challenges to Address

  • Reference Handling: A mechanism will be required to pass a ref() instance to the MV configuration (to link the MV to its target table within dbt) - a solution might be implementing a msgpack serialization in order to pass the ref as a config, for instance:
{{
    config(
         materialized='materialized_view',
         target=ref("<your target table model ref>")
            )
}}

(would love to hear of other solutions, maybe something that is curretnly exists within the adapter?) This feature will also benefit refreshable MV implementations by resolving dependency creation issues, ensuring dependencies are created before the MV.

  • Users will need to explicitly include the schema query in the target table materialization.
  • There needs to be a way to manage backfilling, such as providing an option to create an empty table initially.

This proposal is a starting point, and some details may need refinement. We would greatly appreciate feedback from the community, particularly from long-time contributors and experienced users who have been involved since the project's inception. Your insights will help ensure that these changes meet the needs of a broad range of use cases and improve the overall experience for everyone.

@BentsiLeviav BentsiLeviav added the enhancement New feature or request label Jan 8, 2025
@BentsiLeviav
Copy link
Contributor Author

@SoryRawyer @the4thamigo-uk
I saw you guys contributed to the MV implementation in the past, would you mind adding your 2 cents?

@SoryRawyer
Copy link
Contributor

Hey @BentsiLeviav, apologies for the delayed response. It's been a while since I've looked at the implementation of this, and I haven't tested my assumptions with actual code, so please excuse any mismatch between my comments and the current state of this extension and dbt generally.

Overall, I think this plan makes sense and provides the flexibility for a proven use case (multiple MVs targeting the same target table). I don't know what the failure scenarios look like in dbt or ClickHouse, but one reason for bundling the target table and MV creation into a single step was to prevent (or minimize) schema drift between the target table and MV query. With these as separate steps, I don't know what mechanisms are in place to prevent such drift. It could be that dbt/ClickHouse will alert the user before any significant changes happen, but I just want to highlight that that might be something to guard against during implementation.

As for referencing the target table in configuration, I think the config could be a raw string with the adapter checking to see if that's a relation that exists in the DB. I'm pretty sure get_relation could handle this? Apologies if this isn't what you're asking about and you're already aware of what I'm saying.

Otherwise, I would also hope that there's a way to implement this in a backwards-compatible way (maybe if target is defined in the config then the MV creation skips creating the underlying target table, otherwise the two steps are combined?).

Again, I apologize if this is something you've already thought of or is off-base. Thanks for continuing to evolve this feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants