Support for Multiple Columns in cluster_by in DLT Meta Framework #253

While working with the DLT Meta framework, I noticed that the cluster_by parameter for Bronze and Silver tables currently accepts only a single column as the clustering key.

According to the official Databricks Create Streaming Table documentation, cluster_by supports defining a list of columns to enable liquid clustering on a table. This allows tables to be organized more efficiently using multiple clustering keys.

I recommend adding support for multiple columns in cluster_by in the DLT Meta framework. This would allow tables to be clustered on more than one key and would bring the framework in line with what Spark Declarative Pipelines already supports.
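To illustrate the idea, here is a minimal sketch of how a configured value could be normalized into a list of clustering keys while still accepting the existing single-column form. The helper name `normalize_cluster_by` and the comma-separated string handling are my own assumptions for illustration; they are not part of the current DLT Meta code.

```python
from typing import Iterable, List, Optional, Union

# Hypothetical type alias: the configured value may be absent, one column
# name, a comma-separated string, or an iterable of column names.
ClusterBy = Optional[Union[str, Iterable[str]]]


def normalize_cluster_by(value: ClusterBy) -> Optional[List[str]]:
    """Return clustering keys as a list, or None when nothing is configured.

    Hypothetical helper, not an existing DLT Meta function.
    """
    if value is None:
        return None
    if isinstance(value, str):
        # Accept "col_a" as well as "col_a, col_b" (assumed convention).
        parts = value.split(",")
    else:
        parts = list(value)

    columns: List[str] = []
    for part in parts:
        if not isinstance(part, str):
            raise TypeError(f"cluster_by entries must be strings, got {type(part).__name__}")
        name = part.strip()
        # Skip empty entries and duplicates while preserving order.
        if name and name not in columns:
            columns.append(name)
    return columns or None


if __name__ == "__main__":
    print(normalize_cluster_by("id"))
    print(normalize_cluster_by(["id", "region"]))
```

The resulting list could then be passed through unchanged as the `cluster_by` argument when the framework creates the Bronze or Silver streaming table, so existing single-column configurations would keep working.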

Reference:
Databricks Spark Declarative Pipeline documentation:
https://learn.microsoft.com/en-in/azure/databricks/ldp/developer/ldp-python-ref-streaming-table

Excerpt from documentation:

cluster_by | list | Enable liquid clustering on the table and define the columns to use as clustering keys. See Use liquid clustering for tables.

