Support for Multiple Columns in cluster_by in DLT Meta Framework #253

@shishupalgeek

While working with the DLT Meta framework, I noticed that the cluster_by parameter for Bronze and Silver tables currently accepts only a single column as a clustering key.

According to the official Databricks Create Streaming Table documentation, cluster_by supports defining a list of columns to enable liquid clustering on a table. This allows tables to be organized more efficiently using multiple clustering keys.

I recommend adding support for multiple columns in cluster_by in the DLT Meta framework. This enhancement would improve table optimization and make the framework consistent with standard Spark Declarative Pipeline capabilities.
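
A minimal sketch of how a multi-column value could be accepted while staying backward compatible with the current single-column form. The helper name `normalize_cluster_by` and the accepted input shapes (single name, comma-separated string, or list) are assumptions for illustration, not existing DLT Meta code:

```python
from typing import List, Optional, Sequence, Union


def normalize_cluster_by(
    value: Union[str, Sequence[str], None]
) -> Optional[List[str]]:
    """Return clustering columns as a list, or None if none are configured.

    Hypothetical helper: accepts a single column name, a comma-separated
    string, or a list of column names.
    """
    if value is None:
        return None

    # A plain string may hold one column or several separated by commas.
    if isinstance(value, str):
        parts = value.split(",")
    else:
        parts = list(value)

    columns: List[str] = []
    for part in parts:
        name = part.strip()
        # Skip empty entries and drop duplicates while preserving order.
        if name and name not in columns:
            columns.append(name)

    return columns or None
```

With something like this, an existing single-column setting such as `"id"` would become `["id"]`, and a new setting such as `"id, event_date"` or `["id", "event_date"]` would become a two-column list that can be passed as the `cluster_by` list described in the documentation. Splitting on commas is a simplification and would not handle column names that themselves contain commas.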

Reference:
Databricks Spark Declarative Pipeline documentation:
https://learn.microsoft.com/en-in/azure/databricks/ldp/developer/ldp-python-ref-streaming-table

Excerpt from documentation:

cluster_by | list | Enable liquid clustering on the table and define the columns to use as clustering keys. See Use liquid clustering for tables.

Labels

help wanted
