Description
While working with the DLT Meta framework, I noticed that the cluster_by parameter for Bronze and Silver tables currently accepts only a single column as the clustering key.
According to the official Databricks Create Streaming Table documentation, cluster_by supports defining a list of columns to enable liquid clustering on a table. This allows tables to be organized more efficiently using multiple clustering keys.
I recommend adding support for multiple columns in cluster_by in the DLT Meta framework. This enhancement would improve table optimization and make the framework consistent with standard Spark Declarative Pipeline capabilities.
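To make the request concrete, here is a minimal sketch of how a single-column or multi-column cluster_by value could be normalized into the list form that the documentation describes. The helper name normalize_cluster_by and the comma-separated string format are my own assumptions for illustration; they are not part of DLT Meta or the Databricks API.

```python
from typing import List, Optional, Sequence, Union

# Hypothetical input type: a single column name, a comma-separated string,
# a list of column names, or None when clustering is not configured.
ClusterBy = Union[str, Sequence[str], None]


def normalize_cluster_by(value: ClusterBy) -> Optional[List[str]]:
    """Return cluster_by as a list of column names, or None if unset.

    Hypothetical helper (not part of DLT Meta): it keeps existing
    single-column configs working while also accepting several columns.
    """
    if value is None:
        return None

    # A plain string may hold one column or several separated by commas.
    candidates = value.split(",") if isinstance(value, str) else list(value)

    columns: List[str] = []
    for candidate in candidates:
        if not isinstance(candidate, str):
            raise TypeError(
                f"cluster_by entries must be strings, got {type(candidate).__name__}"
            )
        name = candidate.strip()
        # Skip empty entries and duplicates, preserving the original order.
        if name and name not in columns:
            columns.append(name)

    return columns or None
```

With this approach, an existing config value such as "id" would become ["id"], while "id, event_date" or ["id", "event_date"] would both become ["id", "event_date"], which could then be passed as the cluster_by list when the table is created.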
Reference:
Databricks Spark Declarative Pipeline documentation:
https://learn.microsoft.com/en-in/azure/databricks/ldp/developer/ldp-python-ref-streaming-table
Excerpt from documentation:
cluster_by | list | Enable liquid clustering on the table and define the columns to use as clustering keys. See Use liquid clustering for tables.