Roadmap

This document outlines the planned direction for the Databricks Bundle Template. It is a living document that evolves based on community feedback, real-world usage, and contributor interest.

Vision

A comprehensive, community-driven Databricks Asset Bundles template that covers the most common real-world configurations for production data engineering projects. The template should be opinionated where it matters (secure defaults, proven patterns) and flexible where teams differ (branching strategies, compute choices, permission models).

How to Influence the Roadmap

Features with the most community interest and contributor champions move forward first.


Feature Description Format

Each planned feature follows this structure:

### Feature Title
**Status**: Proposed | Planned | In Progress | Shipped
**Target**: vX.Y

Brief description of what this feature does and why it matters.

**Scope:**
- What's included in this feature

**Open questions:** (optional)
- Unresolved design decisions

Planned Features

Template Version Tracking

Status: Proposed
Target: v1.1

Add metadata to generated resources so teams can track which template version was used and trace back to the source.

Scope:

  • Custom tags on generated Databricks resources (jobs, pipelines)
  • Template version recorded in bundle_init_config.json
  • Optional git source info in job parameters
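The scope above could be sketched roughly as follows. This is only an illustration: the tag keys (`template_name`, `template_version`), the helper names, and the JSON key written into the init config are assumptions, since this roadmap item does not define them yet.

```python
import json


def build_template_tags(template_name: str, template_version: str) -> dict[str, str]:
    """Return custom tags identifying the template a resource was generated from.

    The tag keys are hypothetical; the final names are not decided here.
    """
    return {
        "template_name": template_name,
        "template_version": template_version,
    }


def record_template_version(config: dict, template_version: str) -> str:
    """Return the init config as JSON text with the template version added.

    The "template_version" key is an assumed name for illustration.
    """
    updated = {**config, "template_version": template_version}
    return json.dumps(updated, indent=2, sort_keys=True)


# Example values only; "my_project" mirrors the usage pattern in this document.
tags = build_template_tags("databricks-bundle-template", "1.1.0")
config_json = record_template_version({"project_name": "my_project"}, "1.1.0")
```

Keeping the same version string in both the resource tags and the init config would let a deployed job be matched back to the configuration that produced it.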

Future Ideas

These are larger features that require more design work and community input before committing to implementation.

Asset Sub-Templates (Plugins Layer)

Status: Proposed
Target: v2.0

Modular add-on templates that can be applied to an existing generated project to add new resources. This follows the pattern demonstrated by the contrib/data_engineering example template in the Databricks bundle-examples repo.

Scope:

  • assets/etl-pipeline/ - Add a new Lakeflow Declarative Pipelines (LDP, formerly DLT) pipeline with bronze/silver layers and expectations
  • assets/ingest-job/ - Add a data ingestion job with error handling
  • assets/ml-pipeline/ - Add an ML training pipeline with experiment tracking
  • assets/dbt-project/ - Add dbt integration with Unity Catalog

Usage pattern:

# Initialize base project first
databricks bundle init https://github.com/vmariiechko/databricks-bundle-template

# Later, add a new pipeline to the existing project
cd my_project
databricks bundle init https://github.com/vmariiechko/databricks-bundle-template \
  --template-dir assets/etl-pipeline

Open questions:

  • How do asset templates reference existing variables from variables.yml?
  • Should assets modify databricks.yml or create standalone resource files?
  • How to handle naming conflicts with existing resources?

Advanced Permissions Profiles

Status: Proposed
Target: v2.0

Replace the current yes/no permissions toggle with a set of predefined profiles that cover more organizational patterns.

Scope:

  • Full (4 groups): developers, qa_team, operations_team, analytics_team
  • Team (2 groups): developers, analytics_team
  • Minimal (owner only): no group-based permissions, only bundle owner
  • None: no permissions blocks at all
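The four profiles listed above can be read as a simple lookup table. The sketch below uses the profile names and group lists from the scope; the dictionary layout and both helper functions are hypothetical and not part of the template today.

```python
# Profile names and groups follow the scope above; the structure is an assumption.
PERMISSION_PROFILES: dict[str, list[str]] = {
    "full": ["developers", "qa_team", "operations_team", "analytics_team"],
    "team": ["developers", "analytics_team"],
    "minimal": [],  # owner only: no group-based permissions
    "none": [],     # no permissions blocks at all
}


def groups_for_profile(profile: str) -> list[str]:
    """Return the groups granted access under the given profile."""
    key = profile.lower()
    if key not in PERMISSION_PROFILES:
        valid = ", ".join(sorted(PERMISSION_PROFILES))
        raise ValueError(f"Unknown profile {profile!r}; expected one of: {valid}")
    return list(PERMISSION_PROFILES[key])


def emits_permissions_block(profile: str) -> bool:
    """Return True unless the profile is 'none', which omits permissions entirely."""
    groups_for_profile(profile)  # raises ValueError for unknown profiles
    return profile.lower() != "none"
```

Note that `minimal` and `none` both map to an empty group list, so a separate check is needed to tell "owner-only permissions" apart from "no permissions block".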

Open questions:

  • Should custom group names be configurable?
  • How to handle migration from yes/no to profiles without breaking existing users?

Bundle UUID and Git Source Tracking

Status: Proposed
Target: v2.0

Enhanced traceability for generated bundles, useful in large organizations managing many bundle deployments.

Scope:

  • Unique bundle UUID generated at init time
  • Git repository URL and commit hash recorded in bundle metadata
  • Traceable from deployed resources back to template version and configuration
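A minimal sketch of how this metadata might be collected at init time, assuming a random (version 4) UUID and that `git` is available on the PATH. The field names and both helpers are hypothetical; when git information cannot be read, the corresponding fields are left as `None`.

```python
import subprocess
import uuid
from typing import Optional


def git_output(*args: str) -> Optional[str]:
    """Run a git command and return its trimmed output, or None on any failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the current directory is not a repository.
        return None
    return result.stdout.strip() or None


def build_bundle_metadata(template_version: str) -> dict[str, Optional[str]]:
    """Collect traceability metadata; all keys are assumed names for illustration."""
    return {
        "bundle_uuid": str(uuid.uuid4()),
        "template_version": template_version,
        "git_url": git_output("config", "--get", "remote.origin.url"),
        "git_commit": git_output("rev-parse", "HEAD"),
    }
```

Generating the UUID once at init time, rather than on each deploy, is what would keep it stable across the lifetime of a generated bundle.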

Completed

v1.1.0

Workspace Topology Configuration

Status: Shipped

Configurable workspace topology: single shared workspace (default) or separate workspaces per environment. Multi-workspace mode generates variable-based hosts in databricks.yml and adds DATABRICKS_HOST to Azure CI/CD pipelines.

Scope:

  • New workspace_setup prompt (single_workspace / multi_workspace)
  • Placeholder-based workspace host pattern (WORKSPACE_HOST_PLACEHOLDER_*) in databricks.yml.tmpl
  • Azure CI/CD updated with per-environment DATABRICKS_HOST
  • Updated documentation across README, QUICKSTART, CI_CD_SETUP
  • 4 new test configurations for multi-workspace scenarios (19 total configs)
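Because multi-workspace mode leaves host placeholders in the generated configuration, a small check can list the ones that still need a real workspace URL. The sketch below is not part of the template. The suffix format (upper-case letters, digits, underscores), the helper name, and the sample YAML with its example host are assumptions.

```python
import re

# Matches WORKSPACE_HOST_PLACEHOLDER_*; the allowed suffix characters are assumed.
PLACEHOLDER_RE = re.compile(r"WORKSPACE_HOST_PLACEHOLDER_[A-Z0-9_]+")


def find_unresolved_hosts(bundle_yaml: str) -> list[str]:
    """Return the distinct host placeholders still present, sorted by name."""
    return sorted(set(PLACEHOLDER_RE.findall(bundle_yaml)))


# Hypothetical excerpt of a generated databricks.yml, for illustration only.
sample = """
targets:
  dev:
    workspace:
      host: WORKSPACE_HOST_PLACEHOLDER_DEV
  prod:
    workspace:
      host: https://example.cloud.databricks.com
"""

unresolved = find_unresolved_hosts(sample)
```

Running a check like this before the first deploy would surface any environment whose host has not been filled in yet.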

Branching Strategy Diagrams

Status: Shipped

Visual diagrams illustrating the environment-branch promotion model for full mode (with hotfix flow) and minimal mode. Diagrams are conditionally embedded in the generated CI_CD_SETUP.md based on environment setup.

v1.0.0

See CHANGELOG.md for the full list of features shipped in v1.0.0, including:

  • Multi-environment deployment (user/dev/stage/prod)
  • Configurable compute (classic/serverless/both)
  • Unity Catalog with medallion architecture
  • Optional RBAC with environment-aware groups
  • Service principal architecture
  • CI/CD for Azure DevOps, GitHub Actions, GitLab
  • Cloud support for Azure, AWS, GCP
  • 1531 tests across 15 configurations