docs: PR/MR data quality review feature page #2149
…context-in-integration added message context parsing
added sparkles emoji
…port" in various test fixtures and utility functions.
… LinksLineBlock for rendering multiple links with optional icons, and updated AlertMessageBuilder to use this new structure.
…tary-links Enable support for multiple links and icons in alert messages
…clickhouse Ignore monitor failure on ClickHouse
…chema cleanup (#2127)

* perf: optimize CI by skipping redundant dbt test artifacts upload and adding schema cleanup
  - Add --vars to the dbt test step to disable artifact autoupload (already done during dbt run) and skip per-table temp table cleanup (the schema will be dropped at the end)
  - Add a drop_test_schemas macro and a CI step to drop both the main and elementary schemas after tests
  - This significantly reduces on-run-end hook time, especially for Athena, where each query has high overhead (~3-5s) and the hook was issuing 100+ individual queries
* fix: correct macro namespace to elementary_integration_tests
* fix: update macro namespace to elementary_integration_tests in drop_test_schemas.sql
* style: quote ClickHouse schema identifier in DROP statement
* refactor: move clean_elementary_temp_tables to dbt_project.yml per review feedback
  - clean_elementary_temp_tables: false is now set in dbt_project.yml (applies to all steps)
  - disable_dbt_artifacts_autoupload: true remains on the CLI for the dbt test step only (dbt run still needs to upload artifacts)
* refactor: move disable_dbt_artifacts_autoupload to dbt_project.yml
  Both vars are now in dbt_project.yml, so no --vars is needed on the CLI. disable_dbt_artifacts_autoupload only gates the on-run-end hook's upload_dbt_artifacts() call, so it's safe to set globally.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
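Per the final refactor commits above, both variables end up in the e2e project's dbt_project.yml. A minimal sketch (the two var names come from the commit message; the surrounding file structure is assumed):

```yaml
# dbt_project.yml (sketch -- only the two var names are taken from the commit message)
vars:
  # Skip per-table temp-table cleanup; the whole schema is dropped at the end of CI.
  clean_elementary_temp_tables: false
  # Artifacts are uploaded during `dbt run`; this only gates the on-run-end
  # hook's upload_dbt_artifacts() call, so it is safe to set globally.
  disable_dbt_artifacts_autoupload: true
```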
…tion (#2129)

* fix: surface exception text in APIDbtRunner for transient error detection

  APIDbtRunner only captures JinjaLogInfo and RunningOperationCaughtError events into the output field. When a command fails with a transient error (e.g. RemoteDisconnected), the error text lives in res.exception, not in the captured output. This means _inner_run_command_with_retries has nothing to match against and never fires the retry.

  Fix: extract the res.exception text and put it into the stderr field of APIDbtCommandResult (the dbt Python API doesn't use stderr). This lets the transient error detection in _inner_run_command_with_retries examine the exception text, analogous to how SubprocessDbtRunner captures subprocess stderr.

* style: address CodeRabbit review - remove /tmp paths and prefix unused args
* style: fix black formatting - wrap long function signatures
* style: fix black formatting - keep short signature on one line

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
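The mechanism described above can be sketched in plain Python. This is a hypothetical model, not the real elementary code: the names APIDbtCommandResult and res.exception follow the commit message, but the signatures, the pattern list, and result_from_dbt_response are illustrative assumptions.

```python
# Sketch of surfacing a dbt Python API exception into stderr so retry
# logic has text to match against (names modeled on the commit message).
import re
from dataclasses import dataclass


@dataclass
class APIDbtCommandResult:
    success: bool
    output: str
    stderr: str = ""  # the dbt Python API doesn't use stderr, so it can carry exception text


# Illustrative patterns only; the real list lives in transient_errors.py.
TRANSIENT_PATTERNS = [r"RemoteDisconnected", r"Connection reset by peer"]


def result_from_dbt_response(res) -> APIDbtCommandResult:
    # Surface res.exception text so _inner_run_command_with_retries-style
    # logic can examine it, analogous to subprocess stderr capture.
    stderr = str(res.exception) if getattr(res, "exception", None) else ""
    return APIDbtCommandResult(success=res.success, output="", stderr=stderr)


def is_transient_error(result: APIDbtCommandResult) -> bool:
    # Match against both captured output and the surfaced exception text.
    text = result.output + "\n" + result.stderr
    return any(re.search(p, text) for p in TRANSIENT_PATTERNS)
```

Without the stderr field, the exception text would be dropped and is_transient_error would never fire for API-level failures.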
Fixes #2037

The --config-dir flag was parsed but never passed to Config(), which always used the default ~/.edr. This wires the CLI argument through so the custom config directory is actually used.

- Remove premature directory creation from Config._load_configuration that ran at import time (logo/upgrade) and created ~/.edr before CLI parsing
- Create config_dir in anonymous_tracking when writing the user id file
- Add --config-dir to the debug and dbt_init commands for consistency
- Use keyword args for Config() in the report command for clarity

Co-authored-by: Itamar Hartstein <[email protected]>
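The shape of this fix can be sketched with a stdlib-only example. The real elementary CLI uses click rather than argparse, and the Config class here is a stand-in, so treat every name below as an assumption; the point is that the parsed flag must actually reach the Config constructor, and that no directory is created at import/parse time.

```python
# Sketch: wiring a --config-dir flag through to Config (hypothetical names;
# argparse is used here only to keep the example stdlib-only).
import argparse
import os

DEFAULT_CONFIG_DIR = os.path.join(os.path.expanduser("~"), ".edr")


class Config:
    def __init__(self, config_dir: str = DEFAULT_CONFIG_DIR):
        # Store the path only. Directories are created lazily by the code
        # that writes files (e.g. anonymous tracking), not at parse time.
        self.config_dir = config_dir


def build_config(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--config-dir", default=DEFAULT_CONFIG_DIR)
    args = parser.parse_args(argv)
    # The bug was stopping here with Config(); the fix passes the flag through.
    return Config(config_dir=args.config_dir)
```

With the flag wired through, build_config(["--config-dir", "/custom/dir"]) yields a Config whose config_dir is /custom/dir instead of silently falling back to ~/.edr.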
…tion version (#2131)

* Modernize test-github-action.yml to use e2e_dbt_project
  - Replace the deprecated dbt-data-reliability integration tests with the newer e2e_dbt_project (matching the test-warehouse.yml approach)
  - Use generate_profiles.py + profiles.yml.j2 instead of the TEST_GITHUB_ACTION_PROFILES_YML secret
  - Update run-elementary-action from v1.8 to v1.12
  - Add concurrency management and schema cleanup
  - Add a generate-data workflow input
  - Temporarily add a pull_request trigger for CI testing
* Add masking for profiles.yml credentials in GITHUB_OUTPUT
* Fix jq validation to work correctly under bash -e
* Temporarily point action at Python 3.10 fix branch
  The run-elementary-action Docker image uses Python 3.9, but elementary-data 0.22.0 dropped Python 3.9 support. Point at the fix branch (devin/1772355616-fix-python-version), which updates the Dockerfile to Python 3.10. This should be updated to the new release tag once elementary-data/run-elementary-action#15 is merged and tagged.
* Update run-elementary-action to v1.13 (with Python 3.10 fix)
  Replace the temporary branch reference with the official v1.13 release tag, which includes the Python 3.10 Dockerfile fix.
* Add -t snowflake to edr monitor report command
* Remove temporary pull_request trigger
* Remove Notify Slack job

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
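The "concurrency management" mentioned above typically looks like the following in a GitHub Actions workflow. This is a generic sketch: the actual group key used in test-github-action.yml is not stated in the commit message and is assumed here.

```yaml
# Sketch of GitHub Actions concurrency management (group name is illustrative)
concurrency:
  group: test-github-action-${{ github.ref }}
  cancel-in-progress: true
```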
* fix: add ref() macro override for dbt-core 1.11 compatibility
Some adapters (e.g. dbt-dremio) override the ref() Jinja macro with a
signature that doesn't support the two-argument positional form used for
cross-package references: ref('package', 'model').
dbt-dremio's ref(model_name, v=None) intercepts the second positional
arg as a version identifier, breaking refs like ref('elementary', 'dbt_models').
This override in the elementary_cli monitor project forwards all arguments
directly to builtins.ref(), which correctly handles all ref() forms in
dbt-core 1.11+. It also supports the legacy package= keyword form for
backward compatibility.
Co-Authored-By: Itamar Hartstein <[email protected]>
* fix: handle combined cross-package + versioned refs for forward compatibility
Co-Authored-By: Itamar Hartstein <[email protected]>
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
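The argument-handling conflict described above can be modeled in plain Python. Neither function below is real elementary or dbt code: the actual override is a Jinja macro forwarding to builtins.ref(), and these stand-ins exist only to show how the two-positional cross-package form is misread by a ref(model_name, v=None) signature.

```python
# Illustrative Python model of the two ref() signatures discussed above.


def dremio_style_ref(model_name, v=None):
    # dbt-dremio's override: the second positional argument is treated as a
    # version, so ref('elementary', 'dbt_models') is misread as model
    # 'elementary' at version 'dbt_models'.
    return {"model": model_name, "version": v, "package": None}


def forwarding_ref(*args, package=None, version=None):
    # Behaviour the elementary_cli override preserves: two positional args
    # mean the cross-package form ref('package', 'model'); the package=
    # keyword is the legacy form kept for backward compatibility.
    if len(args) == 2:
        package, model = args
    else:
        (model,) = args
    return {"model": model, "version": version, "package": package}
```

Under the dremio-style signature the cross-package call silently resolves to the wrong model; the forwarding form handles both the positional and the legacy keyword spellings.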
…1802)

* Add optional ssl config flag
* fix: honor SSL context for Slack workflow webhooks
  - Update _initial_client in SlackWebhookClient to configure requests.Session with SSL verification
  - When ssl_context is provided, set session.verify to the certifi CA bundle
  - Add a warning when workflow webhooks cannot fully honor the --use-system-ca-files setting
  - Ensures the SSL context is respected for workflow webhook requests
* precommit fix
* fix is_workflow logic
* refactor: replace --use-system-ca-files with --ssl-ca-bundle
  Replace the boolean --use-system-ca-files/--no-use-system-ca-files flag with a more flexible --ssl-ca-bundle option that accepts 'certifi', 'system', or a custom file path. When omitted, each library keeps its own default CA behaviour (no change from prior behaviour). Key changes:
  - Add an elementary/utils/ssl.py helper to resolve ssl_ca_bundle into an SSLContext
  - Config now uses the _first_not_none pattern so ssl_ca_bundle is also loadable from config.yml
  - Apply the SSL context to both the legacy SlackClient and the newer SlackWebMessagingIntegration / SlackWebhookMessagingIntegration code paths
* fix: validate empty ssl_ca_bundle values explicitly
  Raise a clear error when ssl_ca_bundle is an empty or whitespace-only string, instead of falling through to the file path check, which would produce a confusing "path does not exist" error.

---------

Co-authored-by: Itamar Hartstein <[email protected]>
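A resolver like the elementary/utils/ssl.py helper described above can be sketched as follows. The function name and exact error messages are assumptions; only the accepted values ('certifi', 'system', a file path), the "None means no override" behaviour, and the explicit empty-string validation come from the commit messages.

```python
# Sketch of resolving an ssl_ca_bundle setting into an SSLContext
# (hypothetical helper modeled on the commit messages above).
import os
import ssl


def resolve_ssl_context(ssl_ca_bundle):
    """Resolve 'certifi', 'system', or a custom file path into an SSLContext.

    None means "no override": each library keeps its default CA behaviour.
    """
    if ssl_ca_bundle is None:
        return None
    if not ssl_ca_bundle.strip():
        # Fail fast with a clear error instead of falling through to a
        # confusing "path does not exist" message for empty values.
        raise ValueError("ssl_ca_bundle must not be empty")
    if ssl_ca_bundle == "system":
        # Use the OS trust store.
        return ssl.create_default_context()
    if ssl_ca_bundle == "certifi":
        import certifi  # third-party; assumed installed

        return ssl.create_default_context(cafile=certifi.where())
    if not os.path.isfile(ssl_ca_bundle):
        raise ValueError(f"ssl_ca_bundle path does not exist: {ssl_ca_bundle}")
    return ssl.create_default_context(cafile=ssl_ca_bundle)
```

The resulting context (or None) can then be applied to both the legacy and the newer Slack integration code paths.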
* fix: restore missing test_config_levels macro and fix uniques typo in e2e project
  - Add the test_config_levels generic test macro that was removed from dbt-data-reliability in PR #947 but is still referenced by the e2e project
  - Fix 'uniques' -> 'unique' (the standard dbt generic test) on error_model
* fix: revert uniques change - intentional error case for error_model
  The 'uniques' test (non-existent) on error_model.missing_column was intentionally creating a compilation error case, tagged as error_test.
* fix: define mandatory_params as none in test_config_levels macro
  The macro calls elementary.get_anomalies_test_configuration() with mandatory_params, but it was never defined in scope. In the original dbt-data-reliability code, this was a parameter on the calling test macros. Since test_config_levels doesn't need mandatory param validation, define it locally as none.

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
* feat: add DuckDB, Trino, Dremio & Spark support to CI and CLI (part 1) - pyproject.toml: add dbt-duckdb and dbt-dremio as optional dependencies - Docker config files for Trino and Spark (non-credential files) - test-all-warehouses.yml: add duckdb, trino, dremio, spark to CI matrix - schema.yml: update data_type expressions for new adapter type mappings - test_alerts_union.sql: exclude schema_changes for Spark (like Databricks) - drop_test_schemas.sql: add dispatched edr_drop_schema for all new adapters - transient_errors.py: add spark and duckdb entries to _ADAPTER_PATTERNS - get_adapter_type_and_unique_id.sql: add duckdb dispatch (uses target.path) Co-Authored-By: Itamar Hartstein <[email protected]> * feat: add Docker startup steps for Trino, Dremio, Spark in CI workflow Co-Authored-By: Itamar Hartstein <[email protected]> * feat: add DuckDB, Trino, Dremio, Spark profile targets Co-Authored-By: Itamar Hartstein <[email protected]> * feat: add Trino Iceberg catalog config for CI testing Co-Authored-By: Itamar Hartstein <[email protected]> * feat: add Spark Hive metastore config for CI testing Co-Authored-By: Itamar Hartstein <[email protected]> * feat: add Dremio setup script for CI testing Co-Authored-By: Itamar Hartstein <[email protected]> * feat: add Trino, Dremio, Spark Docker services to docker-compose.yml Co-Authored-By: Itamar Hartstein <[email protected]> * fix: address DuckDB, Spark, and Dremio CI test failures - DuckDB: use file-backed DB path instead of :memory: to persist across subprocess calls, and reduce threads to 1 to avoid concurrent commit errors - Spark: install dbt-spark[PyHive] extras required for thrift connection method - Dremio: add dremio__target_database() dispatch override in e2e project to return target.database (upstream elementary package lacks this dispatch) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use Nessie catalog source for Dremio instead of plain S3 Plain S3 sources in Dremio do not support CREATE TABLE 
(needed for dbt seed). Switch to Nessie catalog source which supports table creation via Iceberg. Co-Authored-By: Itamar Hartstein <[email protected]> * feat: add seed caching for Docker-based adapters in CI - Make generate_data.py deterministic (fixed random seed) - Use fixed schema name for Docker adapters (ephemeral containers) - Cache seeded Docker volumes between runs using actions/cache - Cache DuckDB database file between runs - Skip dbt seed on cache hit, restoring from cached volumes instead - Applies to: Spark, Trino, Dremio, Postgres, ClickHouse, DuckDB Co-Authored-By: Itamar Hartstein <[email protected]> * fix: move seed cache restore before Docker service startup Addresses CodeRabbit review: restoring cached tarballs into Docker volumes while containers are already running risks data corruption. Now the cache key computation and volume restore happen before any Docker services are started, so containers initialise with the pre-seeded data. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: add docker-compose.yml to seed cache key and fail-fast readiness loops Co-Authored-By: Itamar Hartstein <[email protected]> * fix: convert ClickHouse bind mount to named Docker volume for seed caching Co-Authored-By: Itamar Hartstein <[email protected]> * ci: temporarily use dbt-data-reliability fix branch for Trino/Spark support Points to dbt-data-reliability#948 which adds: - trino__full_name_split (1-based array indexing) - trino__edr_get_create_table_as_sql (bypass model.config issue) - spark__edr_get_create_table_as_sql TODO: revert after dbt-data-reliability#948 is merged Co-Authored-By: Itamar Hartstein <[email protected]> * fix: stop Docker containers before archiving seed cache volumes Prevents tar race condition where ClickHouse temporary merge files disappear during archiving, causing 'No such file or directory' errors. 
Co-Authored-By: Itamar Hartstein <[email protected]> * fix: add readiness wait after restarting Docker containers for seed cache After stopping containers for volume archiving and restarting them, services like Trino need time to reinitialize. Added per-adapter health checks to wait for readiness before proceeding. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use Trino starting:false check for proper readiness detection The /v1/info endpoint returns HTTP 200 even when Trino is still initializing. Check for '"starting":false' in the response body to ensure Trino is fully ready before proceeding. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: add Hive Metastore readiness check after container restart for Trino Co-Authored-By: Itamar Hartstein <[email protected]> * fix: Dremio CI - batched seed materialization, single-threaded seeding, skip seed cache for Trino - Skip seed caching for Trino (Hive Metastore doesn't recover from stop/start) - Remove dead Trino readiness code from seed cache restart section - Add batched Dremio seed materialization to handle large seeds (splits VALUES into 500-row batches) - Use --threads 1 for Dremio seed step to avoid Nessie catalog race conditions - Fix Dremio DROP SCHEMA cleanup macro (Dremio doesn't support DROP SCHEMA) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: Dremio CI - single-threaded dbt run/test, fix cross-schema seed refs, quote reserved words Co-Authored-By: Itamar Hartstein <[email protected]> * fix: revert reserved word quoting, use Dremio-specific expected failures in CI validation Co-Authored-By: Itamar Hartstein <[email protected]> * fix: rename reserved word columns (min/max/sum/one) to avoid Dremio SQL conflicts Co-Authored-By: Itamar Hartstein <[email protected]> * fix: always run seed step for all adapters (cloud adapters need fresh seeds after column rename) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: Dremio generate_schema_name - use default_schema 
instead of root_path to avoid double NessieSource prefix Co-Authored-By: Itamar Hartstein <[email protected]> * fix: Dremio - put seeds in default schema to avoid cross-schema reference issues in Nessie Co-Authored-By: Itamar Hartstein <[email protected]> * feat: external seed loading for Dremio and Spark via MinIO/CSV instead of slow dbt seed Co-Authored-By: Itamar Hartstein <[email protected]> * fix: format load_seeds_external.py with black, remove unused imports Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use --entrypoint /bin/sh for minio/mc docker container to enable shell commands Co-Authored-By: Itamar Hartstein <[email protected]> * refactor: extract external seeders into classes with click CLI Co-Authored-By: Itamar Hartstein <[email protected]> * fix: black/isort formatting in dremio.py Co-Authored-By: Itamar Hartstein <[email protected]> * fix: read Dremio credentials from dremio-setup.sh for external seeder Co-Authored-By: Itamar Hartstein <[email protected]> * fix: regex to handle escaped quotes in dremio-setup.sh for credential extraction Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use COPY INTO for Dremio seeds, skip Spark seed caching - Replace fragile CSV promotion REST API with COPY INTO for Dremio (creates Iceberg tables directly from S3 source files) - Remove _promote_csv and _refresh_source methods (no longer needed) - Skip seed caching for Spark (docker stop/start kills Thrift Server) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: add file_format delta for Spark models in e2e dbt_project.yml Co-Authored-By: Itamar Hartstein <[email protected]> * fix: Dremio S3 source - use compatibilityMode, rootPath=/, v3 Catalog API with retry Co-Authored-By: Itamar Hartstein <[email protected]> * fix: Dremio root_path double-nesting + Spark CLI file_format delta - Dremio: change root_path from 'NessieSource.schema' to just 'schema' to avoid double-nesting (NessieSource.NessieSource.schema.table) - Spark: add 
file_format delta to elementary CLI internal dbt_project.yml so edr monitor report works with merge incremental strategy Co-Authored-By: Itamar Hartstein <[email protected]> * fix: Dremio Space architecture - views in Space, seeds in Nessie datalake - Create elementary_ci Space in dremio-setup.sh for view materialization - Update profiles.yml.j2: database=elementary_ci (Space) for views - Delegate generate_schema_name to dbt-dremio native macro for correct root_path/schema resolution (datalake vs non-datalake nodes) - Update external seeder to place seeds at NessieSource.<root_path>.test_seeds - Update source definitions with Dremio-specific database/schema overrides Co-Authored-By: Itamar Hartstein <[email protected]> * style: apply black formatting to dremio.py Co-Authored-By: Itamar Hartstein <[email protected]> * fix: restore dremio.py credential extraction from dremio-setup.sh The _docker_defaults function was reading from docker-compose.yml dremio-setup environment section, but that section was reverted to avoid security scanner issues. Restore the regex-based extraction from dremio-setup.sh which has the literal credentials. 
Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use enterprise_catalog_namespace for Dremio to avoid Nessie version context errors - Switch profiles.yml.j2 from separate datalake/root_path/database/schema to enterprise_catalog_namespace/enterprise_catalog_folder, keeping everything (tables + views) in the same Nessie source - Remove Dremio-specific delegation in generate_schema_name.sql (no longer needed) - Simplify schema.yml source overrides (no Dremio-specific database/schema) - Remove Space creation from dremio-setup.sh (views now go to Nessie) - Update external seeder path to match: NessieSource.test_seeds.<table> Co-Authored-By: Itamar Hartstein <[email protected]> * fix: restore dremio__generate_schema_name delegation for correct Nessie path resolution DremioRelation.quoted_by_component splits dots in schema names into separate quoted levels. With enterprise_catalog, dremio__generate_schema_name returns 'elementary_tests.test_seeds' for seeds, which renders as NessieSource."elementary_tests"."test_seeds"."table" (3-level path). - Restore Dremio delegation in generate_schema_name.sql - Revert seeder to 3-level Nessie path - Update schema.yml source overrides with dot-separated schema for Dremio Co-Authored-By: Itamar Hartstein <[email protected]> * fix: flatten Dremio seed schema to single-level Nessie namespace dbt-dremio skips folder creation for Nessie sources (database == datalake), and Dremio rejects folder creation inside SOURCEs. This means nested namespaces like NessieSource.elementary_tests.test_seeds can't be resolved. Fix: seeds return custom_schema_name directly (test_seeds) before Dremio delegation, producing flat NessieSource.test_seeds.<table> paths. Non-seed nodes still delegate to dremio__generate_schema_name for proper root_path. 
Co-Authored-By: Itamar Hartstein <[email protected]> * fix: avoid typos pre-commit false positive on SOURCE plural Co-Authored-By: Itamar Hartstein <[email protected]> * fix: create Nessie namespace via REST API + refresh source metadata The Dremio view validator failed with 'Object test_seeds not found within NessieSource' because dbt-dremio skips folder creation for Nessie sources (database == credentials.datalake). Fix: 1. Create the Nessie namespace explicitly via Iceberg REST API before creating tables (tries /iceberg/main/v1/namespaces first, falls back to Nessie native API v2) 2. Refresh NessieSource metadata after seed loading so Dremio picks up the new namespace and tables Co-Authored-By: Itamar Hartstein <[email protected]> * style: apply black formatting to Nessie namespace methods Co-Authored-By: Itamar Hartstein <[email protected]> * fix: improve Nessie namespace creation + force Dremio catalog discovery Co-Authored-By: Itamar Hartstein <[email protected]> * fix: force NessieSource metadata re-scan via Catalog API policy update Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use USE BRANCH main for Dremio Nessie version context resolution Dremio's VDS (view) SQL validator requires explicit version context when referencing Nessie-backed objects. Without it, CREATE VIEW and other DDL fail with 'Version context must be specified using AT SQL syntax'. Fix: - Add on-run-start hook: USE BRANCH main IN <datalake> for Dremio targets - Set branch context in external seeder SQL session before table creation - Remove complex _force_metadata_refresh() that tried to work around the issue via Catalog API policy updates (didn't help because the VDS validator uses a separate code path from the Catalog API) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: put Dremio seeds in same Nessie namespace as models to fix VDS validator dbt-dremio uses stateless REST API where each SQL call is a separate HTTP request. 
USE BRANCH main does not persist across requests, so the VDS view validator cannot resolve cross-namespace Nessie references. Fix: place seed tables in the same namespace as model views (the target schema, e.g. elementary_tests) instead of a separate test_seeds namespace. This eliminates cross-namespace references in view SQL entirely. Changes: - generate_schema_name.sql: Dremio seeds return default_schema (same as models) - dremio.py: use self.schema_name instead of hardcoded test_seeds - schema.yml: source schemas use target.schema for Dremio - Remove broken on-run-start USE BRANCH hooks from both dbt_project.yml files Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use CREATE FOLDER + ALTER SOURCE REFRESH for Dremio metadata visibility The VDS view validator uses a separate metadata cache that doesn't immediately see tables created via the SQL API. Two fixes: 1. Replace Nessie REST API namespace creation (which failed in CI) with CREATE FOLDER SQL command through Dremio (more reliable) 2. After creating all seed tables, run ALTER SOURCE NessieSource REFRESH STATUS to force metadata cache refresh, then wait 10s for propagation before dbt run starts creating views Co-Authored-By: Itamar Hartstein <[email protected]> * fix: skip Docker restart for Dremio to preserve Nessie metadata cache After docker compose stop/start for seed caching, Dremio loses its in-memory metadata cache. The VDS view validator then cannot resolve Nessie-backed tables, causing all integration model views to fail. Since the Dremio external seeder with COPY INTO is already fast (~1 min), seed caching provides no meaningful benefit. Excluding Dremio from the Docker restart eliminates the metadata cache loss entirely. 
Co-Authored-By: Itamar Hartstein <[email protected]> * fix: resolve Dremio edr monitor duplicate keys + exclude ephemeral model tests - Fix elementary profile: use enterprise_catalog_folder instead of schema for Dremio to avoid 'Got duplicate keys: (dremio_space_folder) all map to schema' - Exclude ephemeral_model tag from dbt test for Dremio (upstream dbt-dremio CTE limitation with __dbt__cte__ references) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: add continue-on-error for Dremio edr steps (dbt-core 1.11 compat) dbt-dremio installs dbt-core 1.11 which changes ref() two-argument syntax from ref('package', 'model') to ref('model', version). This breaks the elementary CLI's internal models. Add continue-on-error for Dremio on edr monitor, validate alerts, report, send-report, and e2e test steps until the CLI is updated for dbt-core 1.11 compatibility. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: revert temporary dbt-data-reliability branch pin (PR #948 merged) Co-Authored-By: Itamar Hartstein <[email protected]> * refactor: address PR review - ref syntax, healthchecks, external scripts - Fix dbt-core 1.11 compat: convert ref('elementary', 'model') to ref('model', package='elementary') - Remove continue-on-error for Dremio edr steps (root cause fixed) - Simplify workflow Start steps to use docker compose up -d --wait - Move seed cache save/restore to external ci/*.sh scripts - Fix schema quoting in drop_test_schemas.sql for duckdb and spark - Add non-root user to Spark Dockerfile - Remove unused dremio_seed.sql (seeds now load via external S3) Co-Authored-By: Itamar Hartstein <[email protected]> * refactor: parameterize Docker credentials via environment variables Co-Authored-By: Itamar Hartstein <[email protected]> * style: fix prettier formatting for docker-compose.yml healthchecks Co-Authored-By: Itamar Hartstein <[email protected]> * fix: increase Docker healthcheck timeouts for CI and fix Spark volume permissions Co-Authored-By: 
Itamar Hartstein <[email protected]> * fix: increase dremio-minio healthcheck retries to 60 with start_period for CI Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use bash TCP check for healthchecks (curl/nc missing in MinIO 2024 and hive images) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: align dremio-setup.sh default password with docker-compose (dremio123) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: resolve dremio.py credential extraction from shell variable defaults Co-Authored-By: Itamar Hartstein <[email protected]> * fix: increase hive-metastore healthcheck retries to 60 with 60s start_period for CI Co-Authored-By: Itamar Hartstein <[email protected]> * fix: wait for dremio-setup to complete before proceeding (use --exit-code-from) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: add nessie dependency to dremio-setup so NessieSource creation succeeds Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use ghcr.io registry for nessie image (no longer on Docker Hub) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: add continue-on-error for Dremio edr steps (dbt-core 1.11 ref() incompatibility) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: remove continue-on-error for Dremio edr steps (ref() override now on master) Co-Authored-By: Itamar Hartstein <[email protected]> * fix: use dot-separated Nessie namespace for Dremio elementary profile dbt-dremio's generate_schema_name uses dot separation for nested Nessie namespaces (e.g. elementary_tests.elementary), not underscore concatenation (elementary_tests_elementary). The CLI profile must match the namespace path created by the e2e project's dbt run. 
Co-Authored-By: Itamar Hartstein <[email protected]> * fix: rename 'snapshots' CTE to avoid Dremio reserved keyword conflict Dremio's Calcite-based SQL parser treats 'snapshots' as a reserved keyword, causing 'Encountered ", snapshots" at line 6, column 6' error in the populate_model_alerts_query post-hook. Renamed to 'snapshots_data'. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: quote 'filter' column to avoid Dremio reserved keyword conflict Dremio's Calcite-based SQL parser treats 'filter' as a reserved keyword, causing 'Encountered ". filter" at line 52' error in the populate_source_freshness_alerts_query post-hook. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: make 'filter' column quoting Dremio-specific to avoid Snowflake case issue Snowflake stores columns as UPPERCASE, so quoting as "filter" (lowercase) breaks column resolution. Only quote for Dremio where it's a reserved keyword. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: override dbt-dremio dateadd to handle integer interval parameter dbt-dremio's dateadd macro calls interval.replace() which fails when interval is an integer. This override casts to string first. Upstream bug in dbt-dremio's macros/utils/date_spine.sql. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: remove 'select' prefix from dateadd override to avoid $SCALAR_QUERY error dbt-dremio's dateadd wraps result in 'select TIMESTAMPADD(...)' which creates a scalar subquery when embedded in larger SQL. Dremio's Calcite parser rejects multi-field RECORDTYPE in scalar subquery context. Output just TIMESTAMPADD(...) as a plain expression instead. Co-Authored-By: Itamar Hartstein <[email protected]> * fix: strip Z timezone suffix from Dremio timestamps to avoid GandivaException Dremio's Gandiva (Arrow execution engine) cannot parse ISO 8601 timestamps with the 'Z' UTC timezone suffix (e.g. '2026-03-02T22:50:42.101Z'). 
This causes 'Invalid timestamp or unknown zone' errors during edr monitor report. Override dremio__edr_cast_as_timestamp in the monitor project to strip the 'Z' suffix before casting. Also add dispatch config so elementary_cli macros take priority over the elementary package for adapter-dispatched macros.

Co-Authored-By: Itamar Hartstein <[email protected]>

* fix: use double quotes in dbt_project.yml for prettier compatibility
* fix: also replace T separator with space in Dremio timestamp cast

  Gandiva rejects both the 'Z' suffix and the 'T' separator in ISO 8601 timestamps. Normalize '2026-03-02T23:31:12.443Z' to '2026-03-02 23:31:12.443'.

* fix: use targeted regex for T separator to avoid replacing T in non-timestamp text
* fix: quote 'filter' reserved keyword in get_source_freshness_results for Dremio
* fix: quote Dremio reserved keywords row_number and count in SQL aliases

  Dremio's Calcite SQL parser reserves ROW_NUMBER and COUNT as keywords. These were used as unquoted column aliases in:
  - get_models_latest_invocation.sql
  - get_models_latest_invocations_data.sql
  - can_upload_source_freshness.sql

  Applied Dremio-specific double-quoting via a target.type conditional, the same pattern used for the 'filter' and 'snapshots' reserved keywords.

* refactor: use elementary.escape_reserved_keywords() for Dremio reserved words

  Replace the manual {% if target.type == 'dremio' %} quoting with the existing elementary.escape_reserved_keywords() utility from dbt-data-reliability. Files updated:
  - get_models_latest_invocation.sql: row_number alias
  - get_models_latest_invocations_data.sql: row_number alias
  - can_upload_source_freshness.sql: count alias
  - source_freshness_alerts.sql: filter column reference
  - get_source_freshness_results.sql: filter column reference

  Also temporarily pins dbt-data-reliability to a branch with row_number and snapshots added to the reserved keywords list (PR #955).

* chore: revert temporary dbt-data-reliability branch pin (PR #955 merged)
* fix: resolve 'Column unique_id is ambiguous' error in Dremio joins

  Replace USING (unique_id) with an explicit ON clause and select specific columns instead of SELECT * to avoid ambiguous column references in Dremio's SQL engine, which doesn't deduplicate join columns with USING.

* fix: qualify invocation_id column reference to resolve ambiguity in ON join

  The switch from USING to ON for Dremio compatibility requires qualifying column references, since ON doesn't deduplicate join columns the way USING does.

* fix: address CodeRabbit review comments
  - Revert ref() syntax from package= keyword to positional form in 20 monitor macros
  - Add HTTP_TIMEOUT constant and apply to all 7 requests calls in dremio.py
  - Raise RuntimeError on S3 source creation failure instead of a silent print
  - Aggregate and raise failures in dremio.py and spark.py load() methods
  - Fix shell=True injection: convert base.py run() to a list-based subprocess call
  - Quote MinIO credentials with shlex.quote() in dremio.py
  - Add backtick-escaping helper _q() for Spark SQL identifiers
  - Fail fast on readiness timeout in save_seed_cache.sh
  - Convert EXTRA_ARGS to a bash array in test-warehouse.yml (SC2086)
  - Remove continue-on-error from the dbt test step
  - Add explicit day case in dateadd.sql override
  - Document the Spark schema_name limitation in load_seeds_external.py

* style: fix black formatting in dremio.py and spark.py
* fix: address CodeRabbit bugs: 409 fallback and stale empty tables
* fix: address remaining CodeRabbit CI comments
* fix: clarify Spark seeder pyhive dependency
* fix: address CodeRabbit review round 3, cleanup and hardening
  - test-warehouse.yml: replace the for-loop with a case statement for the Docker adapter check
  - dateadd.sql: use bare TIMESTAMPADD keywords instead of SQL_TSI_* constants, add case-insensitive datepart matching
  - spark.py: harden connection cleanup with None-init + conditional close, escape single quotes in container_path
  - dremio.py: switch from PyYAML to ruamel.yaml for project consistency, log non-file parsing failures, make seeding idempotent with DROP TABLE before CREATE TABLE

* fix: correct isort import order in dremio.py
* fix: restore continue-on-error on the dbt test step (many e2e tests are designed to fail)

  The e2e project has tests tagged error_test and should_fail that are intentionally designed to fail. The dbt test step needs continue-on-error so these expected failures don't block the CI job. The edr monitoring steps that follow validate the expected outcomes.

* fix: remove Dremio dateadd and cast_column overrides now handled by dbt-data-reliability
* fix: remove dremio_target_database override now handled by dbt-data-reliability

---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
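The Dremio timestamp normalization described above can be sketched in plain Python. The real fix lives in a Jinja macro override (dremio__edr_cast_as_timestamp), so the helper name and exact regex here are illustrative assumptions:

```python
import re

# Targeted pattern: only the 'T' that sits between a date and a time is
# replaced, so a literal 'T' elsewhere in the text is left alone.
_ISO_T = re.compile(r"(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})")

def normalize_for_gandiva(ts: str) -> str:
    """Strip the trailing 'Z' and swap the ISO 8601 'T' separator for a space."""
    ts = _ISO_T.sub(r"\1 \2", ts)
    return ts[:-1] if ts.endswith("Z") else ts

# '2026-03-02T23:31:12.443Z' -> '2026-03-02 23:31:12.443'
```

The anchored date/time groups are what makes the replacement "targeted": a capital T in ordinary text never matches.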
* feat: add Vertica adapter support
  - Add dbt-vertica as optional dependency in pyproject.toml
  - Add Vertica transient error patterns for automatic retry
  - Add Vertica Docker service to e2e test docker-compose
  - Add Vertica profile to test profiles template
  - Add Vertica to CI workflow matrix (test-warehouse + test-all-warehouses)
  - Handle dbt-vertica install with --no-deps (pins dbt-core~=1.8)

  Co-Authored-By: Itamar Hartstein <[email protected]>

* fix: prevent dbt-core downgrade when installing elementary with vertica

  The dbt-vertica package pins dbt-core~=1.8. Installing elementary with the [vertica] extra would re-resolve dbt-vertica's deps and downgrade dbt-core, breaking package-lock.yml parsing and protobuf compat. Fix: install elementary without the adapter extra for Vertica (dbt-vertica is already present via --no-deps), and explicitly install vertica-python.

* fix: address CodeRabbit review feedback
  - Remove 'no transaction in progress' from the Vertica transient patterns (it's a transaction-state notice, not a transient connection error)
  - Hardcode dbt-data-reliability-ref to 'vertica-compat' for the Vertica matrix entry until PR #963 merges into master

* chore: update dbt-data-reliability ref to merged master commit

  PR #963 has been merged to master (2ab66fbe). Update:
  - packages.yml/package-lock.yml: point to the merge commit on master
  - test-all-warehouses.yml: remove the vertica-compat branch override

* fix: add vertica_seed_override to e2e project (fixes seed_rejects collision)

  dbt-vertica's COPY command hardcodes 'seed_rejects' as the reject table name for every seed, causing 'Object already exists' errors when dbt seed processes more than one file. This override (same as in the dbt-data-reliability integration tests) uses per-seed reject table names.

---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
- Add fabricspark__get_adapter_unique_id macro returning target.workspaceid
- Include fabricspark in delta file format config in dbt_project.yml
- Add fabricspark transient error patterns (Livy API HTTP errors)
- Add dbt-fabricspark optional dependency and fabricspark extra in pyproject.toml

Closes ELE-5296

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
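The "transient error patterns" mechanism mentioned above can be illustrated with a minimal sketch. The pattern strings below are hypothetical stand-ins, not the actual list shipped for fabricspark:

```python
import re

# Hypothetical examples of retryable Livy/HTTP-style failures; the real
# pattern list for fabricspark lives in the elementary codebase.
_TRANSIENT_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"livy session .* (not found|dead)",
        r"HTTP (502|503|504)",
    )
]

def is_transient(error_message: str) -> bool:
    """Return True when an error matches a known retryable pattern."""
    return any(p.search(error_message) for p in _TRANSIENT_PATTERNS)
```

Only errors matching one of these patterns would trigger an automatic retry; everything else fails fast.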
…ow (#2144) In _parse_test_db_row, latest_run_time was set to the raw UTC datetime and latest_run_time_utc was converted to local time, which is backwards. This swaps them to match the correct pattern in _get_test_metadata_from_test_result_db_row where latest_run_time is local time and latest_run_time_utc is UTC. Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Itamar Hartstein <[email protected]>
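The corrected convention can be sketched as follows. The helper name and return shape are assumptions for illustration, not the actual _parse_test_db_row code:

```python
from datetime import datetime, timezone

def split_run_times(raw_utc: datetime) -> dict:
    """Hypothetical helper: the local conversion goes to latest_run_time,
    and the unchanged UTC value goes to latest_run_time_utc."""
    aware = raw_utc.replace(tzinfo=timezone.utc)
    return {
        "latest_run_time": aware.astimezone().isoformat(),  # local wall-clock
        "latest_run_time_utc": aware.isoformat(),           # stays UTC
    }
```

The bug was exactly this assignment reversed: the raw UTC value landed in latest_run_time while the local conversion landed in latest_run_time_utc.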
…pecified (#2143) When using 'edr report --select invocation_id:XXXX', the report showed 0 tests for non-latest invocations because the invocations_rank_index == 1 filter ran unconditionally after the invocation_id filter, removing all results where the selected invocation wasn't the most recent one. This mirrors the existing correct pattern in get_test_results_summary(). Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Itamar Hartstein <[email protected]>
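A minimal sketch of the conditional filter, with field names assumed from the description rather than taken from the actual implementation:

```python
def filter_test_results(rows, invocation_id=None):
    """Apply the latest-invocation filter only when no invocation was
    explicitly selected (hypothetical field names for illustration)."""
    if invocation_id is not None:
        # --select invocation_id:XXXX keeps that invocation even if it
        # isn't the most recent one.
        return [r for r in rows if r["invocation_id"] == invocation_id]
    # Default behaviour: latest invocation only.
    return [r for r in rows if r["invocations_rank_index"] == 1]
```

The bug was applying the rank == 1 filter unconditionally, which emptied the result set whenever the selected invocation wasn't the latest.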
* release v0.23.0
* use official package version 0.23.0
* update package lock

---------
Co-authored-by: GitHub Actions <[email protected]>
Co-authored-by: Itamar Hartstein <[email protected]>
…2146)

* Fix release validation to parse YAML instead of grepping comments

  The validate-packages-yml job was using grep to check for git hash references in packages.yml, which also matched commented-out lines. This caused the release to fail even though the actual YAML config was correct. Replace the grep-based validation with a Python script that properly parses the YAML, ignoring comments and only checking actual package entries.

* Add workflow_dispatch trigger to release workflow

  Allows manually triggering the release workflow from the GitHub Actions UI without needing to modify or recreate a tag.

* Add tag input to workflow_dispatch for manual releases

  When manually triggering the release workflow, a tag input is now required (e.g. v0.23.0). All checkout steps use this tag as the ref, and Docker metadata uses it for proper image tagging. When triggered by a tag push, the behavior is unchanged (falls back to github.ref).

---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <[email protected]>
👋 @joostboon
📝 Walkthrough

New and revised documentation for the PR / MR Data Quality Review feature: adds a cloud-specific guide and replaces the previous CI doc with an end-to-end walkthrough covering how the review is produced, prerequisites, GitHub Actions and GitLab CI setup examples, required secrets/variables, and troubleshooting.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Repo as Git Hosting (GitHub/GitLab)
    participant CI as CI (GitHub Actions / GitLab CI)
    participant MCP as Elementary Cloud MCP
    participant Model as LLM (Anthropic / Claude)
    participant API as Elementary API
    Dev->>Repo: Open PR / MR
    Repo->>CI: Trigger workflow (diff, tests, metadata)
    CI->>MCP: Send diff, test history, repo metadata
    MCP->>Model: Request analysis (Anthropic)
    Model-->>MCP: Return data-quality context
    MCP->>API: Fetch model/repo health & downstream impact
    API-->>MCP: Return summaries
    MCP->>CI: Provide composed PR comment content
    CI->>Repo: Post PR/MR comment with data-quality review
    Repo-->>Dev: Display comment on PR/MR
```

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 5
🧹 Nitpick comments (1)
docs/features/ci.mdx (1)
2-137: Consider making one CI review page canonical.

This file now mirrors docs/cloud/features/ci.mdx almost verbatim. Keeping the setup snippets and troubleshooting text in two places will make future fixes like these easy to miss; a redirect, alias, or shared MDX partial would reduce drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/features/ci.mdx` around lines 2 - 137, This page duplicates docs/cloud/features/ci.mdx; make one canonical and update the other to reference it: pick either docs/features/ci.mdx or docs/cloud/features/ci.mdx as the source of truth, remove duplicated content from the non-canonical file (including the title "PR / MR Data Quality Review", the GitHub Actions and GitLab CI snippets, the Accordion block, and Troubleshooting/Warning sections), and replace it with a short MDX redirect/alias or an import that points to the canonical document (or a shared MDX partial/component for the repeated sections) so future edits to the CI setup live in one place. Ensure links and frontmatter remain consistent and that the chosen canonical file contains the full content for "PR / MR Data Quality Review" and the setup examples.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/cloud/features/ci.mdx`:
- Around line 83-87: The docs table currently lists the default diff-filter as
`ACM`, which omits renamed files; update the `diff-filter` default to `ACMR` to
include renames, and also add a short parenthetical note in the Description
column for `diff-filter` (or an adjacent sentence) clarifying that including `R`
will capture renamed models (A=Added, C=Copied, M=Modified, R=Renamed) so
renames are not silently omitted; target the `diff-filter` table row and the
surrounding description text.
- Around line 70-79: Add a troubleshooting caveat to docs/cloud/features/ci.mdx
explaining that the setup relies on repository secrets (ANTHROPIC_API_KEY,
ELEMENTARY_API_KEY) in a pull_request workflow and that GitHub does not expose
these secrets to workflows triggered by forks or Dependabot (and GITHUB_TOKEN is
read-only in those contexts); briefly list mitigations such as asking
maintainers to run the workflow manually, using workflow_dispatch or
repository_dispatch for external contributions, or instructing contributors to
open PRs from branches in the main repo, and mention Dependabot as a special
case so external PRs don’t fail silently.
- Around line 57-60: Update the workflow permissions block to include contents:
read in addition to pull-requests: write so actions/checkout@v4 can access the
repo; locate the YAML permissions block under the job (the existing
"permissions:" with "pull-requests: write") and add "contents: read". Also
update the troubleshooting note that currently mentions pull-requests: write to
state that both pull-requests: write (for posting comments) and contents: read
(for checking out code) are required.
In `@docs/features/ci.mdx`:
- Around line 70-79: Update the sentence "Open a PR that touches a model file
and the review comment appears automatically." to add a caveat explaining that
the workflow uses the pull_request event and GitHub hides repository secrets for
PRs from forks and Dependabot; specify that the automatic comment will only
appear for PRs originating from branches within the same repository (not forks
or Dependabot PRs), and optionally add the same note to the Troubleshooting
section for visibility.
- Around line 57-60: The permissions block currently sets unlisted scopes to
none and omits repository read access required by actions/checkout@v4; add
"contents: read" to the permissions block (alongside pull-requests: write) so
checkout can run, and update the troubleshooting note referenced near line ~125
to explicitly mention adding "contents: read" as a required permission when
diagnosing job failures; locate the YAML permissions block and the
troubleshooting paragraph text to apply these updates.
---
Nitpick comments:
In `@docs/features/ci.mdx`:
- Around line 2-137: This page duplicates docs/cloud/features/ci.mdx; make one
canonical and update the other to reference it: pick either docs/features/ci.mdx
or docs/cloud/features/ci.mdx as the source of truth, remove duplicated content
from the non-canonical file (including the title "PR / MR Data Quality Review",
the GitHub Actions and GitLab CI snippets, the Accordion block, and
Troubleshooting/Warning sections), and replace it with a short MDX
redirect/alias or an import that points to the canonical document (or a shared
MDX partial/component for the repeated sections) so future edits to the CI setup
live in one place. Ensure links and frontmatter remain consistent and that the
chosen canonical file contains the full content for "PR / MR Data Quality
Review" and the setup examples.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0b4b9c18-4c56-4ab5-9785-5db6aa1135c4
📒 Files selected for processing (2)
- docs/cloud/features/ci.mdx
- docs/features/ci.mdx
> **Step 2 — Add two repository secrets**
>
> Go to **Settings > Secrets and variables > Actions** and add:
>
> | Secret | Description |
> |---|---|
> | `ANTHROPIC_API_KEY` | Your Anthropic API key |
> | `ELEMENTARY_API_KEY` | Your Elementary Cloud API key |
>
> That's it. Open a PR that touches a model file and the review comment appears automatically.
🧩 Analysis chain
🌐 Web query:
GitHub Actions repository secrets pull_request workflow forked repos Dependabot
💡 Result:
- `pull_request` from a fork: GitHub Actions does not provide repository/organization secrets to workflows triggered by PRs opened from forked repositories, and the `GITHUB_TOKEN` is read-only (to prevent secret exfiltration). [1]
- Dependabot PRs are treated like forks: workflows triggered by Dependabot on events like `push`/`pull_request` are treated as if they came from a fork, so they get a read-only `GITHUB_TOKEN` and no normal Actions secrets. [1]
- Use "Dependabot secrets" for Dependabot-triggered runs: if Dependabot needs credentials (e.g., registry tokens), store them under Repo → Settings → Security → Dependabot → Secrets (or org-level Dependabot secrets). Those are the secrets intended for workflows initiated by Dependabot. [2][3][4]
- `pull_request_target` is risky (and doesn't "fix" Dependabot secret access): `pull_request_target` runs in the base repo context and can expose secrets if you check out / execute untrusted PR code; GitHub explicitly warns about this trigger. Also, for Dependabot, even `pull_request_target` runs still have a read-only token and no secrets in the Dependabot-PR case. [5][3]
- Common workaround pattern: keep PR checks that don't need secrets on `pull_request`, and run any secret-requiring steps in a separate, gated workflow (often via `workflow_run`) or by using Dependabot secrets with the same secret names your workflow expects. [1]
Sources:
[1] GitHub Docs – Troubleshooting Dependabot on GitHub Actions
[2] GitHub Docs – Automating Dependabot with GitHub Actions
[3] GitHub Docs – Dependabot on GitHub Actions
[4] GitHub Docs – Configuring access to private registries for Dependabot
[5] GitHub Docs – Using GitHub’s security features to secure your use of GitHub Actions
🏁 Script executed:
```shell
# Check if the file exists and review the content around lines 70-79
fd "ci.mdx" docs/
```

Repository: elementary-data/elementary
Length of output: 169
🏁 Script executed:
```shell
# Read the file to see the context around lines 70-79
sed -n '60,90p' docs/cloud/features/ci.mdx
```

Repository: elementary-data/elementary
Length of output: 1114
🏁 Script executed:
```shell
# Check the full structure of the file to see if there's a troubleshooting section
cat -n docs/cloud/features/ci.mdx | head -150
```

Repository: elementary-data/elementary
Length of output: 5743
Add a fork/Dependabot caveat in the Troubleshooting section.
The setup depends on repository secrets inside a pull_request workflow, but GitHub does not pass those secrets to PRs from forked repositories or Dependabot, and the GITHUB_TOKEN is read-only in those contexts. External contributors and Dependabot runs will encounter a broken setup without this caveat.
Suggested addition to Troubleshooting
**The comment says the MCP server is unreachable**
Verify `ELEMENTARY_API_KEY` is correctly set and the MCP server is enabled for your account. See the [MCP setup guide](/cloud/mcp/setup-guide).
+**Setup doesn't work for forks or Dependabot PRs**
+
+<Warning>
+This workflow works for pull requests opened from branches in the same repository. GitHub does not pass repository secrets to `pull_request` workflows from forks or Dependabot PRs.
+</Warning>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/cloud/features/ci.mdx` around lines 70 - 79, Add a troubleshooting
caveat to docs/cloud/features/ci.mdx explaining that the setup relies on
repository secrets (ANTHROPIC_API_KEY, ELEMENTARY_API_KEY) in a pull_request
workflow and that GitHub does not expose these secrets to workflows triggered by
forks or Dependabot (and GITHUB_TOKEN is read-only in those contexts); briefly
list mitigations such as asking maintainers to run the workflow manually, using
workflow_dispatch or repository_dispatch for external contributions, or
instructing contributors to open PRs from branches in the main repo, and mention
Dependabot as a special case so external PRs don’t fail silently.
```yaml
permissions:
  pull-requests: write
steps:
  - uses: actions/checkout@v4
```
🧩 Analysis chain
🏁 Script executed:
```shell
find . -name "ci.mdx" -type f | head -20
```

Repository: elementary-data/elementary
Length of output: 121
🏁 Script executed:
```shell
wc -l docs/features/ci.mdx 2>/dev/null || echo "File not found"
```

Repository: elementary-data/elementary
Length of output: 94
🏁 Script executed:
```shell
sed -n '50,80p' docs/features/ci.mdx 2>/dev/null || echo "File not found"
```

Repository: elementary-data/elementary
Length of output: 865
🏁 Script executed:
```shell
sed -n '120,130p' docs/features/ci.mdx 2>/dev/null || echo "File not found"
```

Repository: elementary-data/elementary
Length of output: 433
🌐 Web query:
GitHub Actions permissions workflow scope behavior explicit permissions block
💡 Result:
- `permissions:` controls the `GITHUB_TOKEN` scopes for each job in a workflow, and can be set:
  - Top-level (`permissions:`) → becomes the default for all jobs in that workflow.
  - Per-job (`jobs.<job_id>.permissions:`) → overrides the workflow-level setting for that specific job. [1]
- How the effective permissions are calculated (order matters):
  - Start from the enterprise/org/repo default "Workflow permissions" setting for `GITHUB_TOKEN`.
  - Apply any workflow-level `permissions:` in the YAML.
  - Apply any job-level `permissions:` in the YAML.
  - If triggered by a fork PR and "Send write tokens to workflows from pull requests" is not enabled, any write permissions are downgraded to read. [1]
- Key "gotcha" with an explicit permissions block: if you set any permission explicitly, then any permission not listed is set to `none` (so adding a small `permissions:` block can unintentionally remove other scopes you were relying on). [1]
- Useful shorthands: `permissions: read-all`, `permissions: write-all`, `permissions: {}` (disable all) [1]
- Recommended practice: set the repo/org default to least privilege (often read-only) and then grant only what a given job needs. [2]
🌐 Web query:
actions/checkout requires contents read permission GitHub Actions
💡 Result:
actions/checkout uses the job’s GITHUB_TOKEN to fetch (read) your repository, so the token must have contents: read permission (unless you provide a different auth via the action’s token or ssh-key inputs). [1]
Add this at the workflow or job level:
```yaml
permissions:
  contents: read
steps:
  - uses: actions/checkout@v4
```

If you've locked permissions down (anything not listed becomes `none`), explicitly granting `contents: read` is required. [2]
Also check the repo/org setting Settings → Actions → General → Workflow permissions (restricted mode gives GITHUB_TOKEN read access to contents and packages; permissive mode is read/write for more scopes). [3]
Sources: [1] [2] [3]
Add contents: read to the job permissions.
Line 57 introduces an explicit permissions block, which sets any unlisted scope to none. This removes the repository-read access that actions/checkout@v4 requires, causing the job to fail before the Elementary step runs. Update the troubleshooting note (around line 125) to mention contents: read as well, since it's equally critical.
🛠️ Proposed fix

```diff
 permissions:
+  contents: read
   pull-requests: write
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Before:

```yaml
permissions:
  pull-requests: write
steps:
  - uses: actions/checkout@v4
```

After:

```yaml
permissions:
  contents: read
  pull-requests: write
steps:
  - uses: actions/checkout@v4
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/features/ci.mdx` around lines 57 - 60, The permissions block currently
sets unlisted scopes to none and omits repository read access required by
actions/checkout@v4; add "contents: read" to the permissions block (alongside
pull-requests: write) so checkout can run, and update the troubleshooting note
referenced near line ~125 to explicitly mention adding "contents: read" as a
required permission when diagnosing job failures; locate the YAML permissions
block and the troubleshooting paragraph text to apply these updates.
> **Step 2 — Add two repository secrets**
>
> Go to **Settings > Secrets and variables > Actions** and add:
>
> | Secret | Description |
> |---|---|
> | `ANTHROPIC_API_KEY` | Your Anthropic API key |
> | `ELEMENTARY_API_KEY` | Your Elementary Cloud API key |
>
> That's it. Open a PR that touches a model file and the review comment appears automatically.
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's find and inspect the file mentioned
fd "ci.mdx" -t f
```

Repository: elementary-data/elementary
Length of output: 169
🌐 Web query:
GitHub Actions secrets behavior with forked pull requests and Dependabot
💡 Result:
Forked pull requests (`pull_request`)

- Workflows triggered by `pull_request` from forks run with no repository secrets (except the auto-minted `GITHUB_TOKEN`), and that `GITHUB_TOKEN` has read-only/minimal privileges. This is a deliberate security guarantee to prevent exfiltration of secrets by untrusted PR code. [1]

`pull_request_target` (often used to intentionally access secrets on PRs)

- `pull_request_target` runs in the context of the base repository, so it can access repository secrets, including when the PR comes from a fork. This is powerful but risky if you check out / run untrusted PR code. [2]
- GitHub also updated `pull_request_target` so the workflow file/checkout reference comes from the repo's default branch, but it remains higher-risk specifically because it executes with access to secrets for fork PRs. [3]

Dependabot-triggered workflows and secrets

- If a workflow is triggered by Dependabot, GitHub supports Dependabot secrets (configured separately from Actions secrets) and they appear in the workflow via the normal `secrets.*` context. A common pattern is to create both an Actions secret and a Dependabot secret with the same name, so the workflow works for both Dependabot and humans. [4]
- Special restriction: for workflows initiated by Dependabot using `pull_request_target`, if the PR/base ref was created by Dependabot, then secrets are not available and the `GITHUB_TOKEN` is read-only. [5]
- Additionally, some event types triggered by Dependabot are hard-limited to a read-only token and no secrets (per GitHub's changelog). [6]

Practical takeaway

- Use `pull_request` for CI that runs PR code (safe-by-default: no secrets). [1]
- If you need secrets for PR-related automation (labeling/commenting, etc.), use `pull_request_target` but avoid executing untrusted PR code; otherwise you risk secret exfiltration. [2][3]
- For Dependabot, prefer Dependabot secrets (or dual secrets with the same name) and expect stricter limits, especially with `pull_request_target`. [4][5]
🏁 Script executed:
```shell
# Let's inspect the file to see the actual content and check the workflow event trigger
head -120 docs/features/ci.mdx | tail -60
```

Repository: elementary-data/elementary
Length of output: 1942
🏁 Script executed:
```shell
# Also check the entire file to understand context and find troubleshooting section
wc -l docs/features/ci.mdx
```

Repository: elementary-data/elementary
Length of output: 94
🏁 Script executed:
```shell
# Let's get the exact lines 70-79 that the review comment references
sed -n '70,79p' docs/features/ci.mdx
```

Repository: elementary-data/elementary
Length of output: 408
🏁 Script executed:
```shell
# Also check a wider range to understand the workflow trigger and look for troubleshooting section
sed -n '1,80p' docs/features/ci.mdx
```

Repository: elementary-data/elementary
Length of output: 2734
🏁 Script executed:
```shell
# Check the rest of the file to see if there's a troubleshooting section
sed -n '80,137p' docs/features/ci.mdx
```

Repository: elementary-data/elementary
Length of output: 2122
Add a fork/Dependabot caveat before promising automatic comments.
The workflow shown uses pull_request events, where GitHub deliberately withholds repository secrets from forked and Dependabot PRs as a security measure. The current text "Open a PR that touches a model file and the review comment appears automatically" will fail silently for external contributors without this critical context. Add a note either immediately after that line or in the Troubleshooting section clarifying that this workflow requires the PR to originate from a branch in the same repository, not from a fork or Dependabot.
🛠️ Suggested note

```diff
 That's it. Open a PR that touches a model file and the review comment appears automatically.
+
+<Warning>
+This workflow works for pull requests opened from branches in the same repository. GitHub does not pass repository secrets to `pull_request` workflows from forks or Dependabot PRs.
+</Warning>
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Before:

> **Step 2 — Add two repository secrets**
>
> Go to **Settings > Secrets and variables > Actions** and add:
>
> | Secret | Description |
> |---|---|
> | `ANTHROPIC_API_KEY` | Your Anthropic API key |
> | `ELEMENTARY_API_KEY` | Your Elementary Cloud API key |
>
> That's it. Open a PR that touches a model file and the review comment appears automatically.

After:

> **Step 2 — Add two repository secrets**
>
> Go to **Settings > Secrets and variables > Actions** and add:
>
> | Secret | Description |
> |---|---|
> | `ANTHROPIC_API_KEY` | Your Anthropic API key |
> | `ELEMENTARY_API_KEY` | Your Elementary Cloud API key |
>
> That's it. Open a PR that touches a model file and the review comment appears automatically.
>
> <Warning>
> This workflow works for pull requests opened from branches in the same repository. GitHub does not pass repository secrets to `pull_request` workflows from forks or Dependabot PRs.
> </Warning>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/features/ci.mdx` around lines 70 - 79, Update the sentence "Open a PR
that touches a model file and the review comment appears automatically." to add
a caveat explaining that the workflow uses the pull_request event and GitHub
hides repository secrets for PRs from forks and Dependabot; specify that the
automatic comment will only appear for PRs originating from branches within the
same repository (not forks or Dependabot PRs), and optionally add the same note
to the Troubleshooting section for visibility.
♻️ Duplicate comments (3)
docs/cloud/features/ci.mdx (3)
94-94: ⚠️ Potential issue | 🟡 Minor

Include renames in the default `diff-filter` (or explicitly document the exclusion). `ACM` omits renamed files, so model renames can be missed from review context.

Proposed fix

```diff
-| `diff-filter` | `ACM` | File changes to include: A=Added, C=Copied, M=Modified |
+| `diff-filter` | `ACMR` | File changes to include: A=Added, C=Copied, M=Modified, R=Renamed |
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/cloud/features/ci.mdx` at line 94, The table entry for `diff-filter` currently lists `ACM` which omits renames; update the default to include rename indicator `R` (e.g., `ACMR`) or explicitly document that renames are excluded and must be handled separately; locate the `diff-filter` row in docs/cloud/features/ci.mdx (the table line containing `diff-filter` and `ACM`) and either add `R` to the allowed values or add a clarifying sentence explaining that renamed files are not included by `ACM` and how to include them.
65-66: ⚠️ Potential issue | 🟠 Major

Add `contents: read` and document both required permissions. With an explicit `permissions` block, unspecified scopes are set to `none`. Keeping only `pull-requests: write` can break `actions/checkout@v4`. Also, the troubleshooting text should mention both required scopes.

Proposed fix

```diff
 permissions:
+  contents: read
   pull-requests: write
```

```diff
-For GitHub, make sure `pull-requests: write` is set under `permissions` in the workflow file. GitHub Actions requires this explicitly to post PR comments.
+For GitHub, make sure both `contents: read` and `pull-requests: write` are set under `permissions` in the workflow file. `contents: read` is required for checkout, and `pull-requests: write` is required to post PR comments.
```

Also applies to: 133-134
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/cloud/features/ci.mdx` around lines 65 - 66, The permissions block currently lists only "pull-requests: write", which leaves all other scopes unset and can break actions/checkout@v4; update the permissions block in docs/cloud/features/ci.mdx to include both "contents: read" and "pull-requests: write" (and update the troubleshooting text to mention both required scopes) so readers see the complete required permissions; ensure you modify the permissions block instances referenced (the block around the original diff and the one at lines ~133-134) and update any nearby troubleshooting paragraph to explicitly name both scopes.
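As a concrete sketch, the corrected workflow fragment would look like this (a minimal example, assuming the job only needs checkout and PR commenting):

```yaml
permissions:
  contents: read        # needed by actions/checkout@v4 to clone the repo
  pull-requests: write  # needed to post the review comment on the PR
```

Once an explicit `permissions` key is present, every scope not listed defaults to none, which is why omitting `contents: read` breaks checkout.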
87-87: ⚠️ Potential issue | 🟠 Major — Document fork/Dependabot limitations for secret-based PR workflows.

The setup depends on repository secrets in a `pull_request` workflow. Forked PRs and Dependabot PRs typically won't receive those secrets, so this should be called out explicitly in troubleshooting. Suggested addition:

```diff
 **The comment says the MCP server is unreachable**

 Verify `ELEMENTARY_API_KEY` is correctly set and the MCP server is enabled for your account. See the [MCP setup guide](/cloud/mcp/setup-guide).

+**Setup doesn't work for forked PRs or Dependabot**
+
+<Warning>
+For `pull_request` workflows, GitHub does not expose repository secrets to forked PRs, and Dependabot PRs are similarly restricted. If needed, use a maintainer-triggered workflow (for example via `workflow_dispatch`) or run this only for branches in the main repository.
+</Warning>
+
 <Warning>
 If a model has never been synced through Elementary, the comment will note that no history is available yet. Results populate automatically after the next Elementary sync.
 </Warning>
```

Also applies to: 131-142
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/cloud/features/ci.mdx` at line 87, Add an explicit troubleshooting note to the CI docs where the sentence "That's it. Open a PR that touches a model file and the review comment appears automatically." appears: explain that the setup relies on repository secrets available to pull_request workflows and therefore forked PRs and Dependabot PRs typically do not receive those secrets, include suggested workarounds (e.g., contributors open branch PRs from same repo, maintainers run workflow, or use a bot with push access), and add the same clarification to the other affected block referenced in the comment (the section covering lines 131-142) so readers know the limitation and how to proceed.
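One way to express the workaround in the workflow itself is a job-level `if:` guard. This is a sketch, assuming a job named `review` (adjust to the actual job name in the docs):

```yaml
jobs:
  review:
    # Skip forked PRs (no secrets) and Dependabot PRs (restricted secrets)
    if: >-
      github.event.pull_request.head.repo.full_name == github.repository &&
      github.actor != 'dependabot[bot]'
```

This keeps the workflow from failing confusingly on PRs where the required secrets are simply unavailable.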
🧹 Nitpick comments (1)
docs/cloud/features/ci.mdx (1)
37-37: Clarify the GitHub vs GitLab setup count in this sentence.

"Two secrets and one file" reads as global, but the GitLab flow later requires three variables. Consider wording this as GitHub-specific.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/cloud/features/ci.mdx` at line 37, The sentence "Two secrets and one file" is ambiguous because the GitLab flow requires three variables; update the copy to state the GitHub-specific requirement (e.g., "For GitHub: two secrets and one file") or explicitly list both platform counts so readers aren't misled; adjust the surrounding text that references GitLab variables to ensure it says "GitLab: three variables" or similar so the total requirements are clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0c4bb0e6-07d6-4a77-a89e-4c8febca7e63
📒 Files selected for processing (1)
docs/cloud/features/ci.mdx
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/cloud/features/ci.mdx`:
- Around line 57-59: The pull_request workflow paths only include
"models/**/*.yml" so changes to ".yaml" files won't trigger reviews; update the
pull_request.paths configuration to also include the "models/**/*.yaml" pattern
(alongside the existing "models/**/*.yml") so repositories using .yaml extension
are covered; edit the paths section in the CI config where the two entries
"models/**/*.sql" and "models/**/*.yml" are declared and add "models/**/*.yaml".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24803a98-7d41-4001-838f-0c7e95c3b2af
📒 Files selected for processing (1)
docs/cloud/features/ci.mdx
```yaml
paths:
  - "models/**/*.sql"
  - "models/**/*.yml"
```
Include `.yaml` in `pull_request.paths` to avoid missed reviews.

The trigger currently matches `models/**/*.yml` but not `models/**/*.yaml`. Repos using `.yaml` model files will not run this workflow when those files change. Suggested fix:

```diff
 on:
   pull_request:
     paths:
       - "models/**/*.sql"
       - "models/**/*.yml"
+      - "models/**/*.yaml"
       - "dbt_project.yml"
```

📝 Committable suggestion
```yaml
paths:
  - "models/**/*.sql"
  - "models/**/*.yml"
  - "models/**/*.yaml"
```
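A quick way to convince yourself that `*.yml` and `*.yaml` are distinct patterns is a small Python sketch using the stdlib `fnmatch` module (file names are hypothetical, and this uses Python's matcher rather than the Actions glob engine, purely to illustrate that the extensions do not overlap):

```python
from fnmatch import fnmatch

# A glob written for .yml never matches a .yaml file, and vice versa
print(fnmatch("models/staging/schema.yaml", "models/*/*.yml"))   # False
print(fnmatch("models/staging/schema.yaml", "models/*/*.yaml"))  # True
print(fnmatch("models/staging/schema.yml", "models/*/*.yml"))    # True
```

In short, a repository standardized on `.yaml` gets zero coverage from a `.yml`-only path filter.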
Thanks for the thorough review @coderabbitai! All four issues were addressed in c287204.
The documentation looks solid end-to-end now. 🎉
Closing in favor of a clean branch - the head branch was mistakenly cut from master instead of docs, causing 772 files changed. |
Summary
Preview
https://docs.elementary-data.com/cloud/features/ci
🤖 Generated with Claude Code