Skip to content

Ele 4942 dimension anomalies visualization#847

Merged
NoyaArie merged 9 commits intomasterfrom
ele-4942-dimension-anomalies-visualization
Aug 27, 2025
Merged

Ele 4942 dimension anomalies visualization#847
NoyaArie merged 9 commits intomasterfrom
ele-4942-dimension-anomalies-visualization

Conversation

@NoyaArie
Copy link
Contributor

@NoyaArie NoyaArie commented Aug 21, 2025

Summary by CodeRabbit

  • New Features

    • Added a dimension-specific anomaly detection path for more reliable group-level anomaly identification.
  • Bug Fixes

    • Anomaly evaluation now runs against the correctly prepared test context and stores the appropriate results, improving accuracy.
  • Refactor

    • Major rewrite of the anomaly scoring/query logic to standardize outputs and filtering.
  • Tests

    • Expanded test data and strengthened assertions to validate per-dimension anomaly detection (e.g., "Superman").

@linear
Copy link

linear bot commented Aug 21, 2025

@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Inactive
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 21, 2025 15:01 — with GitHub Actions Failure
@coderabbitai
Copy link

coderabbitai bot commented Aug 21, 2025

Walkthrough

Replaces the generic anomaly query with a dimension-specific macro using a computed flattened_test; rewrites the read-scores macro to compute anomalies (backfill, min/max, is_anomalous logic); updates integration test assertions to validate dimension-based anomalies for "Superman"; minor formatting-only edits elsewhere.

Changes

Cohort / File(s) Summary
Test invocation update
macros/edr/tests/test_dimension_anomalies.sql
Replaces elementary.get_anomaly_query(flatten_model) with elementary.get_anomaly_query_for_dimension_anomalies(flattened_test) where flattened_test is produced via elementary.flatten_test(context["model"]); updates SQL variable names and downstream store_anomaly_test_results call.
Anomaly query macros
macros/edr/tests/test_utils/get_anomaly_query.sql
Adds get_anomaly_query_for_dimension_anomalies(flattened_test=none) and majorly rewrites get_read_anomaly_scores_query(flattened_test=none): derives test_unique_id, loads test_configuration from cache, computes backfill_period, constructs CTEs (anomaly_scores → anomaly_scores_with_is_anomalous → final_results), computes min/max via lag/prior metrics per direction/spike/drop, applies exclude_final_results, and uses anomaly_score_condition/fail_on_zero. Minor formatting tweaks included.
Integration test assertions
integration_tests/tests/test_dimension_anomalies.py
Extends training data, groups anomaly test points by dimension_value, filters for "Superman", and asserts that only that dimension is reported, totals match, and at least one Superman point is anomalous.
Formatting-only changes
macros/edr/materializations/test/test.sql
Whitespace and comment-wrapping adjustments only; no functional changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant IT as integration_test
  participant Caller as test_macro_invocation
  participant DimMacro as get_anomaly_query_for_dimension_anomalies
  participant ReadScores as get_read_anomaly_scores_query
  participant DB as Warehouse

  IT->>Caller: invoke test that flattens model
  Caller->>DimMacro: pass flattened_test
  activate DimMacro
  DimMacro->>ReadScores: build/read scores query (flattened_test)
  ReadScores->>DB: execute read_scores_query
  DB-->>ReadScores: metric rows
  ReadScores-->>DimMacro: rows annotated with is_anomalous
  DimMacro->>DimMacro: select distinct anomalous dimension_value(s)
  DimMacro-->>Caller: return anomalous_dimension_rows_sql / result set
  Caller-->>IT: store/analyze anomaly results
  deactivate DimMacro
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Suggested reviewers

  • MikaKerman

Poem

I nibble on queries beneath moonlight,
Flattened tests hop in, tidy and bright.
Dimension whispers point to one name — Superman,
I twitch my nose, the anomalies prance and ran.
Thump — tests pass, and the rabbit grins wide.

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ele-4942-dimension-anomalies-visualization

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Contributor

👋 @NoyaArie
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
macros/edr/tests/test_utils/get_anomaly_query.sql (3)

12-15: Variable name is misleading (non_anomalies_query actually returns anomalies only).

non_anomalies_query appends where is_anomalous = true, so it’s an anomalies-only query. Consider renaming for clarity.

Apply this diff for readability:

-  {% set non_anomalies_query -%}
+  {% set anomalies_only_query -%}
     {{ anomaly_scores_query }}
     where is_anomalous = true
   {%- endset -%}
@@
-  {% if anomalies_count > 0 %}
-    {{- return(anomaly_scores_query) -}}
-  {% else %}
-    {{- return(non_anomalies_query) -}}
-  {% endif %}
+  {% if anomalies_count > 0 %}
+    {{- return(anomaly_scores_query) -}}
+  {% else %}
+    {{- return(anomalies_only_query) -}}
+  {% endif %}

17-24: Cast the count to int to avoid type/coercion surprises in Jinja.

agate_to_dicts may return Decimal/str depending on adapter. An explicit cast keeps the comparison robust.

Apply this diff:

-  {% set anomalies_count_result = elementary.agate_to_dicts(elementary.run_query(anomalies_count_query)) %}
-  {% set anomalies_count = anomalies_count_result[0]['anomalies_count'] %}
+  {% set anomalies_count_result = elementary.agate_to_dicts(elementary.run_query(anomalies_count_query)) %}
+  {% set anomalies_count = (anomalies_count_result[0]['anomalies_count'] | int) %}

91-101: No-op formatting changes look fine; keep an eye on duplicate columns from “*, value/average/min/max”.

Selecting aliased columns and then “” intentionally duplicates columns. If any downstream expects unique column names, consider ordering “” first or omitting superseded originals. No change required if consumers are tolerant.

I can refactor final_results to explicitly project the final column set if you prefer stricter schemas.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 98750df and 9374977.

📒 Files selected for processing (2)
  • macros/edr/tests/test_dimension_anomalies.sql (1 hunks)
  • macros/edr/tests/test_utils/get_anomaly_query.sql (6 hunks)
🔇 Additional comments (3)
macros/edr/tests/test_dimension_anomalies.sql (1)

74-74: No legacy get_anomaly_query calls remain
I’ve confirmed every get_anomaly_query invocation now uses flattened_test; none pass flatten_model any longer.
• macros/edr/tests/test_dimension_anomalies.sql
• macros/edr/tests/test_all_columns_anomalies.sql
• macros/edr/tests/test_table_anomalies.sql
• macros/edr/tests/test_column_anomalies.sql

Approving these changes.

macros/edr/tests/test_utils/get_anomaly_query.sql (2)

2-5: Wrapper select for downstream filtering looks fine.

Building anomaly_scores_query as a selectable wrapper enables clean outer WHEREs without altering the inner query. No issues here.


20-24: Behavioral check: returning full series when anomalies exist is intentional?

If anomalies_count > 0 you return the unfiltered series (anomalous + non-anomalous rows). That seems aligned with “dimension anomalies visualization” (to render context around anomalies). Please confirm downstream consumers (e.g., store_anomaly_test_results, UI) expect and efficiently handle non-anomalous rows; otherwise, result volume may spike.

If needed, I can help gate the full-series return behind a configurable flag (e.g., return_full_series_on_anomaly=true) to keep backward compatibility.

@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Inactive
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
@NoyaArie NoyaArie had a problem deploying to elementary_test_env August 24, 2025 07:46 — with GitHub Actions Failure
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
integration_tests/tests/test_dimension_anomalies.py (3)

104-109: Prefer direct set equality and a set comprehension

This is cleaner and asserts the exact expected set in one check.

-    # Only dimension values that are anomalous are stored in the test points
-    dimension_values = set([x["dimension_value"] for x in anomaly_test_points])
-    assert len(dimension_values) == 2
-    assert "Superman" in dimension_values
-    assert "Spiderman" in dimension_values
+    # Only dimension values that are anomalous are stored in the test points
+    dimension_values = {x["dimension_value"] for x in anomaly_test_points}
+    assert dimension_values == {"Superman", "Spiderman"}

113-120: Avoid magic number 13; derive expected points per dimension

Hard-coding 13 is brittle. Derive the expected count from the same date generator to make the test resilient to window changes.

-    assert len(superman_anomaly_test_points) == 13
+    expected_points_per_group = len(list(generate_dates(base_date=utc_today - timedelta(1))))
+    assert len(superman_anomaly_test_points) == expected_points_per_group
     assert any(x["is_anomalous"] for x in superman_anomaly_test_points)
@@
-    assert len(spiderman_anomaly_test_points) == 13
+    assert len(spiderman_anomaly_test_points) == expected_points_per_group
     assert any(x["is_anomalous"] for x in spiderman_anomaly_test_points)

Verification tip: If the anomaly points are intended to exclude the test date, replace the derivation with either:

  • expected_points_per_group = len(list(generate_dates(base_date=utc_today - timedelta(1)))) - 1
  • or expected_points_per_group = len(training_dates)

Please confirm which window the query returns in your adapters.


101-103: Update the stale comment to match the new behavior

The comment says “we should only get 1 row with the problematic value,” but the assertions below expect full time-series points per dimension when anomalies exist. Please update the comment to avoid confusion.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9374977 and e94a6c5.

📒 Files selected for processing (1)
  • integration_tests/tests/test_dimension_anomalies.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (1.8.0, postgres) / test

@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 25, 2025 09:47 — with GitHub Actions Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
macros/edr/tests/test_utils/get_anomaly_query.sql (1)

9-21: Fix cross‑dimension leakage and avoid double‑scanning the base anomaly query.

  • Current IN subquery keys only on dimension_value. Identical values across different dimensions will bleed into results.
  • elementary.get_read_anomaly_scores_query is expanded twice; this is expensive and can hinder optimizers.
  • Add null guards for dimension/dimension_value and match on the pair using EXISTS, wrapping the heavy query once in a CTE.

Apply this diff:

-{%- macro get_anomaly_query_for_dimension_anomalies(flattened_test=none) -%}
-  {%- set dimension_values_query -%}
-    select distinct dimension_value from ({{ elementary.get_read_anomaly_scores_query(flattened_test) }}) results
-    where is_anomalous = true
-  {%- endset -%}
-
-  {% set dimension_anomalies_query -%}
-    select * from ({{ elementary.get_read_anomaly_scores_query(flattened_test) }}) results
-    where dimension_value in ({{ dimension_values_query }})
-  {%- endset -%}
-
-  {{- return(dimension_anomalies_query) -}}
-{%- endmacro -%}
+{%- macro get_anomaly_query_for_dimension_anomalies(flattened_test=none) -%}
+  {%- set dimension_anomalies_query -%}
+    with base as (
+      {{ elementary.get_read_anomaly_scores_query(flattened_test) }}
+    ),
+    anomalous_pairs as (
+      select distinct
+        dimension,
+        dimension_value
+      from base as results
+      where results.dimension is not null
+        and results.dimension_value is not null
+        and results.is_anomalous = true
+    )
+    select *
+    from base as results
+    where exists (
+      select 1
+      from anomalous_pairs p
+      where p.dimension = results.dimension
+        and p.dimension_value = results.dimension_value
+    )
+  {%- endset -%}
+
+  {{- return(dimension_anomalies_query) -}}
+{%- endmacro -%}

Note: qualifying columns through the alias (results.is_anomalous, etc.) improves portability on stricter adapters.

🧹 Nitpick comments (2)
macros/edr/tests/test_utils/get_anomaly_query.sql (2)

82-97: Partition LAG windows by execution to prevent cross‑run contamination.

The LAG() windows for min_value and max_value currently partition by table/column/metric/dimension/bucket_seasonality. If the anomaly_scores table holds multiple runs, the “previous” value can jump across test executions. Include test_execution_id (or test_unique_id) in the partition key to ensure expectations are computed within a single run.

Apply this diff:

-{{ elementary.lag('metric_value') }} over (partition by full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)
+{{ elementary.lag('metric_value') }} over (partition by test_execution_id, full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)
-{{ elementary.lag('min_metric_value') }} over (partition by full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)
+{{ elementary.lag('min_metric_value') }} over (partition by test_execution_id, full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)
-{{ elementary.lag('metric_value') }} over (partition by full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)
+{{ elementary.lag('metric_value') }} over (partition by test_execution_id, full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)
-{{ elementary.lag('max_metric_value') }} over (partition by full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)
+{{ elementary.lag('max_metric_value') }} over (partition by test_execution_id, full_table_name, column_name, metric_name, dimension, dimension_value, bucket_seasonality order by bucket_end)

If the intent is to stitch across executions, prefer test_unique_id so you don’t mix different tests that coincidentally share identifiers.


159-168: Zero check semantics — optional robustness note.

Current (metric_value = 0 and …) is fine for integer/decimal metrics. If some adapters round floats unpredictably, consider an optional tolerance-based helper in the future (e.g., abs(metric_value) < epsilon). Not a blocker.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d6cc6a3 and 584601c.

📒 Files selected for processing (2)
  • macros/edr/materializations/test/test.sql (4 hunks)
  • macros/edr/tests/test_utils/get_anomaly_query.sql (5 hunks)
✅ Files skipped from review due to trivial changes (1)
  • macros/edr/materializations/test/test.sql
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-10T11:29:44.525Z
Learnt from: GuyEshdat
PR: elementary-data/dbt-data-reliability#838
File: macros/utils/sql_utils/escape_reserved_keywords.sql:29-31
Timestamp: 2025-08-10T11:29:44.525Z
Learning: In the Elementary dbt-data-reliability codebase, the `escape_keywords` macro in `macros/utils/sql_utils/escape_reserved_keywords.sql` is specifically designed to escape SQL reserved keywords for different database adapters. Reserved keywords are predefined language tokens (like 'filter', 'sql', 'timestamp', 'value', 'one', 'min', 'max', 'sum', 'count' for Dremio) that don't contain special characters like double quotes. Therefore, the `dremio__escape_keywords` macro correctly wraps these keywords in double quotes without needing to handle embedded quotes, as reserved keywords by definition are simple identifiers.

Applied to files:

  • macros/edr/tests/test_utils/get_anomaly_query.sql
📚 Learning: 2025-08-10T11:29:06.982Z
Learnt from: GuyEshdat
PR: elementary-data/dbt-data-reliability#838
File: macros/utils/table_operations/delete_and_insert.sql:138-146
Timestamp: 2025-08-10T11:29:06.982Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers to add the `escape_reserved_keywords` macro only in places where they've actually encountered errors with reserved keywords, rather than preemptively adding it everywhere. They rely on E2E tests to catch future issues if reserved keywords cause problems in untested code paths. This pragmatic approach avoids unnecessary defensive programming and keeps the codebase simpler.

Applied to files:

  • macros/edr/tests/test_utils/get_anomaly_query.sql
📚 Learning: 2025-08-10T11:28:43.632Z
Learnt from: GuyEshdat
PR: elementary-data/dbt-data-reliability#838
File: macros/utils/table_operations/insert_rows.sql:156-0
Timestamp: 2025-08-10T11:28:43.632Z
Learning: In the Elementary dbt-data-reliability codebase, the `dremio__escape_special_chars` macro in `macros/utils/table_operations/insert_rows.sql` receives strings where single quotes are already escaped as `\'` from upstream processing. The macro correctly converts these to Dremio's SQL escaping format by replacing `\'` with `''`. This is different from other database implementations that handle bare single quotes directly.

Applied to files:

  • macros/edr/tests/test_utils/get_anomaly_query.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (latest_official, redshift) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (1.8.0, postgres) / test
🔇 Additional comments (3)
macros/edr/tests/test_utils/get_anomaly_query.sql (3)

77-81: Good call using the reserved‑keyword escape for value.

Alias uses {{ elementary.escape_reserved_keywords('value') }}, which aligns with the team’s practice of applying escaping only where we’ve hit issues, and covers adapters where VALUE is special.


138-156: Whitespace-only change — no functional impact.

No action needed.


170-184: Anomaly score condition composition looks correct.

The fail_on_zero short-circuit plus directional/percent conditions is coherent, and the backfill gating is applied outside via is_anomalous. LGTM.


{% set flattened_test = elementary.flatten_test(context["model"]) %}
{% set anomalous_rows_sql = elementary.get_anomaly_query(flatten_model) %}
{% set anomalous_rows_sql = elementary.get_anomaly_query_for_dimension_anomalies(flattened_test) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate to two different parameters, one for storing results and one for the test sql itself

@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
@NoyaArie NoyaArie temporarily deployed to elementary_test_env August 26, 2025 12:52 — with GitHub Actions Inactive
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
macros/edr/tests/test_dimension_anomalies.sql (1)

78-78: Inline SQL output is inconsistent with stored query

You're storing results from the dimension-specific query but printing the generic get_anomaly_query, which can confuse debugging and test artifacts. Print the same dimension-specific SQL you store.

-{{ elementary.get_anomaly_query(flattened_test) }}
+{{ anomalous_dimension_rows_sql }}

Alternative (if you prefer not to rely on the variable):

-{{ elementary.get_anomaly_query(flattened_test) }}
+{{ elementary.get_anomaly_query_for_dimension_anomalies(flattened_test=flattened_test) }}
🧹 Nitpick comments (1)
macros/edr/tests/test_dimension_anomalies.sql (1)

74-74: Pass flattened_test as a named argument for macro stability

Future-proof the call against signature changes and improve readability.

-{% set anomalous_dimension_rows_sql = elementary.get_anomaly_query_for_dimension_anomalies(flattened_test) %}
+{% set anomalous_dimension_rows_sql = elementary.get_anomaly_query_for_dimension_anomalies(flattened_test=flattened_test) %}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 584601c and dc22cb3.

📒 Files selected for processing (1)
  • macros/edr/tests/test_dimension_anomalies.sql (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test (latest_official, trino) / test
  • GitHub Check: test (latest_official, postgres) / test
  • GitHub Check: test (1.8.0, postgres) / test
  • GitHub Check: test (latest_pre, postgres) / test
  • GitHub Check: test (latest_official, clickhouse) / test
  • GitHub Check: test (latest_official, snowflake) / test
  • GitHub Check: test (latest_official, dremio) / test
  • GitHub Check: test (latest_official, bigquery) / test
  • GitHub Check: test (latest_official, databricks_catalog) / test
  • GitHub Check: test (latest_official, athena) / test
  • GitHub Check: test (latest_official, redshift) / test
🔇 Additional comments (2)
macros/edr/tests/test_dimension_anomalies.sql (2)

76-76: Good: storing results uses the dimension-specific query

Using anomalous_dimension_rows_sql with store_anomaly_test_results aligns the storage path with the new dimension anomalies flow.


43-43: No-op formatting change

Whitespace-only; no action needed.

@NoyaArie NoyaArie merged commit b46d220 into master Aug 27, 2025
15 checks passed
@NoyaArie NoyaArie deleted the ele-4942-dimension-anomalies-visualization branch August 27, 2025 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants