
feat: add RunConfig jinja rendering engine #557

Merged
eric-tramel merged 14 commits into main from codex/run-config-jinja-renderer
Apr 17, 2026

@eric-tramel
Contributor

@eric-tramel eric-tramel commented Apr 17, 2026

Summary

This PR adds a RunConfig-level selector for engine-side Jinja rendering so users can choose between Data Designer's hardened renderer and the broader native Jinja2 sandbox. The public interface is JinjaRenderingEngine.SECURE versus JinjaRenderingEngine.NATIVE, with SECURE as the default so existing deployments do not silently lose template hardening. It also adds user-facing startup logs so create and preview show which Jinja mode is active.
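A minimal sketch of how such a selector could be modeled, assuming a plain dataclass stands in for RunConfig. The class bodies are illustrative and not the PR's implementation; the string values mirror the startup logs shown later in this description, and `RunConfigSketch` is a hypothetical name.

```python
from dataclasses import dataclass
from enum import Enum


class JinjaRenderingEngine(str, Enum):
    """Illustrative stand-in for the enum described above."""

    SECURE = "secure"  # hardened renderer
    NATIVE = "native"  # Jinja2's built-in sandbox


@dataclass(frozen=True)
class RunConfigSketch:
    """Hypothetical config object; the real RunConfig has other fields."""

    # Defaulting to SECURE means callers must opt in to the broader mode.
    jinja_rendering_engine: JinjaRenderingEngine = JinjaRenderingEngine.SECURE


default_config = RunConfigSketch()
native_config = RunConfigSketch(jinja_rendering_engine=JinjaRenderingEngine("native"))
```

Because the default lives on the field itself, code that never mentions the field keeps the hardened behavior.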

Changes

Added

Changed

Fixed

  • Expose jsonpath in the native sandbox so switching from SECURE to NATIVE does not lose the Data Designer filter
  • Update engine tests that had been pinned to the temporary native-default behavior so they now reflect the secure default and explicit native opt-in

Usage

SECURE (default)

from data_designer import DataDesigner
import data_designer.config as dd

designer = DataDesigner()
designer.set_run_config(
    dd.RunConfig(
        jinja_rendering_engine=dd.JinjaRenderingEngine.SECURE,
    )
)

NATIVE (explicit opt-in)

from data_designer import DataDesigner
import data_designer.config as dd

designer = DataDesigner()
designer.set_run_config(
    dd.RunConfig(
        jinja_rendering_engine=dd.JinjaRenderingEngine.NATIVE,
    )
)

Startup Logs

[12:34:56] [INFO]   |-- 🔒 Jinja rendering engine: secure
[12:35:10] [INFO]   |-- 🏠 Jinja rendering engine: native
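
A rough sketch of how a line like the ones above might be produced. The function names, the icon mapping, and the indent string are assumptions made for illustration; only the message text and icons are taken from the log output.

```python
import logging

# Hypothetical icon mapping; the icons mirror the log lines above.
MODE_ICONS = {"secure": "🔒", "native": "🏠"}
LOG_INDENT = "  |-- "  # assumed indent string

logger = logging.getLogger("jinja_mode_sketch")


def format_mode_line(mode: str) -> str:
    """Build the user-facing line for the active rendering mode."""
    icon = MODE_ICONS[mode]
    return f"{LOG_INDENT}{icon} Jinja rendering engine: {mode}"


def log_mode(mode: str) -> None:
    """Emit the line at INFO level so it shows up in startup output."""
    logger.info(format_mode_line(mode))
```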

Attention Areas

Reviewers: Please pay special attention to the following:

Related Issues

Testing

  • uv run --all-packages ruff check packages/data-designer-config/src/data_designer/config/run_config.py packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py packages/data-designer-engine/src/data_designer/engine/column_generators/utils/prompt_renderer.py packages/data-designer-engine/src/data_designer/engine/sampling_gen/jinja_utils.py packages/data-designer-engine/src/data_designer/engine/sampling_gen/generator.py packages/data-designer-config/tests/config/test_run_config.py packages/data-designer-engine/tests/engine/column_generators/utils/test_prompt_renderer.py packages/data-designer-engine/tests/engine/sampling_gen/test_jinja_utils.py packages/data-designer-engine/tests/engine/processing/ginja/test_environment.py packages/data-designer-engine/tests/engine/column_generators/generators/test_image.py docs/code_reference/run_config.md docs/concepts/security.md
  • uv run --all-packages pytest packages/data-designer-config/tests/config/test_run_config.py
  • uv run --all-packages pytest packages/data-designer-engine/tests
  • uv run --all-packages pytest packages/data-designer/tests/interface/test_data_designer.py -k "logs_secure_jinja_rendering_mode or logs_native_jinja_rendering_mode"
  • uv run --all-packages pytest packages/data-designer/tests/interface/test_data_designer.py
    This file currently has an unrelated default-provider mismatch in the existing test fixture setup (brr-local vs stub-model-provider).
  • uv run --group docs mkdocs build --strict
    Current repo-wide documentation warnings still cause strict mode to abort; the new Security page did not introduce additional strict-mode failures.

Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable): N/A

Description updated with AI

- add a RunConfig enum/field that selects native Jinja by default
  while preserving ginja as an opt-in hardened mode
- route shared prompt and sampler rendering through the selected
  engine instead of hardcoding ginja behavior
- cover the new selection path with config, engine, and docs updates

Refs #87
Refs #550

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@github-actions
Contributor

github-actions Bot commented Apr 17, 2026

Docs preview: https://5dad36cc.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

- opt the ginja mixin regression tests into the hardened renderer
  explicitly now that RunConfig defaults engine rendering to native
- update the image empty-prompt assertion to expect the native-mode
  ValueError surfaced by the generator

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
- rename the public RunConfig enum option from ginja to secure
  so the interface reads as native versus secure
- update docs and tests to use the new public enum member and
  keep the hardened renderer wired through the same engine seam

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@andreatgretel
Contributor

nice design - the centralized seam in _create_render_environment is clean. two things that need fixing before merge: (1) extract_column_names_from_expression breaks on jsonpath expressions now (verified locally), and (2) NativeJinjaSandboxEnvironment doesn't register jsonpath, so NATIVE is missing a filter that SECURE has. also worth considering whether the rendered-output guards (empty, length) should live in both modes since they're safety checks, not syntax restrictions.

@eric-tramel
Contributor Author

Ran a few representative DataDesigner workflows with RunConfig.jinja_rendering_engine toggled between SECURE and NATIVE to validate the mode seam end to end.

What I ran:

  1. Tutorial-style Jinja workflow based on docs/notebook_source/2-structured-outputs-and-jinja-expressions.py

    • Expression columns
    • {% if %} conditional logic
    • SkipConfig
    • Outcome: SECURE and NATIVE produced identical results for the supported Jinja features.
  2. Processor workflow based on docs/concepts/processors.md

    • Real create() run
    • SchemaTransformProcessorConfig
    • {{ category | upper }} in the processor template
    • Outcome: SECURE and NATIVE produced the same main dataset and the same processor output.
  3. Native-only broader Jinja check

    • Expression: {{ tags | join(', ') }} over a list column
    • Outcome: SECURE failed as expected because join is still outside the secure filter allowlist.
    • Outcome: NATIVE succeeded and rendered the joined strings as expected.

Bottom line:

  • Supported Jinja features behave the same in both modes.
  • The secure-only restrictions are still enforced in SECURE.
  • The broader Jinja surface is available in NATIVE.

These were actual preview() / create() runs with temp artifact directories and a dummy default-named provider so the checks stayed model-free and isolated the Jinja behavior itself.
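
The join check in item 3 can be approximated with stock Jinja2 alone. The sketch below is not Data Designer's hardened renderer: it assumes a two-filter allowlist (`ALLOWED_FILTERS` is made up for illustration) and simply removes every other filter from Jinja2's `ImmutableSandboxedEnvironment`.

```python
from typing import Optional

from jinja2 import TemplateError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Assumed allowlist for illustration only; the real secure allowlist differs.
ALLOWED_FILTERS = {"lower", "upper"}


def make_native_env() -> ImmutableSandboxedEnvironment:
    """Jinja2's built-in sandbox with its full default filter set."""
    return ImmutableSandboxedEnvironment()


def make_restricted_env() -> ImmutableSandboxedEnvironment:
    """Same sandbox, but with every filter outside the allowlist removed."""
    env = ImmutableSandboxedEnvironment()
    env.filters = {name: fn for name, fn in env.filters.items() if name in ALLOWED_FILTERS}
    return env


def try_render(env: ImmutableSandboxedEnvironment, template: str, record: dict) -> Optional[str]:
    """Return the rendered text, or None if Jinja2 rejects the template."""
    try:
        return env.from_string(template).render(**record)
    except TemplateError:
        return None


record = {"tags": ["a", "b"], "category": "news"}
native_join = try_render(make_native_env(), "{{ tags | join(', ') }}", record)
restricted_join = try_render(make_restricted_env(), "{{ tags | join(', ') }}", record)
restricted_upper = try_render(make_restricted_env(), "{{ category | upper }}", record)
```

Under these assumptions the unrestricted sandbox renders the joined string, the restricted one rejects `join`, and both accept `upper`.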

@eric-tramel eric-tramel marked this pull request as ready for review April 17, 2026 17:51
@eric-tramel eric-tramel requested a review from a team as a code owner April 17, 2026 17:51
@github-actions
Contributor

Code Review: PR #557 — feat: add RunConfig jinja rendering engine

Summary

This PR adds a JinjaRenderingEngine enum (SECURE / NATIVE) to RunConfig, allowing users to choose between Data Designer's hardened Jinja renderer (default) and Jinja2's built-in sandbox. The change spans config, engine, and interface layers with a new NativeJinjaSandboxEnvironment class, a uniform render_template() adapter, and user-facing startup logs. A comprehensive new Security documentation page is included.

Scope: 20 files changed, +575 / -20 lines. Mix of production code (config enum, environment routing, plumbing), tests, and documentation.

CI Status: All checks passing (one engine test pending at review time).

Findings

Severity: Low

1. _get_jinja_rendering_engine uses hasattr duck-typing (environment.py:497-502)

def _get_jinja_rendering_engine(self) -> JinjaRenderingEngine:
    if hasattr(self, "_jinja_rendering_engine"):
        return JinjaRenderingEngine(getattr(self, "_jinja_rendering_engine"))
    if hasattr(self, "_resource_provider"):
        return JinjaRenderingEngine(self._resource_provider.run_config.jinja_rendering_engine)
    return JinjaRenderingEngine.SECURE

This works but is fragile — the mixin silently falls back to SECURE if neither attribute exists. A missing attribute could mask a wiring bug where the engine setting was never plumbed through. Consider adding a class-level _jinja_rendering_engine attribute with a sentinel or making it a required argument, so missing wiring fails loudly rather than silently defaulting.
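
One way the sentinel suggestion could look, as a sketch only. The merged PR keeps the silent SECURE fallback (its tests cover that behavior), so this shows the reviewer's alternative rather than the shipped code; `RenderingMixinSketch`, `WiredRenderer`, and `UnwiredRenderer` are hypothetical names.

```python
from enum import Enum


class JinjaRenderingEngine(str, Enum):
    SECURE = "secure"
    NATIVE = "native"


_UNSET = object()  # sentinel: nobody wired an engine into this object


class RenderingMixinSketch:
    """Hypothetical mixin; the attribute name follows the snippet above."""

    _jinja_rendering_engine: object = _UNSET

    def _get_jinja_rendering_engine(self) -> JinjaRenderingEngine:
        if self._jinja_rendering_engine is _UNSET:
            # Missing wiring is reported instead of silently becoming SECURE.
            raise RuntimeError(f"{type(self).__name__} has no Jinja rendering engine configured")
        return JinjaRenderingEngine(self._jinja_rendering_engine)


class WiredRenderer(RenderingMixinSketch):
    def __init__(self, engine: JinjaRenderingEngine = JinjaRenderingEngine.SECURE) -> None:
        self._jinja_rendering_engine = engine


class UnwiredRenderer(RenderingMixinSketch):
    pass
```

The trade-off is that every host class must set the attribute explicitly, which is stricter than the fallback chain in the PR.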

2. NativeJinjaSandboxEnvironment.validate_template catches its own UserTemplateError (environment.py:423-432)

def validate_template(self, user_template: str) -> None:
    try:
        ...
        if len(unallowed_vars) > 0:
            raise UserTemplateError(...)
    except Exception as exception:
        maybe_handle_missing_filter_exception(exception, ...)
        raise exception

The UserTemplateError raised inside the try block is caught by the broad except Exception, routed through maybe_handle_missing_filter_exception (which won't match it), and then re-raised. Functionally correct, but the UserTemplateError could be raised after the try/except block or caught explicitly to avoid the unnecessary filter-check detour.

Also, raise exception should be a bare raise, which is the idiomatic way to re-raise the exception currently being handled and leaves its traceback untouched. The same comment applies to the render_template method at line 450.
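
A sketch of the restructuring described in this finding, under stated assumptions: `UserTemplateError` is redefined locally as a stand-in, `maybe_handle_missing_filter` is a simplified hypothetical version of the project's helper, and `parse` is any callable that returns the names a template references.

```python
class UserTemplateError(Exception):
    """Stand-in for the project's exception of the same name."""


def maybe_handle_missing_filter(exc: Exception) -> None:
    """Hypothetical helper: translate 'unknown filter' errors, ignore others."""
    if "no filter named" in str(exc).lower():
        raise UserTemplateError(f"unsupported filter: {exc}") from exc


def validate_template(parse, template: str, allowed: set[str]) -> None:
    try:
        referenced = parse(template)  # only parsing can hit a filter error
    except UserTemplateError:
        raise  # already user-facing; skip the filter check
    except Exception as exc:
        maybe_handle_missing_filter(exc)
        raise  # bare raise re-raises the active exception unchanged
    # The reference check runs outside the try block, so its error is never
    # routed through the missing-filter handler.
    unallowed = sorted(set(referenced) - allowed)
    if unallowed:
        raise UserTemplateError(f"unknown references: {unallowed}")
```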

3. NativeJinjaSandboxEnvironment declares class-level attribute annotations (environment.py:404-405)

class NativeJinjaSandboxEnvironment(ImmutableSandboxedEnvironment):
    allowed_references: list[str]
    _prefer_dict_key_access: bool

These class-level annotations only declare the types of instance attributes set in __init__; because they assign no value, they create no class attributes and shadow nothing. Tests pass, and the existing UserTemplateSandboxEnvironment uses the same pattern, so this is consistent with the codebase.

4. extract_column_names_from_expression always uses NativeJinjaSandboxEnvironment (jinja_utils.py:73)

def extract_column_names_from_expression(expr: str) -> set[str]:
    return meta.find_undeclared_variables(NativeJinjaSandboxEnvironment().parse("{{ " + expr + " }}"))

This changed from UserTemplateSandboxEnvironment().get_references(...) to directly using meta.find_undeclared_variables on a NativeJinjaSandboxEnvironment. Since this is pure AST parsing (no rendering), the change is safe. However, it now constructs a NativeJinjaSandboxEnvironment each call just to access parse() — a minor inefficiency but not a concern unless called in hot paths.

Severity: Info / Positive Observations

5. Clean polymorphic adapter pattern

Adding render_template() to UserTemplateSandboxEnvironment as a thin wrapper around safe_render() creates a uniform interface for the factory method _create_render_environment(). This avoids modifying the existing safe_render() method and is a clean approach.
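
The adapter-plus-factory shape described here can be sketched with toy classes. Everything below is illustrative: the environments use `string.Template` (`$name` placeholders) instead of Jinja, and `HardenedEnv`, `NativeEnv`, `Engine`, and `create_render_environment` are hypothetical stand-ins for the PR's classes and factory.

```python
from enum import Enum
from string import Template
from typing import Protocol


class Engine(str, Enum):
    SECURE = "secure"
    NATIVE = "native"


class RenderEnvironment(Protocol):
    def render_template(self, template: str, record: dict) -> str: ...


class HardenedEnv:
    """Toy stand-in for the hardened environment and its existing method."""

    def safe_render(self, template: str, record: dict) -> str:
        return Template(template).substitute(record)

    def render_template(self, template: str, record: dict) -> str:
        # Thin adapter: same behavior, uniform method name.
        return self.safe_render(template, record)


class NativeEnv:
    """Toy stand-in for the native environment."""

    def render_template(self, template: str, record: dict) -> str:
        return Template(template).safe_substitute(record)


def create_render_environment(engine: Engine) -> RenderEnvironment:
    """Callers depend only on render_template, not on the concrete class."""
    return HardenedEnv() if engine is Engine.SECURE else NativeEnv()
```

Keeping `safe_render` untouched and adding a wrapper means existing callers of the old method are unaffected while new call sites use one method name for both modes.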

6. upper filter addition to secure allowlist (environment.py:60)

Adding upper to ALLOWED_JINJA_FILTERS is a sensible inclusion — lower was already allowed, and upper has the same risk profile.

7. jsonpath filter exposed in native sandbox (environment.py:416)

Good catch — without this, switching from SECURE to NATIVE would silently lose the jsonpath custom filter.

8. Thorough security documentation (docs/concepts/security.md)

The new Security page is well-structured with a compatibility matrix, CVE references, and clear guidance on when to use each mode. The trust-model framing (trusted/monolithic vs. untrusted/shared) is helpful for users making deployment decisions.

9. Good test coverage

Tests cover both modes at every layer: config defaults, environment rendering, prompt renderer, JinjaDataFrame, expression generators, and interface logging. The tests correctly verify that SECURE rejects filters that NATIVE allows (e.g., join).

10. Startup logging

The _log_jinja_rendering_engine_mode() method in the interface provides clear user-visible feedback about which rendering mode is active, with distinct icons for each mode.

Nits

  • PR body notes two unchecked test items: the full test_data_designer.py suite (pre-existing fixture mismatch) and mkdocs build --strict (pre-existing warnings). These are pre-existing issues, not introduced by this PR.

Verdict

Approve — This is a well-structured feature addition that follows the codebase's layered architecture (config -> engine -> interface). The default is secure, the opt-in is explicit, documentation is thorough, test coverage is good, and CI is green. The findings above are all low-severity suggestions for minor code quality improvements, none of which block merge.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 17, 2026

Greptile Summary

This PR introduces JinjaRenderingEngine.SECURE / JinjaRenderingEngine.NATIVE as a RunConfig toggle that controls which Jinja rendering environment is used at execution time, with SECURE as the default to preserve existing hardening behaviour. It wires the selection through the full call chain (DatasetGenerator, RecordBasedPromptRenderer, JinjaDataFrame) via a new _create_render_environment factory on the WithJinja2UserTemplateRendering mixin, adds a new NativeJinjaSandboxEnvironment class for the native path, and surfaces startup log lines so the active mode is visible to operators.

Confidence Score: 5/5

Safe to merge; SECURE is the default so existing deployments are unaffected, and the NATIVE path is an explicit opt-in with no regressions in the changed call sites.

All findings are P2 or absent. The engine-selection logic is correct, the fallback chain in _get_jinja_rendering_engine is sound, NativeJinjaSandboxEnvironment properly delegates to ImmutableSandboxedEnvironment, and the call chain propagation through DatasetGenerator, RecordBasedPromptRenderer, and JinjaDataFrame is complete and consistent. Test coverage spans unit, integration, and interface layers.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py | Core rendering seam: adds NativeJinjaSandboxEnvironment, _create_render_environment factory, and _get_jinja_rendering_engine fallback chain; render_template delegation is clean and consistent. |
| packages/data-designer-config/src/data_designer/config/run_config.py | Adds JinjaRenderingEngine StrEnum and jinja_rendering_engine field to RunConfig with SECURE default; public API change is backward-compatible. |
| packages/data-designer-engine/src/data_designer/engine/column_generators/utils/prompt_renderer.py | RecordBasedPromptRenderer now accepts and stores jinja_rendering_engine, which is picked up by the mixin's _get_jinja_rendering_engine via getattr; default is SECURE. |
| packages/data-designer-engine/src/data_designer/engine/sampling_gen/generator.py | DatasetGenerator (aliased as SamplingDatasetGenerator) gains jinja_rendering_engine parameter and forwards it to each JinjaDataFrame instantiation. |
| packages/data-designer-engine/src/data_designer/engine/sampling_gen/jinja_utils.py | JinjaDataFrame stores jinja_rendering_engine; extract_column_names_from_expression switches from UserTemplateSandboxEnvironment to NativeJinjaSandboxEnvironment for AST-only parsing (no behavioral difference). |
| packages/data-designer/src/data_designer/interface/data_designer.py | Adds _log_jinja_rendering_engine_mode called at the start of create() and preview(); uses LOG_INDENT for consistent formatting. |
| packages/data-designer-config/src/data_designer/config/__init__.py | JinjaRenderingEngine correctly added to both the TYPE_CHECKING import block and the lazy imports dict. |
| packages/data-designer-engine/tests/engine/processing/ginja/test_environment.py | New tests verify jsonpath in NativeJinjaSandboxEnvironment, upper filter in secure env, and the mixin's secure-by-default behaviour without explicit engine wiring. |
| packages/data-designer-engine/tests/engine/sampling_gen/test_jinja_utils.py | Tests confirm JinjaDataFrame can switch rendering engines and defaults to secure; adds jsonpath expression to extract_column_names parametrize set. |
| docs/concepts/security.md | New security concept page with compatibility matrix, CVE references, and clear guidance on SECURE vs NATIVE; content is accurate against implementation. |

Sequence Diagram

sequenceDiagram
    participant User
    participant DD as DataDesigner
    participant RC as RunConfig
    participant CG as ColumnGenerator<br/>(LLM/Sampler)
    participant Mix as WithJinja2UserTemplateRendering
    participant Env as Environment Factory

    User->>DD: create(config, run_config)
    DD->>RC: jinja_rendering_engine (SECURE or NATIVE)
    DD->>DD: _log_jinja_rendering_engine_mode()
    DD->>CG: generate(data)

    alt LLM column
        CG->>Mix: RecordBasedPromptRenderer(jinja_rendering_engine)
        Mix->>Mix: _get_jinja_rendering_engine()
    else Sampler column
        CG->>Mix: JinjaDataFrame(expr, jinja_rendering_engine)
        Mix->>Mix: _get_jinja_rendering_engine()
    end

    Mix->>Env: _create_render_environment(dataset_variables)

    alt engine == SECURE
        Env-->>Mix: UserTemplateSandboxEnvironment
    else engine == NATIVE
        Env-->>Mix: NativeJinjaSandboxEnvironment
    end

    Mix->>Env: validate_template(template)
    Mix->>Env: render_template(template, record)
    Env-->>Mix: rendered string
    Mix-->>CG: result
    CG-->>DD: dataset
    DD-->>User: DatasetCreationResults

Reviews (2): Last reviewed commit: "refactor: tighten jinja environment erro..."

Contributor

@andreatgretel andreatgretel left a comment


🚢

@eric-tramel eric-tramel merged commit 8be4ff7 into main Apr 17, 2026
50 checks passed