[Data] Implement constant folding optimization for Ray Data expressions #60634

@slfan1989

Description

Problem

Ray Data currently performs no compile-time constant folding or basic algebraic simplification on expressions. As a result, constant sub-expressions are re-evaluated for every row at runtime, even when the result is trivially known during planning.

Examples of wasted computation:

  • lit(3) + lit(5) → recomputed per row instead of folded to lit(8) once
  • col("x") * lit(1) → unnecessary multiplication per row instead of simplified to col("x")
  • lit(False) & expensive_udf(col("data")) → executes the expensive UDF unnecessarily, even though the result is always False

This leads to avoidable CPU overhead, especially in large-scale pipelines with many chained map, filter, or with_columns operations.

Proposed Solution

Add a ConstantFoldingRule to the logical optimizer that:

  • Folds pure constant expressions (lit(3) + lit(5) → lit(8))
  • Applies algebraic identities (x * 1 → x, x + 0 → x, x * 0 → 0 (with null handling))
  • Performs boolean short-circuit & constant propagation:
    • False & <expr> → False
    • True | <expr> → True
    • ~True → False
  • Handles nested expressions and repeated applications until fixpoint
  • Eliminates redundant operations (e.g. NOT(NOT(x)) → x)
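
To make the rewrites listed above concrete, here is a minimal sketch of a bottom-up folding pass over a toy expression tree. It is only an illustration under stated assumptions: the node classes (`Lit`, `Col`, `BinOp`, `Not`) and the functions `fold` and `fold_to_fixpoint` are hypothetical names invented for this example and are not Ray Data's expression classes or optimizer API. The sketch deliberately leaves `x * 0` unfolded, since rewriting it to `0` is only safe when `x` is known to be non-null.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Union


# Hypothetical toy nodes for illustration; NOT Ray Data's expression classes.
@dataclass(frozen=True)
class Lit:
    value: Any


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "*", "&", "|"
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Not:
    operand: Expr


Expr = Union[Lit, Col, BinOp, Not]

# How to evaluate each operator once both operands are known constants.
_EVAL = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "&": lambda a, b: a and b,
    "|": lambda a, b: a or b,
}


def _is_lit(expr: Expr, value: Any) -> bool:
    # Exact type match so that True is not mistaken for 1 (or False for 0).
    return (
        isinstance(expr, Lit)
        and type(expr.value) is type(value)
        and expr.value == value
    )


def fold(expr: Expr) -> Expr:
    """One bottom-up pass: fold the children first, then simplify this node."""
    if isinstance(expr, Not):
        inner = fold(expr.operand)
        if isinstance(inner, Lit) and isinstance(inner.value, bool):
            return Lit(not inner.value)  # ~True -> False
        if isinstance(inner, Not):
            return inner.operand  # NOT(NOT(x)) -> x
        return Not(inner)
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        both_const = isinstance(left, Lit) and isinstance(right, Lit)
        # Null literals are left alone in this sketch.
        if both_const and left.value is not None and right.value is not None:
            return Lit(_EVAL[expr.op](left.value, right.value))
        # All four operators are commutative, so try both operand orders.
        for a, b in ((left, right), (right, left)):
            if expr.op == "*" and _is_lit(a, 1):
                return b  # x * 1 -> x
            if expr.op == "+" and _is_lit(a, 0):
                return b  # x + 0 -> x
            if expr.op == "&" and _is_lit(a, False):
                return Lit(False)  # False & <expr> -> False
            if expr.op == "|" and _is_lit(a, True):
                return Lit(True)  # True | <expr> -> True
        # x * 0 is intentionally NOT folded: a null x must stay null.
        return BinOp(expr.op, left, right)
    return expr  # Lit and Col are already in simplest form


def fold_to_fixpoint(expr: Expr) -> Expr:
    """Re-apply fold until the tree stops changing."""
    while True:
        folded = fold(expr)
        if folded == expr:
            return folded
        expr = folded


print(fold_to_fixpoint(BinOp("+", Lit(3), Lit(5))))  # Lit(value=8)
print(fold_to_fixpoint(BinOp("*", Col("x"), Lit(1))))  # Col(name='x')
```

For this small rule set a single bottom-up pass already reaches the fixpoint; the loop in `fold_to_fixpoint` mirrors the "until fixpoint" requirement and would matter once rules are added whose output can trigger other rules higher in the tree. A real rule would also need to decide whether dropping a UDF call via short-circuiting is acceptable when the UDF can raise or has side effects.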

Benefits

  • Performance: Eliminates redundant per-row computation → faster execution, especially for constant-heavy projections/filters
  • Plan simplification: Produces cleaner expression trees → enables better downstream optimizations (predicate pushdown, column pruning, fusion, etc.)
  • Zero runtime cost: All folding happens during logical planning
  • Backward compatible: Transparent to existing user code

Metadata

Labels

P2 (Important issue, but not time-critical), community-backlog, data (Ray Data-related issues), enhancement (Request for new feature and/or capability), performance
