YAML value column selector #333

@mark-druffel

Description

I can't get pb.col() to work within the YAML workflow. For col_vals_ge() and col_vals_gt() at least, pb.col() inputs appear to fail in _string_date_dttm_conversion(value=value).

Reproducible example

Create some example data:

import pointblank as pb
import polars as pl
from datetime import date

df = pl.DataFrame(
    {
        "id": list(range(123, 123 + 6)),  # 123, 124, 125, 126, 127, 128
        "start_date": [
            date(2024, 5, 1),
            date(2024, 6, 1),
            date(2024, 5, 15),
            date(2024, 7, 1),
            date(2024, 6, 10),
            date(2024, 5, 20),
        ],
        "end_date": [
            date(2024, 5, 10),
            date(2024, 6, 5),
            date(2024, 5, 25),
            date(2024, 7, 10),
            date(2024, 6, 5),
            date(2024, 5, 30),
        ],
    }
)

This validation works; 1 row fails because its end date is earlier than its start date:

validation = (
    pb.Validate(data=df)
    .col_vals_ge(columns="end_date", value=pb.col("start_date"))
    .interrogate()
)
validation
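
The expected failure count can be cross-checked without pointblank. This is a minimal sketch in plain Python that re-applies the same `end_date >= start_date` comparison to the example data; the variable names are only for illustration.

```python
from datetime import date

# Same values as the example DataFrame above
ids = list(range(123, 123 + 6))
start_dates = [
    date(2024, 5, 1), date(2024, 6, 1), date(2024, 5, 15),
    date(2024, 7, 1), date(2024, 6, 10), date(2024, 5, 20),
]
end_dates = [
    date(2024, 5, 10), date(2024, 6, 5), date(2024, 5, 25),
    date(2024, 7, 10), date(2024, 6, 5), date(2024, 5, 30),
]

# A row fails the check when end_date >= start_date does not hold
failing_ids = [
    row_id
    for row_id, start, end in zip(ids, start_dates, end_dates)
    if not end >= start
]
print(failing_ids)  # [127]
```

Only id 127 fails: its end date (2024-06-05) precedes its start date (2024-06-10).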

This validation fails with the error ValueError: If value= is provided as a string it must be a date or datetime string:

# tbl is set to small_table because None fails a validator that runs before set_tbl is applied; allowing None would be helpful, but that is a separate issue :)
pb_yaml = '''
tbl: small_table
steps:
- col_vals_ge:
    columns: [end_date]
    value: pb.col("start_date")
'''
result = pb.yaml_interrogate(pb_yaml, set_tbl=df)
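
My guess at the cause, as a sketch only: the YAML value arrives as the plain string `pb.col("start_date")`, and a string that is not a date cannot be converted. The code below is not pointblank's implementation. `parse_yaml_value` and `_COL_REF` are hypothetical names that show one way a string value could be recognized as a column reference before any date conversion is attempted.

```python
import re
from datetime import date

# Hypothetical pattern: matches strings like pb.col("start_date") or col('start_date')
_COL_REF = re.compile(r"""^(?:pb\.)?col\(\s*(['"])(?P<name>[^'"]+)\1\s*\)$""")


def parse_yaml_value(value):
    """Hypothetical helper (not part of pointblank): classify a YAML value."""
    if not isinstance(value, str):
        # Numbers, booleans, etc. pass through unchanged
        return ("literal", value)
    match = _COL_REF.match(value.strip())
    if match:
        # Column reference: return the column name instead of trying a date parse
        return ("column", match.group("name"))
    # Anything else must be an ISO date string; raises ValueError otherwise
    return ("date", date.fromisoformat(value))


raw = 'pb.col("start_date")'

# A direct date conversion of the raw string fails, much like the reported error
try:
    date.fromisoformat(raw)
    direct_parse_failed = False
except ValueError:
    direct_parse_failed = True

print(direct_parse_failed)            # True
print(parse_yaml_value(raw))          # ('column', 'start_date')
print(parse_yaml_value("2024-05-01")) # ('date', datetime.date(2024, 5, 1))
```

In a real fix, the "column" result would presumably be turned into pb.col(name) before the step is built; that wiring is left out because I have not checked how the YAML loader constructs steps.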

Expected result

I want the YAML validation to produce the same output as the Python version above.

Development environment

macOS 14.5
pointblank 0.14.0
