Description
I can't get `pb.col()` to work within the YAML workflow. For `col_vals_ge()` and `col_vals_gt()` at least, it appears that `pb.col()` inputs fail in `_string_date_dttm_conversion(value=value)`.
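The error message suggests that in YAML, `value: pb.col("start_date")` reaches the validator as the plain string `pb.col("start_date")` rather than as a column reference, so it is treated as a date/datetime string. Below is a minimal sketch of one possible fix. It assumes the YAML loader could check for this pattern before running the date/datetime conversion. The regex and the helper name `parse_col_ref` are hypothetical and are not part of pointblank's API.

```python
import re

# Hypothetical pattern (not pointblank API): matches strings such as
# pb.col("start_date") or pb.col('start_date') and captures the column name.
_COL_REF = re.compile(r"""^\s*pb\.col\(\s*(["'])(?P<name>[^"']+)\1\s*\)\s*$""")


def parse_col_ref(value):
    """Hypothetical helper: return the column name if `value` is a
    pb.col("...") string, otherwise None."""
    if not isinstance(value, str):
        return None
    match = _COL_REF.match(value)
    return match.group("name") if match else None


# In the YAML from the reproducible example, `value` arrives as this string:
yaml_value = 'pb.col("start_date")'
print(parse_col_ref(yaml_value))    # start_date
print(parse_col_ref("2024-05-01"))  # None: an ordinary date string
```

With a check like this, the loader would presumably build `pb.col(name)` from the returned name. Only strings that don't match would continue on to the date/datetime conversion.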
Reproducible example
Create some example data:

```python
import pointblank as pb
import polars as pl
from datetime import date

df = pl.DataFrame(
    {
        "id": list(range(123, 123 + 6)),  # 123, 124, 125, 126, 127, 128
        "start_date": [
            date(2024, 5, 1),
            date(2024, 6, 1),
            date(2024, 5, 15),
            date(2024, 7, 1),
            date(2024, 6, 10),
            date(2024, 5, 20),
        ],
        "end_date": [
            date(2024, 5, 10),
            date(2024, 6, 5),
            date(2024, 5, 25),
            date(2024, 7, 10),
            date(2024, 6, 5),
            date(2024, 5, 30),
        ],
    }
)
```

This validation works. One row fails because its end date is earlier than its start date:
```python
validation = (
    pb.Validate(data=df)
    .col_vals_ge(columns="end_date", value=pb.col("start_date"))
    .interrogate()
)

validation
```

This YAML version fails with the error `ValueError: If value= is provided as a string it must be a date or datetime string`:
```python
# tbl is set to small_table because None fails a validator that runs before set_tbl is applied.
# Allowing None would be helpful, but that's a totally separate issue :)
pb_yaml = '''
tbl: small_table
steps:
- col_vals_ge:
    columns: [end_date]
    value: pb.col("start_date")
'''

result = pb.yaml_interrogate(pb_yaml, set_tbl=df)
```

Expected result
I expect the YAML validation to produce the same result as the Python version above.
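For reference, you can check the result both versions should agree on without pointblank. The plain-Python sketch below uses the same example data and lists the ids where `end_date < start_date`. These are the rows that `col_vals_ge(columns="end_date", value=pb.col("start_date"))` should flag.

```python
from datetime import date

# Same data as the reproducible example, as plain Python lists.
ids = list(range(123, 123 + 6))
start_date = [date(2024, 5, 1), date(2024, 6, 1), date(2024, 5, 15),
              date(2024, 7, 1), date(2024, 6, 10), date(2024, 5, 20)]
end_date = [date(2024, 5, 10), date(2024, 6, 5), date(2024, 5, 25),
            date(2024, 7, 10), date(2024, 6, 5), date(2024, 5, 30)]

# A row fails "end_date >= start_date" when its end date is earlier than its start date.
failing_ids = [i for i, s, e in zip(ids, start_date, end_date) if e < s]
print(failing_ids)  # [127]
```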
Development environment
macOS 14.5
pointblank 0.14.0