Implement Battery Data Format (BDF) column system and BDX format by tomjholland · Pull Request #373 · ImperialCollegeLondon/PyProBE

tomjholland · 2026-03-29T15:21:05Z

Closes #281

Summary

New BDF enum and Column/BDFColumn/ColumnSet classes (column.py): standardises 27 canonical battery quantities (e.g. Voltage / V, Current / A). ColumnSet resolves Polars column names with automatic unit conversion via Pint, replacing all hardcoded column string references. Closes #283 (Update column names to align with bdf) and #284 (Use [pint](https://pint.readthedocs.io/en/stable/) for reliable unit conversions).
Metadata is written to the Parquet file or to a JSON sidecar. Closes #282 (Write required bdf metadata alongside .parquet files).
The BDF enum provides Column objects that can be used throughout the code to resolve BDF-standard columns via recipes, whether or not those columns are present in the underlying DataFrame.
New io.py module: replaces the old cycler-processor architecture with process_cycler() (file → Parquet) and process_generic() (arbitrary DataFrame → BDF). Adds timezone correction, temperature column import, glob pattern support, and a column_map override mechanism.
Simplified Cell API (cell.py): removes process_cycler_file(), process_generic_file(), import_data(), and import_from_cycler() in favour of a single add_procedure() method. Removes Pydantic from Cell.
Result.add_data() overhaul: joins on Unix time, consolidates timezone handling into a single timezone argument, and uses an explicit column_map dict instead of positional column lists.
Result.info → Result.metadata: breaking rename for clarity.
Procedure.load() factory method: simplified initialisation path.
Analysis modules updated throughout to use BDF enum references instead of raw strings.
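
The recipe-based resolution described in the summary can be illustrated without PyProBE itself. The following is a minimal sketch, not the PR's implementation: `Recipe`, `RECIPES`, `can_resolve()`, `resolve()`, the column names, and the net-capacity formula (charging minus discharging capacity) are all assumptions made for the example, and a plain dict of lists stands in for a Polars DataFrame.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a DataFrame: column name -> values.
Table = dict[str, list[float]]


@dataclass(frozen=True)
class Recipe:
    """How to derive one column from other columns (illustrative only)."""

    dependencies: tuple[str, ...]
    compute: Callable[..., list[float]]


# Assumed recipe: net capacity = charging capacity - discharging capacity.
RECIPES: dict[str, Recipe] = {
    "Net Capacity / Ah": Recipe(
        dependencies=("Charging Capacity / Ah", "Discharging Capacity / Ah"),
        compute=lambda chg, dis: [c - d for c, d in zip(chg, dis)],
    ),
}


def can_resolve(name: str, table: Table) -> bool:
    """True if the column is stored, or all of its recipe dependencies resolve."""
    if name in table:
        return True
    recipe = RECIPES.get(name)
    return recipe is not None and all(
        can_resolve(dep, table) for dep in recipe.dependencies
    )


def resolve(name: str, table: Table) -> list[float]:
    """Return the stored column if present, otherwise derive it from its recipe."""
    if name in table:
        return table[name]
    if not can_resolve(name, table):
        raise KeyError(f"Cannot resolve column {name!r}")
    recipe = RECIPES[name]
    return recipe.compute(*(resolve(dep, table) for dep in recipe.dependencies))


data = {
    "Charging Capacity / Ah": [0.0, 0.5, 1.0],
    "Discharging Capacity / Ah": [0.0, 0.0, 0.25],
}
net = resolve("Net Capacity / Ah", data)  # derived, since it is not stored
```

The point of the pattern is that callers ask for a quantity by name and do not need to know whether it was stored in the file or derived on demand.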

Breaking changes

Column naming format changed from Quantity [unit] to Quantity / unit: all column names previously using square-bracket notation (e.g. "Current [A]", "Voltage [V]") must be updated to slash-separated BDF format (e.g. "Current / A", "Voltage / V"). The old units.py module and its split_quantity_unit() helper are removed; use BDF enum members or Column / ColumnSet instead.
Cell.process_cycler_file(), process_generic_file(), import_data(), import_from_cycler() removed — use add_procedure().
Result.info renamed to Result.metadata. Closes #286 (Change metadata to be a property of procedures rather than cells).
Result.add_data() signature changed: date_column_name → time_column_name, importing_columns → column_map.
io.process_cycler() now returns Path instead of LazyFrame; several parameters renamed/removed (output_dir → output_path, write_parquet/metadata/metadata_format/extra_columns removed).
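
For user code that still holds square-bracket column names, the notation change can be applied mechanically. The helper below is a hypothetical migration sketch, not part of PyProBE: `to_bdf_name()` only rewrites the notation and does not check that the quantity is one of the BDF canonical names, so any quantity that was also renamed still needs a manual mapping.

```python
import re

# Matches the old "Quantity [unit]" style, e.g. "Current [A]".
_OLD_STYLE = re.compile(r"^(?P<quantity>.+?)\s*\[(?P<unit>[^\]]+)\]$")


def to_bdf_name(old_name: str) -> str:
    """Rewrite 'Quantity [unit]' as 'Quantity / unit'; leave other names unchanged."""
    match = _OLD_STYLE.match(old_name.strip())
    if match is None:
        return old_name
    return f"{match['quantity']} / {match['unit'].strip()}"


# Build a rename mapping for a list of existing column names.
renamed = {name: to_bdf_name(name) for name in ["Current [A]", "Voltage [V]", "Step"]}
```

A mapping like `renamed` can be passed to a DataFrame rename call (for example Polars `DataFrame.rename`) to update a whole table in one step.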

Merging requires closing battery-data-alliance/battery-data-format#6

…on capabilities
- Updated regex patterns for various column name formats.
- Added support for unit aliases and enhanced unit conversion logic.
- Introduced new methods for finding and resolving column names.
- Expanded test coverage for parsing, conversion, and resolution functionalities.
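
To make the unit-alias and conversion behaviour concrete, here is a small self-contained sketch. The PR delegates conversion to Pint; the scale table, the alias table, and the function names below are hypothetical stand-ins chosen so the example runs with the standard library only.

```python
# Assumed subset of units: unit -> (base unit, scale factor to that base unit).
_TO_BASE = {
    "A": ("A", 1.0),
    "mA": ("A", 1e-3),
    "V": ("V", 1.0),
    "mV": ("V", 1e-3),
    "s": ("s", 1.0),
    "h": ("s", 3600.0),
}
# Assumed aliases that map alternative spellings onto the units above.
_ALIASES = {"Amps": "A", "Volts": "V", "hr": "h", "sec": "s"}


def split_name(name: str) -> tuple[str, str]:
    """Split 'Quantity / unit' into (quantity, canonical unit)."""
    quantity, separator, unit = name.rpartition(" / ")
    if not separator:
        raise ValueError(f"{name!r} is not in 'Quantity / unit' form")
    return quantity, _ALIASES.get(unit, unit)


def conversion_factor(from_unit: str, to_unit: str) -> float:
    """Multiplier that converts values in from_unit to to_unit."""
    from_base, from_scale = _TO_BASE[from_unit]
    to_base, to_scale = _TO_BASE[to_unit]
    if from_base != to_base:
        raise ValueError(f"Cannot convert {from_unit} to {to_unit}")
    return from_scale / to_scale


def convert_column(
    name: str, values: list[float], target_unit: str
) -> tuple[str, list[float]]:
    """Return the renamed column and its values expressed in target_unit."""
    quantity, unit = split_name(name)
    factor = conversion_factor(unit, target_unit)
    return f"{quantity} / {target_unit}", [value * factor for value in values]


new_name, amps = convert_column("Current / mA", [1000.0, 250.0], "A")
```

A real unit library also has to handle offset units such as degrees Celsius, which a pure scale-factor table like this one cannot express.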

integrate BDF-aware column resolution and unit conversion into the Result class hierarchy via ColumnSet. replace hardcoded column strings with BDF enum references across filters and rawdata modules. simplify metadata management and remove automatic column zeroing from filtered objects. introduce a Procedure.load method for simplified initialization.

**Changes to io.py:**
- process_cycler() now returns Path instead of LazyFrame
- Renamed parameter: output_dir → output_path
- Removed parameters: write_parquet, metadata, metadata_format, extra_columns
- Added compression_priority parameter for Parquet compression control
- Added support for glob patterns in source parameter
- Added column_map parameter to override/extend auto-resolved BDF columns

**New functions:**
- process_generic(): Normalise arbitrary DataFrames to BDF format
- attach_metadata(): Update metadata on existing Parquet files

**New helper functions:**
- _resolve_glob(), _load_raw_dataframes(), _concat_dataframes()
- _handle_existing_cached_file(), _build_column_map_exprs(), _extract_column_map_columns()

**Tests:**
- Complete rewrite of test_io.py with 50+ tests
- New test classes: TestProcessCyclerOutputPath, TestProcessGeneric, TestHelperFunctions
- Tests for glob patterns, column_map, compression priorities
- Tests for attach_metadata and process_generic with DataFrame sources
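
As an illustration of the glob support listed above, the sketch below expands a source that may be either a single path or a glob pattern. It is an assumption about how such a helper could behave, not the body of `_resolve_glob()`; the name `resolve_sources()` and the error handling are hypothetical.

```python
import glob
import tempfile
from pathlib import Path
from typing import Union


def resolve_sources(source: Union[str, Path]) -> list[Path]:
    """Expand a file path or glob pattern into a sorted list of existing files."""
    pattern = str(source)
    if any(char in pattern for char in "*?["):
        # Treat the source as a glob pattern.
        matches = sorted(Path(found) for found in glob.glob(pattern))
    else:
        # Treat the source as a single file path.
        matches = [Path(pattern)] if Path(pattern).is_file() else []
    if not matches:
        raise FileNotFoundError(f"No files match {pattern!r}")
    return matches


# Demonstration in a temporary directory with two matching files.
with tempfile.TemporaryDirectory() as tmp:
    for filename in ("cell_1.csv", "cell_2.csv", "notes.txt"):
        (Path(tmp) / filename).touch()
    names = [path.name for path in resolve_sources(Path(tmp) / "cell_*.csv")]
```

Sorting the matches keeps the concatenation order deterministic when several files are combined into one output.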

…_procedure API
- New add_procedure() method
- Remove old process_cycler_file(), process_generic_file(), import_data(), import_from_cycler() methods (replaced by add_procedure())
- Add deprecated skeleton methods with @deprecated decorator that raise informative errors directing users to add_procedure()
- Remove module-level process_cycler_data() function
- Deprecate _cycler_dict module variable (cycler handling now internal to io module)
- Clean up test_cell.py: remove 12 obsolete test functions for removed methods
- Keep tests for add_procedure() (3 tests) and other public methods
- Update add_procedure() signature to handle both files and DataFrames with unified column_map parameter and optional output_path
- Simplify import handling by delegating to io.process_cycler() and io.process_generic() functions

…implified API

Standardize Result.add_data() to use "Unix Time / s" as the join key across all new data imports. Simplify the method signature by replacing date_column_name with time_column_name, consolidating timezone parameters into a single timezone argument, and replacing importing_columns with a more explicit column_map dict. Update align_data() and related tests to match. Reorganize test_result.py tests into logical classes for clarity.
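
The idea of joining on "Unix Time / s" with a single timezone argument can be sketched as follows. This is not the body of Result.add_data(): the function names, the exact-match join, the fixed UTC+01:00 offset, and the "Ambient Temperature / degC" column name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

UNIX_TIME = "Unix Time / s"  # join key named in the commit message


def to_unix_seconds(stamps: list[datetime], tz: tzinfo) -> list[float]:
    """Interpret naive local timestamps in tz and return seconds since the epoch."""
    return [stamp.replace(tzinfo=tz).timestamp() for stamp in stamps]


def add_column(
    base: dict[str, list],
    stamps: list[datetime],
    values: list[float],
    tz: tzinfo,
    name: str,
) -> dict[str, list]:
    """Left-join values onto base rows whose Unix time matches exactly."""
    lookup = dict(zip(to_unix_seconds(stamps, tz), values))
    joined = dict(base)
    column: list[Optional[float]] = [lookup.get(t) for t in base[UNIX_TIME]]
    joined[name] = column
    return joined


local = timezone(timedelta(hours=1))  # fixed UTC+01:00 offset for the example
base = {UNIX_TIME: [1_700_000_000.0, 1_700_000_060.0], "Voltage / V": [3.70, 3.71]}
# The first stamp is 2023-11-14 22:13:20 UTC written as local (UTC+01:00) time.
stamps = [datetime(2023, 11, 14, 23, 13, 20), datetime(2023, 11, 14, 23, 15, 20)]
joined = add_column(base, stamps, [25.0, 25.5], local, "Ambient Temperature / degC")
```

Converting both sides to Unix time before joining removes any dependence on how each source formatted its local timestamps. An exact-match join is used here only for brevity; data logged at different rates would need a nearest-time or interpolating join.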

Update Result and RawData methods to accept Column objects or BDF enum members in addition to strings. This enables type-safe column access across the API. ColumnSet now uses tuples for public properties and a set for internal storage to improve consistency. BREAKING CHANGE: the definition parameter has been removed from RawData.zero_column. ColumnSet.names and ColumnSet.quantities now return tuples instead of lists.

…ng interface

rename ColumnSet to ColumnDict and implement the Mapping interface for direct lookup by column name. replace the previous set-based iteration in resolution methods with indexed lookups by name and quantity to improve performance.
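
A container that implements the Mapping interface and keeps indexes by name and by quantity might look like the sketch below. `Col` and `ColumnIndex` are hypothetical names used for illustration; the actual ColumnDict in the PR may differ in structure and API.

```python
from collections.abc import Iterable, Iterator, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    """Illustrative column descriptor: a quantity and its unit."""

    quantity: str
    unit: str

    @property
    def name(self) -> str:
        return f"{self.quantity} / {self.unit}"


class ColumnIndex(Mapping):
    """Read-only mapping of column name -> Col, with a secondary quantity index."""

    def __init__(self, columns: Iterable[Col]) -> None:
        columns = tuple(columns)  # allow generators to be passed safely
        self._by_name: dict[str, Col] = {col.name: col for col in columns}
        self._by_quantity: dict[str, list[Col]] = {}
        for col in columns:
            self._by_quantity.setdefault(col.quantity, []).append(col)

    def __getitem__(self, name: str) -> Col:
        return self._by_name[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._by_name)

    def __len__(self) -> int:
        return len(self._by_name)

    def with_quantity(self, quantity: str) -> tuple[Col, ...]:
        """All columns measuring the given quantity, in any unit."""
        return tuple(self._by_quantity.get(quantity, ()))


cols = ColumnIndex([Col("Current", "A"), Col("Current", "mA"), Col("Voltage", "V")])
current_units = [col.unit for col in cols.with_quantity("Current")]
```

Subclassing `Mapping` provides `in`, `.get()`, `.keys()`, `.values()`, and `.items()` once the three abstract methods are defined, and both lookups are dictionary accesses rather than scans over every column.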

tomjholland added 26 commits March 17, 2026 13:08

chore: add pint dependency

e17f9f3

refactor: create new ColumnName class for handling column names and c…

475db0d

…onversions

fix: add temperature exception for the 'C' unit

20c30f4

refactor: create schema module

8c18a8a

refactor(column): implement bdf standard and recipe-based derivation

3a39ec9

replace multi-format column parsing with bdf-standard descriptors. use recipes to derive columns like net capacity from dependencies and columnset for context-aware resolution.

refactor(io): create io.py module to replace cycler processors

338aea7

refactor(result): revert Result and daughter classes to standard pyth…

081e1f2

…on classes

refactor!: rename Result.info to Result.metadata

225d71a

refactor: update metadata handling for files in io

63e1034

refactor(column): migrate to bdf enum and encapsulate resolution logic

2afabf5

move resolution logic into column classes and replace global instances with a bdf enum. rename column_name to name and add factory functions for column creation.

refactor(io): update io module to use new BDF Enum

2837ad7

refactor: create load() factory method for Procedure class

08eb11c

fix: require Unix Time or Test Time as a RawData columns

b5713ec

feat(io): add timezone correction and import temperature columns

4a40831

refactor: extract timezone validation to utils module

75eef0c

test: update pre-processed sample data

6330905

refactor(analysis): update all analysis methods for new column naming

c0b71a1

fix: give SOC a % unit

52f222e

fix(column): make static names attribute, and shortcuts for resolve i…

33550cc

…f exact match

refactor: remove pydantic from Cell class

0fd5923

test: fix expected time column in rawdata and result

02e26e3

tomjholland added the refactor (Refactoring existing code without significantly changing functionality) and breaking (A breaking change) labels Mar 29, 2026

style: rename column module to columns

5b6c81a

tomjholland changed the title ~~Implement Battery Data Format (BDF) column system~~ Implement Battery Data Format (BDF) column system and BDX format Mar 30, 2026

tomjholland added 13 commits March 30, 2026 17:51

refactor: update name format for cycling summary

ce03f56

test(novonix): fix novonix process_cycler integration test

fec8f3d

docs: update examples for new api

2603dd2

fix: return self from rawdata.zero_column method

4e8c15d

fix: use f-strings in logs

3adb397

fix: use Step Count rather than Step Index for step() filter

857fb4f

fix(rawdata): zero_column() method now returns a new result object an…

e281542

…d uses resolve()

fix(dashboard): update dashboard for new api and bdf columns

d130273

feat(columns): support ColumnSet in resolution methods and improve repr

d08c734

allow Column.resolve and Column.can_resolve to accept ColumnSet objects directly. update the ColumnSet string representation to list column names and identify BDF-standard columns.

docs: add example for Columns and BDF

ea9f469

docs: fix makefile

79a07f3

tomjholland wants to merge 40 commits into main from implement-bdf-columns