Implement Battery Data Format (BDF) column system and BDX format#373

Draft
tomjholland wants to merge 40 commits into main from implement-bdf-columns

@tomjholland
Collaborator

Closes #281

Summary

  • New BDF enum and Column/BDFColumn/ColumnSet classes (column.py): standardises 27 canonical battery quantities (e.g. Voltage / V, Current / A). ColumnSet resolves Polars column names with automatic unit conversion via Pint, replacing all hardcoded column string references. Closes #283 (Update column names to align with bdf) and #284 (Use [pint](https://pint.readthedocs.io/en/stable/) for reliable unit conversions).

  • Metadata is written to the Parquet file or to a JSON sidecar. Closes #282 (Write required bdf metadata alongside .parquet files).

  • The BDF enum provides Column objects that are used throughout the code to resolve BDF-standard columns, deriving them via recipes when they are not present in the underlying DataFrame.

  • New io.py module: replaces the old cycler-processor architecture with process_cycler() (file → Parquet) and process_generic() (arbitrary DataFrame → BDF). Adds timezone correction, temperature column import, glob pattern support, and a column_map override mechanism.

  • Simplified Cell API (cell.py): removes process_cycler_file(), process_generic_file(), import_data(), and import_from_cycler() in favour of a single add_procedure() method. Removes Pydantic from Cell.

  • Result.add_data() overhaul: joins on Unix time, consolidates timezone handling into a single timezone argument, and uses an explicit column_map dict instead of positional column lists.

  • Result.info → Result.metadata: breaking rename for clarity.

  • Procedure.load() factory method: simplified initialisation path.

  • Analysis modules updated throughout to use BDF enum references instead of raw strings.
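
The column resolution described in the summary can be pictured with a small, dependency-free sketch. It is not this PR's implementation: `parse_bdf_name`, `resolve`, and the `_TO_BASE` factor table are hypothetical stand-ins (the PR itself relies on Pint and Polars), and the sketch only illustrates how a request for `Current / A` might be served from data stored as `Current / mA`.

```python
# Hypothetical scale factors to a base unit; the PR delegates conversion to Pint.
_TO_BASE = {"A": 1.0, "mA": 1e-3, "V": 1.0, "mV": 1e-3}


def parse_bdf_name(name: str) -> tuple[str, str]:
    """Split a 'Quantity / unit' column name into (quantity, unit)."""
    quantity, sep, unit = name.rpartition(" / ")
    if not sep:
        raise ValueError(f"not a BDF-style column name: {name!r}")
    return quantity, unit


def resolve(requested: str, data: dict[str, list[float]]) -> list[float]:
    """Return the requested column, converting units if only another unit is stored."""
    if requested in data:
        return data[requested]
    quantity, unit = parse_bdf_name(requested)
    for name, values in data.items():
        try:
            stored_quantity, stored_unit = parse_bdf_name(name)
        except ValueError:
            continue  # skip columns that do not follow the naming format
        # Assumption: equal quantity names imply dimensionally compatible units.
        if stored_quantity == quantity and stored_unit in _TO_BASE and unit in _TO_BASE:
            factor = _TO_BASE[stored_unit] / _TO_BASE[unit]
            return [value * factor for value in values]
    raise KeyError(f"cannot resolve {requested!r}")


frame = {"Current / mA": [500.0, 250.0], "Voltage / V": [3.7, 3.6]}
current_in_amps = resolve("Current / A", frame)
```

A real implementation would also need a dimensionality check, which is one reason to hand the conversion to a units library rather than a factor table.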

Breaking changes

  • Column naming format changed from Quantity [unit] to Quantity / unit: all column names previously using square-bracket notation (e.g. "Current [A]", "Voltage [V]") must be updated to slash-separated BDF format (e.g. "Current / A", "Voltage / V"). The old units.py module and its split_quantity_unit() helper are removed; use BDF enum members or Column / ColumnSet instead.

  • Cell.process_cycler_file(), process_generic_file(), import_data(), import_from_cycler() removed — use add_procedure().

  • Result.info renamed to Result.metadata. Closes #286 (Change metadata to be a property of procedures rather than cells).

  • Result.add_data() signature changed: date_column_name → time_column_name, importing_columns → column_map.

  • io.process_cycler() now returns Path instead of LazyFrame; several parameters renamed/removed (output_dir → output_path; write_parquet, metadata, metadata_format, and extra_columns removed).
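
For downstream code that still carries square-bracket names, the notation change can be applied mechanically. The helper below is a hypothetical migration aid, not part of this PR: `to_bdf_name` only rewrites `Quantity [unit]` to `Quantity / unit` and does not check whether the quantity itself matches a canonical BDF name.

```python
import re

# Matches the old 'Quantity [unit]' style, e.g. 'Current [A]'.
_BRACKET_NAME = re.compile(r"^(?P<quantity>.+?)\s*\[(?P<unit>[^\]]+)\]$")


def to_bdf_name(name: str) -> str:
    """Convert 'Quantity [unit]' to 'Quantity / unit'; leave other names unchanged."""
    match = _BRACKET_NAME.match(name)
    if match is None:
        return name
    return f"{match['quantity']} / {match['unit']}"


old_columns = ["Current [A]", "Voltage [V]", "Step"]
new_columns = [to_bdf_name(column) for column in old_columns]
```

Names without a bracketed unit, such as `Step` here, pass through untouched, so the helper can be applied to a whole column list.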

Merging requires closing battery-data-alliance/battery-data-format#6

…on capabilities

- Updated regex patterns for various column name formats.
- Added support for unit aliases and enhanced unit conversion logic.
- Introduced new methods for finding and resolving column names.
- Expanded test coverage for parsing, conversion, and resolution functionalities.
replace multi-format column parsing with bdf-standard descriptors. use recipes to derive columns
like net capacity from dependencies and columnset for context-aware resolution.
move resolution logic into column classes and replace global instances with a bdf enum. rename
column_name to name and add factory functions for column creation.
integrate BDF-aware column resolution and unit conversion into the Result class hierarchy via ColumnSet. replace hardcoded column strings with BDF enum references across filters and rawdata modules. simplify metadata management and remove automatic column zeroing from filtered objects. introduce a Procedure.load method for simplified initialization.
**Changes to io.py:**
- process_cycler() now returns Path instead of LazyFrame
- Renamed parameter: output_dir → output_path
- Removed parameters: write_parquet, metadata, metadata_format, extra_columns
- Added compression_priority parameter for Parquet compression control
- Added support for glob patterns in source parameter
- Added column_map parameter to override/extend auto-resolved BDF columns

**New functions:**
- process_generic(): Normalise arbitrary DataFrames to BDF format
- attach_metadata(): Update metadata on existing Parquet files

**New helper functions:**
- _resolve_glob(), _load_raw_dataframes(), _concat_dataframes()
- _handle_existing_cached_file(), _build_column_map_exprs(), _extract_column_map_columns()

**Tests:**
- Complete rewrite of test_io.py with 50+ tests
- New test classes: TestProcessCyclerOutputPath, TestProcessGeneric, TestHelperFunctions
- Tests for glob patterns, column_map, compression priorities
- Tests for attach_metadata and process_generic with DataFrame sources
…_procedure API

- New add_procedure() method
- Remove old process_cycler_file(), process_generic_file(), import_data(), import_from_cycler() methods (replaced by add_procedure())
- Add deprecated skeleton methods with @deprecated decorator that raise informative errors directing users to add_procedure()
- Remove module-level process_cycler_data() function
- Deprecate _cycler_dict module variable (cycler handling now internal to io module)
- Clean up test_cell.py: remove 12 obsolete test functions for removed methods
- Keep tests for add_procedure() (3 tests) and other public methods
- Update add_procedure() signature to handle both files and DataFrames with unified column_map parameter and optional output_path
- Simplify import handling by delegating to io.process_cycler() and
  io.process_generic() functions
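
The "deprecated skeleton" pattern mentioned in this commit could look roughly like the sketch below. Everything here is assumed for illustration: the decorator name `removed_in_favour_of`, the choice of `NotImplementedError`, the message wording, and the stand-in `Cell` class are not taken from the PR.

```python
import functools


def removed_in_favour_of(replacement: str):
    """Hypothetical decorator: calling the wrapped method raises, naming the replacement."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            raise NotImplementedError(
                f"{func.__name__}() has been removed; use {replacement}() instead."
            )

        return wrapper

    return decorator


class Cell:
    """Stand-in class for illustration, not the library's Cell."""

    def add_procedure(self, name: str, source: str) -> str:
        return f"added {name} from {source}"

    @removed_in_favour_of("add_procedure")
    def process_cycler_file(self, *args, **kwargs):
        """Skeleton kept only to point callers at add_procedure()."""


cell = Cell()
try:
    cell.process_cycler_file("cycler", "data.csv")
except NotImplementedError as err:
    message = str(err)
```

Keeping the old method names as stubs means existing call sites fail with a pointer to the replacement rather than with an `AttributeError`.
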
…implified API

Standardize Result.add_data() to use "Unix Time / s" as the join key across all new data imports. Simplify the method signature by replacing date_column_name with time_column_name, consolidating timezone parameters into a single timezone argument, and replacing importing_columns with a more explicit column_map dict. Update align_data() and related tests to match. Reorganize test_result.py tests into logical classes for clarity.
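
As a rough illustration of joining on Unix time, the sketch below converts local timestamps with a single timezone argument and then matches rows on `Unix Time / s`. It uses plain dictionaries rather than Polars, matches timestamps exactly rather than aligning them, and the names `to_unix_seconds`, `timestamp`, `T_cell`, and the target name `Temperature / degC` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def to_unix_seconds(local_timestamps: list[str], tz: timezone) -> list[float]:
    """Interpret naive ISO timestamps in the given timezone and return Unix seconds."""
    return [
        datetime.fromisoformat(stamp).replace(tzinfo=tz).timestamp()
        for stamp in local_timestamps
    ]


# Existing data already carries the Unix-time join key.
base = {
    "Unix Time / s": [1_700_000_000.0, 1_700_000_060.0],
    "Voltage / V": [3.70, 3.69],
}

# New data logged in local time (UTC+1 here) under its own column names.
external = {
    "timestamp": ["2023-11-14T23:13:20", "2023-11-14T23:14:20"],
    "T_cell": [25.1, 25.3],
}

# Explicit source-to-target mapping, in the spirit of the column_map dict.
column_map = {"T_cell": "Temperature / degC"}

unix = to_unix_seconds(external["timestamp"], timezone(timedelta(hours=1)))
joined = dict(base)
for source, target in column_map.items():
    lookup = dict(zip(unix, external[source]))
    # Exact-match join; rows without a matching timestamp get None.
    joined[target] = [lookup.get(t) for t in base["Unix Time / s"]]
```

Converting everything to Unix seconds first keeps the timezone handling in one place, so the join itself never has to reason about local time.
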
@tomjholland added the refactor (Refactoring existing code without significantly changing functionality) and breaking (A breaking change) labels Mar 29, 2026
@tomjholland changed the title from "Implement Battery Data Format (BDF) column system" to "Implement Battery Data Format (BDF) column system and BDX format" Mar 30, 2026
Update Result and RawData methods to accept Column objects or BDF enum members in addition to strings. This enables type-safe column access across the API. ColumnSet now uses tuples for public properties and a set for internal storage to improve consistency.

BREAKING CHANGE: the definition parameter has been removed from RawData.zero_column. ColumnSet.names and ColumnSet.quantities now return tuples instead of lists.
allow Column.resolve and Column.can_resolve to accept ColumnSet objects directly. update the ColumnSet string representation to list column names and identify BDF-standard columns.
…ng interface

rename ColumnSet to ColumnDict and implement the Mapping interface for direct lookup by column name.
replace the previous set-based iteration in resolution methods with indexed lookups by name and
quantity to improve performance.
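
A minimal sketch of what a Mapping-based column container with name and quantity indexes might look like follows. The `Column` dataclass, the `with_quantity` method, and the internal attribute names are assumptions made for illustration; the PR's actual `ColumnDict` may differ.

```python
from collections.abc import Iterable, Iterator, Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Minimal stand-in: a quantity plus a unit, named 'Quantity / unit'."""

    quantity: str
    unit: str

    @property
    def name(self) -> str:
        return f"{self.quantity} / {self.unit}"


class ColumnDict(Mapping):
    """Read-only mapping from column name to Column, with a by-quantity index."""

    def __init__(self, columns: Iterable[Column]) -> None:
        columns = list(columns)
        self._by_name = {column.name: column for column in columns}
        self._by_quantity: dict[str, list[Column]] = {}
        for column in columns:
            self._by_quantity.setdefault(column.quantity, []).append(column)

    def __getitem__(self, name: str) -> Column:
        return self._by_name[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._by_name)

    def __len__(self) -> int:
        return len(self._by_name)

    def with_quantity(self, quantity: str) -> tuple[Column, ...]:
        """Indexed lookup of every column that stores the given quantity."""
        return tuple(self._by_quantity.get(quantity, ()))


columns = ColumnDict([Column("Current", "mA"), Column("Voltage", "V")])
```

Implementing only `__getitem__`, `__iter__`, and `__len__` is enough for `collections.abc.Mapping` to supply `in`, `.get()`, `.keys()`, `.values()`, and `.items()`.
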