Implement Battery Data Format (BDF) column system and BDX format#373
Draft
tomjholland wants to merge 40 commits into main
…on capabilities
- Updated regex patterns for various column name formats.
- Added support for unit aliases and enhanced unit conversion logic.
- Introduced new methods for finding and resolving column names.
- Expanded test coverage for parsing, conversion, and resolution functionalities.
replace multi-format column parsing with bdf-standard descriptors. use recipes to derive columns like net capacity from dependencies and columnset for context-aware resolution.
move resolution logic into column classes and replace global instances with a bdf enum. rename column_name to name and add factory functions for column creation.
integrate BDF-aware column resolution and unit conversion into the Result class hierarchy via ColumnSet. replace hardcoded column strings with BDF enum references across filters and rawdata modules. simplify metadata management and remove automatic column zeroing from filtered objects. introduce a Procedure.load method for simplified initialization.
**Changes to io.py:**
- process_cycler() now returns Path instead of LazyFrame
- Renamed parameter: output_dir → output_path
- Removed parameters: write_parquet, metadata, metadata_format, extra_columns
- Added compression_priority parameter for Parquet compression control
- Added support for glob patterns in source parameter
- Added column_map parameter to override/extend auto-resolved BDF columns

**New functions:**
- process_generic(): Normalise arbitrary DataFrames to BDF format
- attach_metadata(): Update metadata on existing Parquet files

**New helper functions:**
- _resolve_glob(), _load_raw_dataframes(), _concat_dataframes()
- _handle_existing_cached_file(), _build_column_map_exprs(), _extract_column_map_columns()

**Tests:**
- Complete rewrite of test_io.py with 50+ tests
- New test classes: TestProcessCyclerOutputPath, TestProcessGeneric, TestHelperFunctions
- Tests for glob patterns, column_map, compression priorities
- Tests for attach_metadata and process_generic with DataFrame sources
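For reviewers unfamiliar with the glob support mentioned above, here is a minimal sketch of how a `source` argument might be expanded into concrete files. It is not the `_resolve_glob()` helper from this PR, whose implementation is not shown here; the function name `resolve_sources`, the sorted ordering, and the `FileNotFoundError` on an empty match are all assumptions made for illustration.

```python
import glob
from pathlib import Path


def resolve_sources(source):
    """Hypothetical helper: expand a path or glob pattern into a sorted file list.

    A plain path with no wildcards is handled too, because glob.glob() returns
    the path itself when it exists.
    """
    matches = sorted(
        Path(p) for p in glob.glob(str(source)) if Path(p).is_file()
    )
    if not matches:
        # Assumption: failing early is friendlier than returning an empty list.
        raise FileNotFoundError(f"No files match {str(source)!r}")
    return matches
```

Sorting keeps the concatenation order deterministic across platforms, which matters if several raw files are later stitched into one output.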
…_procedure API
- New add_procedure() method
- Remove old process_cycler_file(), process_generic_file(), import_data(), import_from_cycler() methods (replaced by add_procedure())
- Add deprecated skeleton methods with @deprecated decorator that raise informative errors directing users to add_procedure()
- Remove module-level process_cycler_data() function
- Deprecate _cycler_dict module variable (cycler handling now internal to io module)
- Clean up test_cell.py: remove 12 obsolete test functions for removed methods
- Keep tests for add_procedure() (3 tests) and other public methods
- Update add_procedure() signature to handle both files and DataFrames with unified column_map parameter and optional output_path
- Simplify import handling by delegating to io.process_cycler() and io.process_generic() functions
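The "deprecated skeleton" pattern described in this commit can be sketched roughly as follows. This is an illustration only: the decorator name `removed`, the choice of `NotImplementedError`, and the stand-in `Cell` class are assumptions, and the PR's actual `@deprecated` decorator may behave differently.

```python
import functools


def removed(replacement):
    """Hypothetical decorator: make a removed method fail with a pointer to its replacement."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The original body is never called; the skeleton exists only
            # so that old call sites get a helpful message.
            raise NotImplementedError(
                f"{func.__name__}() has been removed; use {replacement}() instead."
            )

        return wrapper

    return decorator


class Cell:
    """Stand-in for the real Cell class, for illustration only."""

    @removed("add_procedure")
    def import_from_cycler(self, *args, **kwargs):
        """Skeleton kept so that legacy code fails loudly rather than with AttributeError."""
```

Calling `Cell().import_from_cycler(...)` in this sketch raises an error naming both the old method and `add_procedure`, which is easier to act on than a bare `AttributeError`.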
…implified API

Standardize Result.add_data() to use "Unix Time / s" as the join key across all new data imports. Simplify the method signature by replacing date_column_name with time_column_name, consolidating timezone parameters into a single timezone argument, and replacing importing_columns with a more explicit column_map dict. Update align_data() and related tests to match. Reorganize test_result.py tests into logical classes for clarity.
Update Result and RawData methods to accept Column objects or BDF enum members in addition to strings. This enables type-safe column access across the API. ColumnSet now uses tuples for public properties and a set for internal storage to improve consistency.

BREAKING CHANGE: the definition parameter has been removed from RawData.zero_column. ColumnSet.names and ColumnSet.quantities now return tuples instead of lists.
allow Column.resolve and Column.can_resolve to accept ColumnSet objects directly. update the ColumnSet string representation to list column names and identify BDF-standard columns.
…ng interface

rename ColumnSet to ColumnDict and implement the Mapping interface for direct lookup by column name. replace the previous set-based iteration in resolution methods with indexed lookups by name and quantity to improve performance.
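To make the Mapping-based design concrete, here is a minimal sketch of a read-only container indexed by both column name and quantity. `ExampleColumn` and `ExampleColumnDict` are simplified stand-ins written for this comment, not the `Column` and `ColumnDict` classes from the PR; the `with_quantity` method name is likewise an assumption.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleColumn:
    """Simplified stand-in for a column named 'Quantity / unit'."""

    quantity: str
    unit: str

    @property
    def name(self):
        return f"{self.quantity} / {self.unit}"


class ExampleColumnDict(Mapping):
    """Read-only mapping from column name to column, plus a quantity index."""

    def __init__(self, columns):
        columns = list(columns)
        # Index by full name for direct lookup.
        self._by_name = {c.name: c for c in columns}
        # Index by quantity so resolution does not scan every column.
        self._by_quantity = {}
        for c in columns:
            self._by_quantity.setdefault(c.quantity, []).append(c)

    def __getitem__(self, name):
        return self._by_name[name]

    def __iter__(self):
        return iter(self._by_name)

    def __len__(self):
        return len(self._by_name)

    def with_quantity(self, quantity):
        """Return all columns holding the given quantity, in any unit."""
        return tuple(self._by_quantity.get(quantity, ()))
```

Implementing only `__getitem__`, `__iter__`, and `__len__` is enough for `collections.abc.Mapping` to supply `in`, `.get()`, `.keys()`, and `.items()`, and both lookups are dictionary accesses rather than scans over a set.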
Closes #281
Summary

- New `BDF` enum and `Column`/`BDFColumn`/`ColumnSet` classes (`column.py`): standardises 27 canonical battery quantities (e.g. `Voltage / V`, `Current / A`). `ColumnSet` resolves Polars column names with automatic unit conversion via Pint, replacing all hardcoded column string references. Closes #283 (Update column names to align with bdf) and #284 (Use [pint](https://pint.readthedocs.io/en/stable/) for reliable unit conversions).
- Metadata written out to parquet file or json sidecar. Closes #282 (Write required bdf metadata alongside `.parquet` files).
- BDF Enum provides Column objects that can be used throughout the code for resolving BDF standard columns using recipes, whether or not they are in the underlying dataframe.
- New `io.py` module: replaces the old cycler-processor architecture with `process_cycler()` (file → Parquet) and `process_generic()` (arbitrary DataFrame → BDF). Adds timezone correction, temperature column import, glob pattern support, and a `column_map` override mechanism.
- Simplified `Cell` API (`cell.py`): removes `process_cycler_file()`, `process_generic_file()`, `import_data()`, and `import_from_cycler()` in favour of a single `add_procedure()` method. Removes Pydantic from `Cell`.
- `Result.add_data()` overhaul: joins on Unix time, consolidates timezone handling into a single `timezone` argument, and uses an explicit `column_map` dict instead of positional column lists.
- `Result.info` → `Result.metadata`: breaking rename for clarity.
- `Procedure.load()` factory method: simplified initialisation path.
- Analysis modules updated throughout to use `BDF` enum references instead of raw strings.

Breaking changes

- Column naming format changed from `Quantity [unit]` to `Quantity / unit`: all column names previously using square-bracket notation (e.g. `"Current [A]"`, `"Voltage [V]"`) must be updated to slash-separated BDF format (e.g. `"Current / A"`, `"Voltage / V"`). The old `units.py` module and its `split_quantity_unit()` helper are removed; use `BDF` enum members or `Column`/`ColumnSet` instead.
- `Cell.process_cycler_file()`, `process_generic_file()`, `import_data()`, `import_from_cycler()` removed; use `add_procedure()`.
- `Result.info` renamed to `Result.metadata`. Closes #286 (Change metadata to be a property of procedures rather than cells).
- `Result.add_data()` signature changed: `date_column_name` → `time_column_name`, `importing_columns` → `column_map`.
- `io.process_cycler()` now returns `Path` instead of `LazyFrame`; several parameters renamed/removed (`output_dir` → `output_path`, `write_parquet`/`metadata`/`metadata_format`/`extra_columns` removed).

Merging requires closing battery-data-alliance/battery-data-format#6