Fix #1118: Update stumpy to >=1.11.1 to fix Python 3.11+ uv resolutio…#1120
Fix #1118: Update stumpy to >=1.11.1 to fix Python 3.11+ uv resolutio…#1120divyansh0-ai wants to merge 8 commits intoblue-yonder:mainfrom
Conversation
…v resolution and resolve Pandas 3.0 test failures
There was a problem hiding this comment.
Pull request overview
This PR addresses Python 3.11+ dependency resolution/build isolation issues (Issue #1118) and updates code/tests for Pandas 3.0 behavior changes impacting grouping/apply semantics and deprecated frequency aliases.
Changes:
- Bump
stumpyminimum version to avoid oldernumba/llvmliteconstraints on newer Python versions. - Add
pyproject.tomlbuild-system requirements and simplifysetup.pyto support isolated build frontends (e.g., uv / modern pip). - Fix Pandas 3.0
groupby().apply()behavior changes by restoring missing grouping key columns, and update tests to use non-deprecated pandas frequency aliases.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
setup.cfg |
Raises stumpy minimum version to improve dependency resolution on Python 3.11+. |
pyproject.toml |
Declares PEP 517 build-system requirements for isolated builds. |
setup.py |
Removes legacy pkg_resources/setuptools version check that can break isolated build managers. |
tsfresh/utilities/dataframe_functions.py |
Restores missing grouping columns in groupby/apply chunks to avoid Pandas 3.0 KeyErrors. |
tests/units/utilities/test_dataframe_functions.py |
Updates deprecated pandas datetime frequency aliases ("H" → "h"). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if grouper is not None: | ||
| for col in grouper: | ||
| if col not in df_temp.columns: | ||
| if len(grouper) > 1: | ||
| df_temp[col] = x.name[grouper.index(col)] | ||
| else: | ||
| df_temp[col] = x.name |
There was a problem hiding this comment.
The grouping-key restoration logic inside _roll_out_time_series._f duplicates similar logic added later in add_sub_time_series_index and also does an O(n^2) lookup via grouper.index(col) inside a loop. Consider factoring this into a small helper (or iterating with for i, col in enumerate(grouper): ...) so the mapping from grouper position to name tuple index is computed once and shared.
There was a problem hiding this comment.
The O(n2) is not so important here, but I think the code is also cleaner. Having a dedicated helper function also makes sense 👍
| @@ -0,0 +1,3 @@ | |||
| [build-system] | |||
There was a problem hiding this comment.
pyproject.toml pins setuptools<70 as a build requirement. A hard upper bound can unnecessarily block builds on environments that have already moved to newer setuptools. If this cap is required due to a known incompatibility (e.g., PyScaffold <4), please add a brief rationale/comment or link; otherwise consider relaxing/removing the upper bound.
| [build-system] | |
| [build-system] | |
| # Keep setuptools below 70 while using PyScaffold < 4, which has known | |
| # compatibility constraints with newer setuptools releases. |
There was a problem hiding this comment.
I think mentioning that this is a requirement from pyscaffold is indeed a good idea
| for col in grouper: | ||
| if col not in df_chunk.columns: | ||
| if len(grouper) > 1: | ||
| df_chunk[col] = df_chunk.name[grouper.index(col)] | ||
| else: | ||
| df_chunk[col] = df_chunk.name |
There was a problem hiding this comment.
This grouping-key restoration block duplicates the one added in _roll_out_time_series and uses repeated grouper.index(col) lookups inside a loop. Consider extracting to a helper / enumerating grouper once to avoid duplication and reduce repeated list scans.
Co-authored-by: Copilot <[email protected]>
…ndex lookup - Add module-level _restore_grouping_keys(df_chunk, grouper) helper that uses enumerate(grouper) to map each column to its name-tuple index in a single pass, eliminating repeated grouper.index(col) list scans. - Replace the duplicated inline restoration blocks in both _roll_out_time_series._f and add_sub_time_series_index._add_id_column with a single call to the new helper. - Add rationale comment to pyproject.toml explaining the setuptools<70 upper bound (PyScaffold <4 incompatibility). Addresses Copilot review comments on PR blue-yonder#1120.
nils-braun
left a comment
There was a problem hiding this comment.
Thank you very much @divyansh0-ai for your time. Copilots suggestions were already quite reasonable. Can you implement those as well?
| if grouper is not None: | ||
| for col in grouper: | ||
| if col not in df_temp.columns: | ||
| if len(grouper) > 1: | ||
| df_temp[col] = x.name[grouper.index(col)] | ||
| else: | ||
| df_temp[col] = x.name |
There was a problem hiding this comment.
The O(n2) is not so important here, but I think the code is also cleaner. Having a dedicated helper function also makes sense 👍
| @@ -0,0 +1,3 @@ | |||
| [build-system] | |||
There was a problem hiding this comment.
I think mentioning that this is a requirement from pyscaffold is indeed a good idea
1. Pass x.name (the group's name) rather than df_temp.name (the slice which has no .name) to _restore_grouping_keys in _roll_out_time_series. This fixes AttributeError on all roll_time_series tests. 2. Guard the _restore_grouping_keys call in _add_id_column with if grouper: to avoid calling df_chunk.name on a plain DataFrame (the no-grouper / else branch). Fixes test_no_parameters. 3. Move df['sort'] = range(df.shape[0]) to before df.groupby() so the synthetic sort column is visible inside every group chunk. Fixes KeyError: 'sort' in test_dict_rolling / test_dict_rolling_maxshift_1. 4. Update test_no_finite_values_yields_0 to use assertIn instead of assertEqual for the warning message, making it robust to the Pandas 3.0 Arrow backend ArrowStringArray repr of column name arrays. All 35 tests in test_dataframe_functions now pass on Python 3.11.
- feature_extraction/data.py: Fix KeyError in Dask groupby apply() by
capturing x.name BEFORE sort_values() (which drops .name on the
returned DataFrame), recovering id/kind from the group tuple for
both with-kind and without-kind (melt+groupby) paths.
- feature_selection/relevance.py: Fix infer_ml_task() to recognize
pd.StringDtype (Pandas 3.0 Arrow-backed strings) as classification,
not regression.
- scripts/run_tsfresh.py: Replace deprecated delim_whitespace=True
with sep=r'\s+' (removed in Pandas 3.0).
- examples/har_dataset.py: Same delim_whitespace -> sep=r'\s+' fix.
- tests/units/feature_selection/test_relevance.py:
- Use .iloc[0] instead of [0] for Series with named index (Pandas 3.0
no longer allows integer fallback on non-integer indexes).
- Fix ChainedAssignmentError in test fixtures with .loc[cond, col]
- tests/units/transformers/test_feature_selector.py:
- Fix ChainedAssignmentError with .loc[cond, col]
- tests/units/transformers/test_per_column_imputer.py:
- Use assertIn() instead of assertEqual() for warning messages to
handle ArrowStringArray repr differences in Pandas 3.0.
- tests/integrations/examples/test_robot_execution_failures.py:
- Allow pd.StringDtype in addition to object dtype for string Series.
|
@nils-braun Thanks for the review and the feedback! I've gone ahead and implemented the suggestions, along with the remaining Pandas 3.0 compatibility fixes. The PR is ready for another look. Could you please approve the workflow runs when you have a moment. |
There was a problem hiding this comment.
Well done! Thanks for fixing all this!
I think there are just a few linting issues left.
Update: apart from the missing pd import there are also other two test failures. Do you want to look into them as well @divyansh0-ai or shall I take a look? One looks like a grouping key issue.
|
|
||
| assert len(y.unique()) > 2 | ||
| assert y.dtype == object | ||
| assert y.dtype == object or isinstance(y.dtype, pd.StringDtype) |
There was a problem hiding this comment.
you probably want to import pd here :)
There was a problem hiding this comment.
@nils-braun Thanks for the approval! I will gladly take care of these last few items. I’ll fix the missing pd import, clear up the linting, and dig into those two 3.11/3.12 test failures today. I'll push an update shortly!
…r#1120 - tests/integrations/examples/test_robot_execution_failures.py: Add missing import pandas as pd needed for pd.StringDtype usage on line 57 (flagged by nils-braun in review). - tsfresh/feature_selection/relevance.py: Reformat line in infer_ml_task to comply with black 22.12.0 line length limit (pre-commit failure). Wrap lambda ternary in calculate_relevance_table in extra parens to satisfy black's updated style rules. - tsfresh/utilities/dataframe_functions.py: In �dd_sub_time_series_index, replace set_index(get_level_values(-1)) with reset_index(drop=True) and pass include_groups=False (with a Pandas<2.2 fallback) to groupby().apply(). This prevents a grouping- key KeyError (KeyError: 'id') on Python 3.11+ with Pandas 3.0, where groupby columns are no longer present in the applied chunk and group_keys defaults to False, eliminating the MultiIndex that get_level_values(-1) previously relied on.
…r#1120 - tests/integrations/examples/test_robot_execution_failures.py: Add missing import pandas as pd needed for pd.StringDtype usage on line 57 (flagged by nils-braun in review). - tsfresh/feature_selection/relevance.py: Reformat line in infer_ml_task to comply with black 22.12.0 line length limit (pre-commit failure). Wrap lambda ternary in calculate_relevance_table in extra parens to satisfy black's updated style rules. - tsfresh/utilities/dataframe_functions.py: In add_sub_time_series_index, replace set_index(get_level_values(-1)) with reset_index(drop=True) and pass include_groups=False (with a Pandas<2.2 fallback) to groupby().apply(). This prevents a grouping- key KeyError on Python 3.11+ with Pandas 3.0, where groupby columns are no longer present in the applied chunk and group_keys defaults to False, eliminating the MultiIndex that get_level_values(-1) relied on.
96ebf50 to
421fe2b
Compare
In add_sub_time_series_index, groupby().apply() now passes include_groups=False (Pandas 2.2+) so grouping columns are explicitly excluded from each chunk, matching Pandas 3.0 default behaviour. _restore_grouping_keys() re-adds them via df_chunk.name. Also replaced set_index(get_level_values(-1)) with reset_index(drop=True) since group_keys now defaults to False in Pandas 3.0, meaning the groupby result no longer has a MultiIndex. Fixes KeyError on Python 3.11+ / Pandas 3.0 in Test (3.11) and Test (3.12).
This PR resolves the dependency resolution issues on Python 3.11+ (Issue #1118) and fixes a few initial Pandas 3.0 regression test failures.
Changes Made:
1.setup.cfg: Bumped stumpy minimum version from >=1.7.2 to >=1.11.1 to fix uv dependency resolution failures on Python 3.11+ (resolves #1118).
2.setup.py & pyproject.toml: Added a standard pyproject.toml exposing pyscaffold and setuptools<70 build requirements, and removed a legacy pkg_resources check in setup.py that caused isolated build managers (like uv or modern pip) to crash with ModuleNotFoundError.
3.tsfresh/utilities/dataframe_functions.py: Fixed a widespread KeyError: 'id' crash in the test suite caused by Pandas 3.0 groupby().apply() no longer including grouping keys in its evaluated chunks. Added fallback logic to dynamically restore missing grouping keys from the DataFrame's name tuple.
4.tests/units/utilities/test_dataframe_functions.py: Updated deprecated Pandas dataframe frequency string aliases (changed freq="H" to freq="h") to fix immediate Pandas 3.0 deprecation failures.