Skip to content

Fix #1118: Update stumpy to >=1.11.1 to fix Python 3.11+ uv resolutio…#1120

Open
divyansh0-ai wants to merge 8 commits intoblue-yonder:mainfrom
divyansh0-ai:fix-python311-uv-install-1118
Open

Fix #1118: Update stumpy to >=1.11.1 to fix Python 3.11+ uv resolutio…#1120
divyansh0-ai wants to merge 8 commits intoblue-yonder:mainfrom
divyansh0-ai:fix-python311-uv-install-1118

Conversation

@divyansh0-ai
Copy link
Copy Markdown

This PR resolves the dependency resolution issues on Python 3.11+ (Issue #1118) and fixes a few initial Pandas 3.0 regression test failures.
Changes Made:
1.setup.cfg: Bumped stumpy minimum version from >=1.7.2 to >=1.11.1 to fix uv dependency resolution failures on Python 3.11+ (resolves #1118).

2.setup.py & pyproject.toml: Added a standard pyproject.toml exposing pyscaffold and setuptools<70 build requirements, and removed a legacy pkg_resources check in setup.py that caused isolated build managers (like uv or modern pip) to crash with ModuleNotFoundError.

3.tsfresh/utilities/dataframe_functions.py: Fixed a widespread KeyError: 'id' crash in the test suite caused by Pandas 3.0 groupby().apply() no longer including grouping keys in its evaluated chunks. Added fallback logic to dynamically restore missing grouping keys from the DataFrame's name tuple.

4.tests/units/utilities/test_dataframe_functions.py: Updated deprecated Pandas dataframe frequency string aliases (changed freq="H" to freq="h") to fix immediate Pandas 3.0 deprecation failures.

image

…v resolution and resolve Pandas 3.0 test failures
Copilot AI review requested due to automatic review settings April 5, 2026 19:30
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Python 3.11+ dependency resolution/build isolation issues (Issue #1118) and updates code/tests for Pandas 3.0 behavior changes impacting grouping/apply semantics and deprecated frequency aliases.

Changes:

  • Bump stumpy minimum version to avoid older numba/llvmlite constraints on newer Python versions.
  • Add pyproject.toml build-system requirements and simplify setup.py to support isolated build frontends (e.g., uv / modern pip).
  • Fix Pandas 3.0 groupby().apply() behavior changes by restoring missing grouping key columns, and update tests to use non-deprecated pandas frequency aliases.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
setup.cfg Raises stumpy minimum version to improve dependency resolution on Python 3.11+.
pyproject.toml Declares PEP 517 build-system requirements for isolated builds.
setup.py Removes legacy pkg_resources/setuptools version check that can break isolated build managers.
tsfresh/utilities/dataframe_functions.py Restores missing grouping columns in groupby/apply chunks to avoid Pandas 3.0 KeyErrors.
tests/units/utilities/test_dataframe_functions.py Updates deprecated pandas datetime frequency aliases ("H""h").

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +339 to +345
if grouper is not None:
for col in grouper:
if col not in df_temp.columns:
if len(grouper) > 1:
df_temp[col] = x.name[grouper.index(col)]
else:
df_temp[col] = x.name
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grouping-key restoration logic inside _roll_out_time_series._f duplicates similar logic added later in add_sub_time_series_index and also does an O(n^2) lookup via grouper.index(col) inside a loop. Consider factoring this into a small helper (or iterating with for i, col in enumerate(grouper): ...) so the mapping from grouper position to name tuple index is computed once and shared.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The O(n2) is not so important here, but I think the code is also cleaner. Having a dedicated helper function also makes sense 👍

@@ -0,0 +1,3 @@
[build-system]
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyproject.toml pins setuptools<70 as a build requirement. A hard upper bound can unnecessarily block builds on environments that have already moved to newer setuptools. If this cap is required due to a known incompatibility (e.g., PyScaffold <4), please add a brief rationale/comment or link; otherwise consider relaxing/removing the upper bound.

Suggested change
[build-system]
[build-system]
# Keep setuptools below 70 while using PyScaffold < 4, which has known
# compatibility constraints with newer setuptools releases.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mentioning that this is a requirement from pyscaffold is indeed a good idea

Comment on lines +734 to +739
for col in grouper:
if col not in df_chunk.columns:
if len(grouper) > 1:
df_chunk[col] = df_chunk.name[grouper.index(col)]
else:
df_chunk[col] = df_chunk.name
Copy link

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This grouping-key restoration block duplicates the one added in _roll_out_time_series and uses repeated grouper.index(col) lookups inside a loop. Consider extracting to a helper / enumerating grouper once to avoid duplication and reduce repeated list scans.

Copilot uses AI. Check for mistakes.
nils-braun and others added 2 commits April 8, 2026 21:36
…ndex lookup

- Add module-level _restore_grouping_keys(df_chunk, grouper) helper that
  uses enumerate(grouper) to map each column to its name-tuple index in
  a single pass, eliminating repeated grouper.index(col) list scans.
- Replace the duplicated inline restoration blocks in both
  _roll_out_time_series._f and add_sub_time_series_index._add_id_column
  with a single call to the new helper.
- Add rationale comment to pyproject.toml explaining the setuptools<70
  upper bound (PyScaffold <4 incompatibility).

Addresses Copilot review comments on PR blue-yonder#1120.
Copy link
Copy Markdown
Collaborator

@nils-braun nils-braun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @divyansh0-ai for your time. Copilots suggestions were already quite reasonable. Can you implement those as well?

Comment on lines +339 to +345
if grouper is not None:
for col in grouper:
if col not in df_temp.columns:
if len(grouper) > 1:
df_temp[col] = x.name[grouper.index(col)]
else:
df_temp[col] = x.name
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The O(n2) is not so important here, but I think the code is also cleaner. Having a dedicated helper function also makes sense 👍

@@ -0,0 +1,3 @@
[build-system]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mentioning that this is a requirement from pyscaffold is indeed a good idea

divyansh0-ai added 3 commits April 9, 2026 01:42
1. Pass x.name (the group's name) rather than df_temp.name (the slice
   which has no .name) to _restore_grouping_keys in _roll_out_time_series.
   This fixes AttributeError on all roll_time_series tests.

2. Guard the _restore_grouping_keys call in _add_id_column with
   if grouper: to avoid calling df_chunk.name on a plain DataFrame
   (the no-grouper / else branch). Fixes test_no_parameters.

3. Move df['sort'] = range(df.shape[0]) to before df.groupby() so
   the synthetic sort column is visible inside every group chunk.
   Fixes KeyError: 'sort' in test_dict_rolling / test_dict_rolling_maxshift_1.

4. Update test_no_finite_values_yields_0 to use assertIn instead of
   assertEqual for the warning message, making it robust to the Pandas
   3.0 Arrow backend ArrowStringArray repr of column name arrays.

All 35 tests in test_dataframe_functions now pass on Python 3.11.
- feature_extraction/data.py: Fix KeyError in Dask groupby apply() by
  capturing x.name BEFORE sort_values() (which drops .name on the
  returned DataFrame), recovering id/kind from the group tuple for
  both with-kind and without-kind (melt+groupby) paths.

- feature_selection/relevance.py: Fix infer_ml_task() to recognize
  pd.StringDtype (Pandas 3.0 Arrow-backed strings) as classification,
  not regression.

- scripts/run_tsfresh.py: Replace deprecated delim_whitespace=True
  with sep=r'\s+' (removed in Pandas 3.0).

- examples/har_dataset.py: Same delim_whitespace -> sep=r'\s+' fix.

- tests/units/feature_selection/test_relevance.py:
  - Use .iloc[0] instead of [0] for Series with named index (Pandas 3.0
    no longer allows integer fallback on non-integer indexes).
  - Fix ChainedAssignmentError in test fixtures with .loc[cond, col]

- tests/units/transformers/test_feature_selector.py:
  - Fix ChainedAssignmentError with .loc[cond, col]

- tests/units/transformers/test_per_column_imputer.py:
  - Use assertIn() instead of assertEqual() for warning messages to
    handle ArrowStringArray repr differences in Pandas 3.0.

- tests/integrations/examples/test_robot_execution_failures.py:
  - Allow pd.StringDtype in addition to object dtype for string Series.
@divyansh0-ai
Copy link
Copy Markdown
Author

@nils-braun Thanks for the review and the feedback! I've gone ahead and implemented the suggestions, along with the remaining Pandas 3.0 compatibility fixes. The PR is ready for another look. Could you please approve the workflow runs when you have a moment.

Copy link
Copy Markdown
Collaborator

@nils-braun nils-braun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! Thanks for fixing all this!
I think there are just a few linting issues left.

Update: apart from the missing pd import there are also other two test failures. Do you want to look into them as well @divyansh0-ai or shall I take a look? One looks like a grouping key issue.


assert len(y.unique()) > 2
assert y.dtype == object
assert y.dtype == object or isinstance(y.dtype, pd.StringDtype)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably want to import pd here :)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nils-braun Thanks for the approval! I will gladly take care of these last few items. I’ll fix the missing pd import, clear up the linting, and dig into those two 3.11/3.12 test failures today. I'll push an update shortly!

divyansh0-ai pushed a commit to divyansh0-ai/tsfresh that referenced this pull request Apr 10, 2026
…r#1120

- tests/integrations/examples/test_robot_execution_failures.py:
  Add missing import pandas as pd needed for pd.StringDtype usage
  on line 57 (flagged by nils-braun in review).

- tsfresh/feature_selection/relevance.py:
  Reformat line in infer_ml_task to comply with black 22.12.0 line
  length limit (pre-commit failure).
  Wrap lambda ternary in calculate_relevance_table in extra parens to
  satisfy black's updated style rules.

- tsfresh/utilities/dataframe_functions.py:
  In �dd_sub_time_series_index, replace set_index(get_level_values(-1))
  with reset_index(drop=True) and pass include_groups=False (with a
  Pandas<2.2 fallback) to groupby().apply().  This prevents a grouping-
  key KeyError (KeyError: 'id') on Python 3.11+ with Pandas 3.0, where
  groupby columns are no longer present in the applied chunk and group_keys
  defaults to False, eliminating the MultiIndex that get_level_values(-1)
  previously relied on.
divyansh0-ai pushed a commit to divyansh0-ai/tsfresh that referenced this pull request Apr 10, 2026
…r#1120

- tests/integrations/examples/test_robot_execution_failures.py:
  Add missing import pandas as pd needed for pd.StringDtype usage
  on line 57 (flagged by nils-braun in review).

- tsfresh/feature_selection/relevance.py:
  Reformat line in infer_ml_task to comply with black 22.12.0 line
  length limit (pre-commit failure).
  Wrap lambda ternary in calculate_relevance_table in extra parens to
  satisfy black's updated style rules.

- tsfresh/utilities/dataframe_functions.py:
  In add_sub_time_series_index, replace set_index(get_level_values(-1))
  with reset_index(drop=True) and pass include_groups=False (with a
  Pandas<2.2 fallback) to groupby().apply(). This prevents a grouping-
  key KeyError on Python 3.11+ with Pandas 3.0, where groupby columns
  are no longer present in the applied chunk and group_keys defaults to
  False, eliminating the MultiIndex that get_level_values(-1) relied on.
@divyansh0-ai divyansh0-ai force-pushed the fix-python311-uv-install-1118 branch 2 times, most recently from 96ebf50 to 421fe2b Compare April 10, 2026 19:50
divyansh0-ai added 2 commits April 11, 2026 01:28
In add_sub_time_series_index, groupby().apply() now passes
include_groups=False (Pandas 2.2+) so grouping columns are explicitly
excluded from each chunk, matching Pandas 3.0 default behaviour.
_restore_grouping_keys() re-adds them via df_chunk.name.

Also replaced set_index(get_level_values(-1)) with reset_index(drop=True)
since group_keys now defaults to False in Pandas 3.0, meaning the
groupby result no longer has a MultiIndex.

Fixes KeyError on Python 3.11+ / Pandas 3.0 in Test (3.11) and Test (3.12).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Doesn't work on python 3.11+ in contradiction of what is specified

3 participants