PERF: add no_nans fast path in nancorr #65046

Merged
mroeschke merged 1 commit into pandas-dev:main from jbrockmendel:perf-algos-3
Apr 3, 2026

@jbrockmendel
Member

Summary

Test plan

  • pytest pandas/tests/frame/methods/test_cov_corr.py — 78 passed

🤖 Generated with Claude Code

When all values are finite, skip per-element mask checks in the inner
loop of nancorr. This mirrors the existing pattern in nancorr_spearman.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
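
The idea in the commit message can be illustrated with a small pure-Python sketch. This is not the pandas implementation (which is written in Cython and is not reproduced in this conversation); the names `pairwise_corr` and `_corr` are hypothetical, and the correlation itself uses a plain two-pass formula chosen for readability. The point is only the control flow: the finiteness mask is reduced to a single `no_nans` flag once, and the pair loop branches on that flag instead of consulting the mask for every element.

```python
import math


def _corr(pairs, min_periods):
    """Pearson correlation of a list of (x, y) pairs (hypothetical helper)."""
    nobs = len(pairs)
    if nobs == 0 or nobs < min_periods:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / nobs
    mean_y = sum(y for _, y in pairs) / nobs
    ssx = sum((x - mean_x) ** 2 for x, _ in pairs)
    ssy = sum((y - mean_y) ** 2 for _, y in pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    denom = math.sqrt(ssx * ssy)
    return cov / denom if denom != 0 else math.nan


def pairwise_corr(columns, min_periods=1):
    """Correlation matrix for a list of equal-length columns of floats."""
    k = len(columns)
    n = len(columns[0]) if k else 0
    # mask[j][i] is True when columns[j][i] is finite (not NaN, not +/-inf).
    mask = [[math.isfinite(v) for v in col] for col in columns]
    # Reduced once, outside the pair loops.
    no_nans = all(all(m) for m in mask)

    result = [[math.nan] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1):
            xs, ys = columns[a], columns[b]
            if no_nans:
                # Fast path: every row is usable, so the mask is never read.
                pairs = list(zip(xs, ys))
            else:
                # General path: keep only rows where both values are finite.
                ma, mb = mask[a], mask[b]
                pairs = [(xs[i], ys[i]) for i in range(n) if ma[i] and mb[i]]
            result[a][b] = result[b][a] = _corr(pairs, min_periods)
    return result
```

Both branches return the same result on fully finite input; the fast path only avoids the per-row mask lookups.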
@jbrockmendel added the Performance label ("Memory or execution speed performance") on Apr 3, 2026
@jbrockmendel marked this pull request as ready for review on April 3, 2026 03:47

result = np.empty((K, K), dtype=np.float64)
mask = np.isfinite(mat).view(np.uint8)
no_nans = np.asarray(mask).all()
Member

How significant is the performance improvement from checking this outside the loop?

Member Author

I'm seeing about 7% at the top end. I also like the fact that this matches the pattern we use in nancorr_spearman.
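
The benchmark behind the 7% figure is not shown in this conversation. As a rough illustration of how a hoisted check can be timed, here is a minimal stdlib-only sketch; `masked_sum`, `unmasked_sum`, and `percent_faster` are hypothetical names, and pure-Python timings say nothing about the size of the gain in the compiled code.

```python
import timeit


def masked_sum(values, mask):
    """Sum with a per-element mask check inside the loop."""
    total = 0.0
    for v, ok in zip(values, mask):
        if ok:
            total += v
    return total


def unmasked_sum(values):
    """Sum without any mask check (valid only when every value is usable)."""
    total = 0.0
    for v in values:
        total += v
    return total


def percent_faster(baseline, candidate, number=20, repeat=5):
    """Percentage by which candidate beats baseline, using best-of-repeat."""
    t_base = min(timeit.repeat(baseline, number=number, repeat=repeat))
    t_cand = min(timeit.repeat(candidate, number=number, repeat=repeat))
    return 100.0 * (t_base - t_cand) / t_base


values = [float(i) for i in range(10_000)]
mask = [True] * len(values)  # all values usable, so both paths agree
gain = percent_faster(
    lambda: masked_sum(values, mask),
    lambda: unmasked_sum(values),
)
print(f"unmasked loop is {gain:.1f}% faster on this input")
```

Taking the minimum over several repeats reduces noise from other processes; the printed number will vary by machine and interpreter.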

@mroeschke added this to the 3.1 milestone on Apr 3, 2026
@mroeschke merged commit 9947093 into pandas-dev:main on Apr 3, 2026
49 of 51 checks passed
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel deleted the perf-algos-3 branch on April 3, 2026 17:02
Sharl0tteIsTaken added a commit to Sharl0tteIsTaken/pandas that referenced this pull request Apr 4, 2026
…h-origin

* upstream/main: (83 commits)
  PERF: minor optimizations in _libs.algos (pandas-dev#65054)
  TST: Add roundtrip regression test for to_csv with embedded newlines (pandas-dev#65058)
  DOC: fix typo in timeseries documentation (ns -> us) (pandas-dev#65070)
  PERF: use searchsorted in IntervalIndex.get_indexer for monotonic indexes (pandas-dev#64786)
  PERF : Speed up `merge(sort=False)` for range-like unique integer join keys (left/right joins) (pandas-dev#64148)
  BUG: exclude RangeIndex from _can_use_libjoin, fix _union sorting (pandas-dev#64797)
  PERF: read SAS page headers in Cython instead of Python (pandas-dev#64769)
  BUG: date_range with start==end and inclusive="left"/"right" returns empty (pandas-dev#65014)
  PERF: restore hash table pre-allocation in value_count (pandas-dev#65027)
  TST: Add regression test for concat with tz_convert MonthStart index (pandas-dev#65019)
  BUG: DatetimeIndex._is_comparable_dtype raises AttributeError on Arrow date types (pandas-dev#64953)
  PERF: avoid expensive DataFrame snapshot in loc.__setitem__ (pandas-dev#65028)
  BUG: Fix AssertionError in replace with out-of-bounds datetime (pandas-dev#65009)
  PERF: add no_nans fast path in nancorr (pandas-dev#65046)
  API/DOC: remove iterability requirement for file-like object + clarify requirements in IO docstrings (pandas-dev#64986)
  BUG: date_range with periods=1 raises for offsets that disallow n=0 (pandas-dev#65011)
  CLN: remove using_cow parameter from _iLocIndexer._align_series (pandas-dev#65036)
  PERF: avoid redundant category re-validation in Categorical._simple_new (pandas-dev#65042)
  PERF: fix O(n) linear scan in _bin_search for ObjectEngine.get_loc (pandas-dev#65016)
  CLN: py2 leftover comments mostly (pandas-dev#65025)
  ...