PERF: Avoid drop_duplicates(keep=False) in DataFrame.__getitem__ for single-column access #64127

Open
tuhinsharma121 wants to merge 6 commits into pandas-dev:main from tuhinsharma121:perf-drop_duplicates

Conversation

@tuhinsharma121
Contributor

@tuhinsharma121 tuhinsharma121 marked this pull request as ready for review February 13, 2026 16:09
Member

@mroeschke mroeschke left a comment

Looks good. Could you add an ASV benchmark in asv_bench/benchmarks/indexing.py (showing the before and after performance) and add a whatsnew note in v3.1.0.rst?
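For readers unfamiliar with ASV, a benchmark of the kind requested here might look roughly like the sketch below. It is not the benchmark added in this PR: the class name, method name, and column labels are hypothetical, and the `ncols` values are only assumed to match the results posted later in this thread. The sketch builds a frame whose column index is non-unique and times a single-label lookup on it.

```python
import numpy as np
import pandas as pd


class GetitemDuplicateColumnsSketch:
    # Hypothetical ASV-style benchmark; NOT the class added in this PR.
    # ASV calls setup() and each time_* method once per value in `params`.
    params = [1_000, 10_000, 100_000, 1_000_000]
    param_names = ["ncols"]

    def setup(self, ncols):
        # One repeated label ("dup") makes the column index non-unique.
        labels = ["dup", "dup"] + [f"col_{i}" for i in range(ncols - 2)]
        self.df = pd.DataFrame(np.zeros((1, ncols)), columns=labels)

    def time_getitem_unique_label(self, ncols):
        # Single-column access by a label that occurs exactly once.
        self.df["col_0"]
```

ASV discovers `time_*` methods by name, so a class like this would only need to live in a module under `asv_bench/benchmarks/` to be picked up.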

@mroeschke mroeschke added the Indexing (Related to indexing on series/frames, not to indexes themselves) and Performance (Memory or execution speed performance) labels Feb 13, 2026
@mroeschke mroeschke added this to the 3.1 milestone Feb 13, 2026
@tuhinsharma121
Contributor Author

Perfect, will do that.

@tuhinsharma121
Contributor Author

@mroeschke I added the ASV benchmark and the whatsnew note.

asv run --quick --dry-run --python=same --show-stderr -b DataFrameGetitemDuplicateColumns

On main, this gives:

========= ==========
 ncols             
--------- ----------
  1000    6.23±0ms 
 10000    16.8±0ms 
 100000   108±0ms  
1000000   883±0ms  
========= ==========

and this PR gives:

========= ============
  ncols               
--------- ------------
   1000     701±40μs  
  10000     732±5μs   
  100000    927±4μs   
 1000000   2.50±0.1ms 
========= ============
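To make the comparison easier to read, the two tables can be reduced to a speedup factor per `ncols`. The snippet below is only arithmetic on the numbers posted above (converted to milliseconds); the variable names are illustrative, and the timings come from `--quick` runs, so the ratios should be treated as indicative rather than precise.

```python
# Timings copied from the two tables above, converted to milliseconds.
main_ms = {1_000: 6.23, 10_000: 16.8, 100_000: 108.0, 1_000_000: 883.0}
pr_ms = {1_000: 0.701, 10_000: 0.732, 100_000: 0.927, 1_000_000: 2.50}

# Speedup factor = time on main / time with this PR, per column count.
speedup = {n: main_ms[n] / pr_ms[n] for n in main_ms}

for n, factor in speedup.items():
    print(f"ncols={n:>9,}: {factor:6.1f}x faster")
```

The factor grows from roughly 9x at 1,000 columns to roughly 350x at 1,000,000 columns, because the time on main grows with the number of columns much faster than the time with this PR does.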
