PERF: restore kh_get before kh_put in hashtable operations by jbrockmendel · Pull Request #65026 · pandas-dev/pandas

jbrockmendel · 2026-04-02T16:49:29Z

Summary

Partially reverts #64543 ("PERF: Eliminate redundant kh_get calls in hashtable operations"), which replaced the kh_get + kh_put sequence with kh_put alone in the hashtable _unique, get_labels_groupby, value_count, and duplicated methods.
While kh_put alone avoids hashing a new key twice, it is more expensive than kh_get when the key already exists, because of extra branching in khash (resize checks, deleted-slot tracking).
For low-cardinality data (e.g. boolean with 2 unique values), nearly all lookups hit existing keys, so the kh_put-only path caused a ~80% regression in Factorize.time_factorize at the Cython level.
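
The lookup-first pattern being restored can be sketched in plain Python. This is only an illustration, not the Cython code from the PR: a `dict` stands in for the khash table, `table.get` plays the role of kh_get, the assignment on a miss plays the role of kh_put, and the function name and the `hits` counter are hypothetical additions for the example.

```python
def factorize_get_then_put(values):
    """Toy factorize: probe first, insert only on a miss.

    A dict stands in for the khash table (assumption for illustration);
    the real code operates on typed khash tables in Cython.
    """
    table = {}     # key -> label
    uniques = []   # first-seen order of distinct keys
    labels = []    # one label per input value
    hits = 0       # how many lookups found an existing key

    for val in values:
        label = table.get(val, -1)   # read-only probe (kh_get analogue)
        if label == -1:
            # Miss: pay the insertion cost once per distinct key
            # (kh_put analogue).
            label = len(uniques)
            table[val] = label
            uniques.append(val)
        else:
            hits += 1
        labels.append(label)
    return labels, uniques, hits


# Low-cardinality input: 1000 booleans, only 2 distinct keys.
labels, uniques, hits = factorize_get_then_put([True, False] * 500)
```

With only two distinct keys, all but two of the 1000 lookups take the hit branch, which is why the cost of the hit path dominates for this kind of data.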

Test plan

All existing test_algos.py and test_hashtable.py tests pass
algorithms.Factorize.time_factorize for boolean dtype matches v3.0 performance

🤖 Generated with Claude Code

Partially reverts pandas-dev#64543, which replaced kh_get + kh_put with kh_put alone. While kh_put avoids a redundant hash for new keys, it is more expensive than kh_get for existing keys due to extra branching (resize checks, deleted-slot tracking). For low-cardinality data (e.g. boolean with 2 unique values), nearly all lookups hit existing keys, causing ~80% regression at the Cython level. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jbrockmendel added the Performance Memory or execution speed performance label Apr 2, 2026

jbrockmendel wants to merge 1 commit into pandas-dev:main from jbrockmendel:regrs-8
