BUG: Different result from str.find depending on dtype #64123

@brsr

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Each string is one leading character followed by 'a'.
# The leading characters take 1, 2, 3, and 4 bytes in UTF-8.
examples = ['ga', 'Áa', '永a', '🐍a']

for t in examples:
    objects = pd.Series([t], dtype=object)
    strs = pd.Series([t], dtype=str)
    print(
        'python:', t.find('a'),
        'pandas with object type:', objects.str.find('a')[0],
        'pandas with string type:', strs.str.find('a')[0],
    )

Issue Description

str.find appears to return a byte offset rather than a character offset when the Series uses the new str dtype. For the plain Python str type and for a Series with object dtype, find('a') returns 1 for every example string, as expected. For a Series with the new str dtype, the results are 1, 2, 3, and 4 respectively.

Encoded in UTF-8, g takes one byte, Á takes two, 永 takes three, and 🐍 takes four, which matches those results, so I suspect it is counting bytes instead of characters. len gives a result of 2 regardless of dtype.
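The byte-counting hypothesis can be checked without pandas. The sketch below is only an illustration in plain Python, not the pandas or pyarrow code path: it compares the character offset of 'a' with its offset in the UTF-8 encoded bytes. The helper name `char_and_byte_offsets` is made up for this example, and the strings are written with escapes on the assumption that Á is the precomposed code point U+00C1.

```python
# Same strings as in the reproducible example, written with escapes so the
# exact code points are unambiguous (assumes precomposed U+00C1 for "Á").
examples = ['ga', '\u00c1a', '\u6c38a', '\U0001f40da']


def char_and_byte_offsets(text, needle):
    """Return (character offset, UTF-8 byte offset) of needle in text."""
    char_idx = text.find(needle)
    byte_idx = text.encode('utf-8').find(needle.encode('utf-8'))
    return char_idx, byte_idx


offsets = [char_and_byte_offsets(t, 'a') for t in examples]
for t, (char_idx, byte_idx) in zip(examples, offsets):
    print(repr(t), 'chars:', char_idx, 'bytes:', byte_idx)
```

The byte offsets come out as 1, 2, 3, and 4, the same sequence reported for the str dtype, while the character offset is 1 in every case.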

Expected Behavior

The result of str.find shouldn't change depending on dtype: it should be the same character-based index that Python's str.find returns.
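To make the expected result concrete, here is a minimal sketch of how a UTF-8 byte offset maps back to a character offset. `byte_offset_to_char_offset` is a hypothetical helper written for this illustration. It is not part of pandas or pyarrow, and it assumes the offset falls on a character boundary.

```python
def byte_offset_to_char_offset(text: str, byte_offset: int) -> int:
    """Map a UTF-8 byte offset in text to a character offset.

    Hypothetical helper for illustration only. Negative values (such as the
    -1 "not found" result) are passed through unchanged.
    """
    if byte_offset < 0:
        return byte_offset
    # Decode only the bytes before the offset and count the characters.
    # This raises UnicodeDecodeError if the offset splits a character.
    return len(text.encode('utf-8')[:byte_offset].decode('utf-8'))


# The str-dtype results reported above (1, 2, 3, 4) all map back to 1.
examples = ['ga', '\u00c1a', '\u6c38a', '\U0001f40da']
reported = [1, 2, 3, 4]
converted = [byte_offset_to_char_offset(t, b) for t, b in zip(examples, reported)]
print(converted)
```

Each converted value equals what t.find('a') returns for the same string, which is the behavior expected from every dtype.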

Installed Versions

INSTALLED VERSIONS

commit : cc9e131
python : 3.13.12
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.26200
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 3.1.0.dev0+140.gcc9e131258
numpy : 2.4.2
dateutil : 2.9.0.post0
pip : 26.0.1
Cython : None
sphinx : 9.1.0
IPython : 9.10.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.14.3
bottleneck : None
fastparquet : None
fsspec : None
html5lib : 1.1
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : 6.0.2
matplotlib : 3.10.8
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : 23.0.0
pyiceberg : None
pyreadstat : None
pytest : 9.0.2
python-calamine : None
pytz : 2025.2
pyxlsb : None
s3fs : None
scipy : 1.17.0
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2026.1.0
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : 2.4.3
pyqt5 : None

Metadata

Labels: Arrow (pyarrow functionality), Bug, Strings (String extension data type and string data)