
util: Extend string search with user-defined printable characters #5998

Open
cheese-cakee wants to merge 4 commits into rizinorg:dev from cheese-cakee:extend-string-search


cheese-cakee commented Mar 4, 2026

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've documented or updated the documentation of every function and struct/union this PR changes (Doxygen-style inline docs)
  • I've added tests that prove my fix is effective or that my feature works
  • I've updated the relevant documentation or made a note that it needs updating

Description

Closes #4930

Extend string search to support:

  1. User-defined non-printable characters via a new str.unprintable config option (comma-separated hex code points). This adds a new rz_unicode_code_point_is_printable_user() function that checks user overrides before falling back to the standard printability check.

  2. Grapheme cluster counting in search hits via a new grapheme_length field in RzDetectedString. This counts user-perceived characters (a base character plus any combining marks that follow it) rather than raw code points.
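The override check described in item 1 can be sketched as follows. This is a minimal, self-contained illustration, not the PR's implementation: the names is_printable_user and fallback_is_printable are hypothetical, and fallback_is_printable is a simplified stand-in for rizin's real printability check that only rejects C0 controls, DEL and C1 controls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the standard printability check:
 * C0 controls (< U+0020), DEL (U+007F) and C1 controls (U+0080..U+009F)
 * are non-printable, everything else is treated as printable. */
static bool fallback_is_printable(uint32_t cp) {
	if (cp < 0x20 || (cp >= 0x7f && cp <= 0x9f)) {
		return false;
	}
	return true;
}

/* Hypothetical sketch of a user-override check: any code point in the
 * user list is non-printable; all others fall back to the standard check. */
static bool is_printable_user(uint32_t cp, const uint32_t *user_unprintable, size_t user_unprintable_count) {
	for (size_t i = 0; i < user_unprintable_count; i++) {
		if (user_unprintable[i] == cp) {
			return false;
		}
	}
	return fallback_is_printable(cp);
}
```

Because the user list is consulted first, an entry there always wins. In this sketch the list can only make a code point non-printable, never printable.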

Changes made

  • Add rz_unicode_code_point_is_printable_user() in librz/util/unicode.c
  • Add str.unprintable config option in librz/core/cconfig.c
  • Add user_unprintable / user_unprintable_count to RzUtilStrScanOptions and RzStrStringifyOpt
  • Add grapheme_length to RzDetectedString
  • Add count_graphemes() helper in str_search.c
  • Wire the user-defined unprintable list through all callers that set up RzUtilStrScanOptions
  • Add unit tests for both features
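The str.unprintable value (e.g. 0x09,0x0a,0x0d,0x1b) has to be turned into the user_unprintable / user_unprintable_count pair before scanning. A hedged sketch of that parsing step follows; parse_unprintable_list is a hypothetical helper, and the PR's cconfig.c callback may validate input differently. It relies on strtoul with base 16, which accepts an optional 0x prefix.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses a comma-separated list of hex code points
 * such as "0x09,0x0a,0x200e" into `out`. Returns the number of entries
 * stored, or -1 on a malformed entry, a value above U+10FFFF, or when
 * more than `max_out` entries are given. */
static int parse_unprintable_list(const char *value, uint32_t *out, size_t max_out) {
	size_t count = 0;
	const char *p = value;
	while (*p != '\0') {
		if (count == max_out) {
			return -1;
		}
		char *end = NULL;
		unsigned long cp = strtoul(p, &end, 16);
		/* Reject empty entries and values outside the Unicode range. */
		if (end == p || cp > 0x10FFFF) {
			return -1;
		}
		/* Each entry must be followed by a comma or the end of the string. */
		if (*end != ',' && *end != '\0') {
			return -1;
		}
		out[count++] = (uint32_t)cp;
		p = (*end == ',') ? end + 1 : end;
	}
	return (int)count;
}
```

Rejecting the whole value on the first bad entry keeps a typo from silently shrinking the list; a trailing comma is tolerated in this sketch.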

Testing

All existing and new unit tests pass:

1/1 rizin:unit / str_search OK    1.73s
Ok: 1 | Fail: 0 | Skipped: 0

Add str.unprintable config option to define code points treated as
non-printable during string search and display. Add grapheme_length
field to RzDetectedString for visual character count.

* Add rz_unicode_code_point_is_printable_user() for user overrides
* Add str.unprintable config option (comma-separated hex code points)
* Add grapheme_length to RzDetectedString for visual character count
* Add unit tests for user-defined unprintable and grapheme counting
* Wire user_unprintable through RzUtilStrScanOptions and RzStrStringifyOpt

All unit tests pass
cheese-cakee marked this pull request as ready for review March 4, 2026 22:38
cheese-cakee marked this pull request as draft March 4, 2026 22:40
cheese-cakee marked this pull request as ready for review March 5, 2026 21:55
codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 51.92308% with 50 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.12%. Comparing base (d6f278e) to head (f49a5e7).

File | Patch % | Lines
librz/core/cconfig.c | 18.75% | 37 missing and 2 partials ⚠️
librz/util/str_search.c | 77.41% | 4 missing and 3 partials ⚠️
librz/util/str.c | 42.85% | 2 missing and 2 partials ⚠️
Additional details and impacted files
File | Coverage Δ
librz/bin/bfile_string.c | 61.86% <100.00%> (+0.06%) ⬆️
librz/bin/bin.c | 59.63% <100.00%> (+0.04%) ⬆️
librz/core/canalysis.c | 61.02% <100.00%> (+0.02%) ⬆️
librz/core/cmeta.c | 70.09% <100.00%> (+0.20%) ⬆️
librz/core/csearch.c | 48.57% <100.00%> (+0.36%) ⬆️
librz/include/rz_bin.h | 43.47% <ø> (ø)
librz/include/rz_util/rz_str.h | 60.00% <ø> (ø)
librz/util/unicode.c | 86.92% <100.00%> (+0.44%) ⬆️
librz/util/str.c | 55.79% <42.85%> (-0.06%) ⬇️
librz/util/str_search.c | 86.52% <77.41%> (-0.82%) ⬇️
... and 1 more

... and 8 files with indirect coverage changes


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d6f278e...f49a5e7.


cheese-cakee added a commit to cheese-cakee/rizin that referenced this pull request Mar 6, 2026
The RzSearchOpt and RzBuffer instances were not freed at the end of test cases, which caused an LSan test failure in CI.

Fixes rizinorg#5998

Co-Authored-By: Claude <[email protected]>
cheese-cakee force-pushed the extend-string-search branch from 128502d to 2accda4 on March 6, 2026 20:14
Rot127 (Member) left a comment

It needs some regression tests in test/db/.... Please add some: with multiple characters, with UTF-8 characters, with only a single character, etc.

if (node->value[0] == '?') {
rz_cons_printf("Comma-separated list of Unicode code points treated as non-printable.\n");
rz_cons_printf("Examples:\n");
rz_cons_printf(" e str.unprintable=0x09,0x0a,0x0d,0x1b\n");

Please add an example with a UTF-8 character, e.g. a formatting character.
Also add a test for it.
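As a hedged illustration of the requested case (not code from the PR), the sketch below decodes UTF-8 bytes and counts how many decoded code points appear in a user list. It uses U+200E LEFT-TO-RIGHT MARK, a Unicode format (Cf) character encoded as the bytes E2 80 8E, which a help example could list as e str.unprintable=0x200e. The names utf8_decode and count_user_unprintable are hypothetical, and the decoder skips overlong and surrogate checks, so it assumes well-formed input.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal UTF-8 decoder (no overlong or surrogate checks). Stores the code
 * point in *cp and returns the bytes consumed, or 0 on a bad or truncated
 * sequence. */
static size_t utf8_decode(const uint8_t *s, size_t len, uint32_t *cp) {
	size_t n;
	if (len == 0) {
		return 0;
	}
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		n = 2;
		*cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3;
		*cp = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4;
		*cp = s[0] & 0x07;
	} else {
		return 0;
	}
	if (len < n) {
		return 0;
	}
	for (size_t i = 1; i < n; i++) {
		/* Continuation bytes must look like 10xxxxxx. */
		if ((s[i] & 0xC0) != 0x80) {
			return 0;
		}
		*cp = (*cp << 6) | (s[i] & 0x3F);
	}
	return n;
}

/* Hypothetical helper: counts code points in `s` that are in the user
 * list. Returns -1 if the bytes cannot be decoded. */
static long count_user_unprintable(const uint8_t *s, size_t len, const uint32_t *user, size_t user_count) {
	long hits = 0;
	size_t i = 0;
	while (i < len) {
		uint32_t cp = 0;
		size_t n = utf8_decode(s + i, len - i, &cp);
		if (n == 0) {
			return -1;
		}
		for (size_t k = 0; k < user_count; k++) {
			if (user[k] == cp) {
				hits++;
				break;
			}
		}
		i += n;
	}
	return hits;
}
```

The point of such a test is that the user list is matched against decoded code points, not raw bytes: 0x200e never appears as a byte in the buffer.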

ut64 addr; ///< Address/offset of the string in the RzBuffer
ut32 size; ///< Size of buffer containing the string in bytes
ut32 length; ///< Length of string in chars
ut32 grapheme_length; ///< Length of string in grapheme clusters

Don't introduce graphemes. It is too complex. People should only define code points, that's it.
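To make the distinction between length (code points) and grapheme_length concrete, here is a deliberately simplified sketch. It treats only the Combining Diacritical Marks block (U+0300..U+036F) as attaching to the previous character. Full grapheme segmentation (Unicode UAX #29) also has to handle Hangul syllables, ZWJ emoji sequences, regional indicators and many other mark ranges, which shows how much more involved the real feature would be. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified: only U+0300..U+036F counts as a combining mark here. */
static bool is_combining_mark(uint32_t cp) {
	return cp >= 0x0300 && cp <= 0x036F;
}

/* Approximate grapheme count over decoded code points: every non-mark
 * starts a new cluster, marks attach to the previous one, and a leading
 * mark counts as its own cluster. */
static size_t approx_grapheme_count(const uint32_t *cps, size_t n) {
	size_t clusters = 0;
	for (size_t i = 0; i < n; i++) {
		if (i == 0 || !is_combining_mark(cps[i])) {
			clusters++;
		}
	}
	return clusters;
}
```

For "e" + U+0301 COMBINING ACUTE ACCENT + "x", the code point length is 3 while this approximation reports 2 clusters.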

Rot127 (Member) commented Mar 7, 2026

Also, why did you remove your AI disclaimer?


Projects: None yet

Development: successfully merging this pull request may close these issues:
Extend string search

2 participants