util: Extend string search with user-defined printable characters #5998
cheese-cakee wants to merge 4 commits into rizinorg:dev
Add a `str.unprintable` config option to define code points treated as non-printable during string search and display. Add a `grapheme_length` field to `RzDetectedString` for the visual character count.

* Add `rz_unicode_code_point_is_printable_user()` for user overrides
* Add `str.unprintable` config option (comma-separated hex code points)
* Add `grapheme_length` to `RzDetectedString` for the visual character count
* Add unit tests for user-defined unprintable code points and grapheme counting
* Wire `user_unprintable` through `RzUtilStrScanOptions` and `RzStrStringifyOpt`

All unit tests pass.
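The lookup order described above (user overrides first, then the standard check) can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: `is_printable_user()` and `default_is_printable()` are hypothetical stand-ins, not Rizin API. The real `rz_unicode_code_point_is_printable_user()` may take different parameters, and the fallback here is a deliberately simplified rule rather than Rizin's Unicode tables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified fallback (NOT Rizin's real check): treats C0
 * controls, DEL, C1 controls, and values above U+10FFFF as non-printable,
 * and everything else as printable. */
static bool default_is_printable(uint32_t cp) {
	if (cp < 0x20 || cp == 0x7f) {
		return false;
	}
	if (cp >= 0x80 && cp <= 0x9f) {
		return false;
	}
	return cp <= 0x10ffff;
}

/* Hypothetical sketch of the override order: any code point the user
 * listed is reported as non-printable; otherwise the default rule decides. */
static bool is_printable_user(uint32_t cp, const uint32_t *user_unprintable, size_t count) {
	for (size_t i = 0; i < count; i++) {
		if (user_unprintable[i] == cp) {
			return false;
		}
	}
	return default_is_printable(cp);
}
```

In this sketch, an empty override list (`count == 0`) leaves the default behavior unchanged, so only code points the user explicitly lists are affected.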
Codecov Report: ❌ Patch coverage is
... and 8 files with indirect coverage changes.
The RzSearchOpt and RzBuffer instances were not freed at the end of test cases, which caused an LSan test failure in CI. Fixes rizinorg#5998 Co-Authored-By: Claude <[email protected]>
The RzSearchOpt and RzBuffer instances were not freed at the end of test cases, which caused an LSan test failure in CI. Fixes rizinorg#5998
128502d to 2accda4
Rot127 left a comment:
It needs some regression tests in `test/db/...`. Please add some with multiple characters, UTF-8 characters, only a single character, etc.
```c
if (node->value[0] == '?') {
	rz_cons_printf("Comma-separated list of Unicode code points treated as non-printable.\n");
	rz_cons_printf("Examples:\n");
	rz_cons_printf(" e str.unprintable=0x09,0x0a,0x0d,0x1b\n");
```
Please add an example with a UTF-8 character, e.g. a formatting character.
Also add a test.
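A formatting character such as U+200B ZERO WIDTH SPACE (general category Cf) would fit the requested example, written in the same syntax as the existing help text, e.g. `e str.unprintable=0x09,0x200b`. The sketch below shows one way such a value could be parsed and which inputs a test might cover. It is a hypothetical helper (`parse_unprintable_list()` is not taken from the PR), assuming the option is a comma-separated list of hex values no larger than U+10FFFF.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for a value such as "0x09,0x0a,0x200b".
 * Stores up to `max` code points in `out` and their number in *count.
 * Returns false on empty items, trailing commas, non-hex garbage,
 * values above U+10FFFF, or more than `max` entries. */
static bool parse_unprintable_list(const char *s, uint32_t *out, size_t max, size_t *count) {
	*count = 0;
	while (*s) {
		char *end = NULL;
		errno = 0;
		/* Base 16: strtoul accepts an optional "0x"/"0X" prefix. */
		unsigned long v = strtoul(s, &end, 16);
		if (end == s || errno != 0 || v > 0x10ffff || *count >= max) {
			return false;
		}
		out[(*count)++] = (uint32_t)v;
		if (*end == '\0') {
			break;
		}
		/* Only a comma followed by another item may separate values. */
		if (*end != ',' || end[1] == '\0') {
			return false;
		}
		s = end + 1;
	}
	return true;
}
```

Note that `strtoul` also skips leading whitespace, so `0x09, 0x0a` is accepted by this sketch; a stricter parser would need an explicit check.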
```c
ut64 addr; ///< Address/offset of the string in the RzBuffer
ut32 size; ///< Size of buffer containing the string in bytes
ut32 length; ///< Length of string in chars
ut32 grapheme_length; ///< Length of string in grapheme clusters
```
Don't introduce graphemes. It is too complex. People should only define code points, that's it.
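For comparison, counting code points in a UTF-8 buffer needs no grapheme segmentation at all. The minimal sketch below (`utf8_count_code_points()` is a hypothetical helper, not Rizin API) assumes well-formed UTF-8 and simply skips continuation bytes. It also shows the case the dropped `grapheme_length` field was aimed at: `e` followed by U+0301 COMBINING ACUTE ACCENT renders as one glyph but counts as two code points.

```c
#include <stddef.h>

/* Counts code points in a well-formed UTF-8 buffer by counting every
 * byte that is not a continuation byte (bit pattern 10xxxxxx).
 * Malformed sequences are not validated by this sketch. */
static size_t utf8_count_code_points(const unsigned char *buf, size_t size) {
	size_t n = 0;
	for (size_t i = 0; i < size; i++) {
		if ((buf[i] & 0xC0) != 0x80) {
			n++;
		}
	}
	return n;
}
```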
Also, why did you remove the AI disclaimer of yours?
Your checklist for this pull request
Description
Closes #4930
Extend string search to support:
* User-defined non-printable characters via a new `str.unprintable` config option (comma-separated hex code points). This adds a new `rz_unicode_code_point_is_printable_user()` function that checks user overrides before falling back to the standard printability check.
* Grapheme cluster counting in search hits via a new `grapheme_length` field in `RzDetectedString`. This counts visual characters (base + combining marks) rather than raw code points.

Changes made

* `rz_unicode_code_point_is_printable_user()` in `librz/util/unicode.c`
* `str.unprintable` config option in `librz/core/cconfig.c`
* `user_unprintable`/`user_unprintable_count` to `RzUtilStrScanOptions` and `RzStrStringifyOpt`
* `grapheme_length` to `RzDetectedString`
* `count_graphemes()` helper in `str_search.c`
* `RzUtilStrScanOptions`

Testing
All existing and new unit tests pass: