@@ -18,27 +18,39 @@ may break at any time without the MAJOR version number being incremented.
1818The table below compares the single threaded throughput in bytes/s (real time) between
1919libhat and [two other](test/benchmark/vendor) commonly used implementations for pattern
2020scanning. The input buffers were randomly generated using a fixed seed, and the pattern
21- scanned does not contain any match in the buffer. The benchmark was run on a system with
22- an i7-9700K (which supports libhat's [AVX2](src/arch/x86/AVX2.cpp) scanner implementation).
21+ scanned does not contain any match in the buffer. The benchmark was compiled on Windows
22+ with `clang-cl` 21.1.1, using the MSVC 14.44.35207 toolchain and the default release mode
23+ flags (`/GR /EHsc /MD /O2 /Ob2`). The benchmark was run on a system with an i7-14700K
24+ (supporting [AVX2](src/arch/x86/AVX2.cpp)) and 64GB (4x16GB) DDR5 6000 MT/s (30-38-38-96).
2325The full source code is available [here](test/benchmark/Compare.cpp).
2426```
25- ---------------------------------------------------------------------------------------
26- Benchmark                              Time            CPU Iterations bytes_per_second
27- ---------------------------------------------------------------------------------------
28- BM_Throughput_Libhat/4MiB         131578 ns       48967 ns      21379 29.6876Gi/s
29- BM_Throughput_Libhat/16MiB        813977 ns      413524 ns       3514 19.1959Gi/s
30- BM_Throughput_Libhat/128MiB      6910936 ns     3993486 ns        403 18.0873Gi/s
31- BM_Throughput_Libhat/256MiB     13959379 ns     8121906 ns        202 17.9091Gi/s
32-
33- BM_Throughput_UC1/4MiB           4739731 ns     2776015 ns        591 843.93Mi/s
34- BM_Throughput_UC1/16MiB         19011485 ns    10841837 ns        147 841.597Mi/s
35- BM_Throughput_UC1/128MiB       152277511 ns    82465278 ns         18 840.571Mi/s
36- BM_Throughput_UC1/256MiB       304964544 ns   180555556 ns          9 839.442Mi/s
37-
38- BM_Throughput_UC2/4MiB           9633499 ns     4617698 ns        291 415.218Mi/s
39- BM_Throughput_UC2/16MiB         38507193 ns    22474315 ns         73 415.507Mi/s
40- BM_Throughput_UC2/128MiB       307989100 ns   164930556 ns          9 415.599Mi/s
41- BM_Throughput_UC2/256MiB       616449240 ns   331250000 ns          5 415.282Mi/s
27+ ---------------------------------------------------------------------------------------------------
28+ Benchmark                                          Time            CPU Iterations bytes_per_second
29+ ---------------------------------------------------------------------------------------------------
30+ BM_Throughput_libhat/4MiB                      67686 ns       67816 ns      82254 57.7110Gi/s
31+ BM_Throughput_libhat/16MiB                    319801 ns      319558 ns      18287 48.8585Gi/s
32+ BM_Throughput_libhat/128MiB                  5325733 ns     5282315 ns       1056 23.4709Gi/s
33+ BM_Throughput_libhat/256MiB                 10921878 ns    10814951 ns        510 22.8898Gi/s
34+
35+ BM_Throughput_std_search/4MiB                1364050 ns     1361672 ns       4108 2.86372Gi/s
36+ BM_Throughput_std_search/16MiB               5470025 ns     5458783 ns       1019 2.85648Gi/s
37+ BM_Throughput_std_search/128MiB             43622456 ns    43483527 ns        129 2.86550Gi/s
38+ BM_Throughput_std_search/256MiB             88093320 ns    87158203 ns         64 2.83790Gi/s
39+
40+ BM_Throughput_std_find_std_equal/4MiB         178567 ns      178586 ns      31410 21.8755Gi/s
41+ BM_Throughput_std_find_std_equal/16MiB        806394 ns      805228 ns       7005 19.3764Gi/s
42+ BM_Throughput_std_find_std_equal/128MiB      8944718 ns     8953652 ns        623 13.9747Gi/s
43+ BM_Throughput_std_find_std_equal/256MiB     18092713 ns    18102751 ns        309 13.8177Gi/s
44+
45+ BM_Throughput_UC1/4MiB                       1727027 ns     1721236 ns       3268 2.26183Gi/s
46+ BM_Throughput_UC1/16MiB                      6878188 ns     6849054 ns        819 2.27167Gi/s
47+ BM_Throughput_UC1/128MiB                    55181849 ns    55300245 ns        102 2.26524Gi/s
48+ BM_Throughput_UC1/256MiB                   110209374 ns   110000000 ns         50 2.26841Gi/s
49+
50+ BM_Throughput_UC2/4MiB                       4011942 ns     4001524 ns       1394 997.023Mi/s
51+ BM_Throughput_UC2/16MiB                     16136510 ns    16166908 ns        346 991.540Mi/s
52+ BM_Throughput_UC2/128MiB                   130954740 ns   130087209 ns         43 977.437Mi/s
53+ BM_Throughput_UC2/256MiB                   261157833 ns   261160714 ns         21 980.250Mi/s
4254```
4355
4456## Platforms
@@ -60,16 +72,53 @@ Below is a summary of the support of libhat OS APIs on various platforms:
6072| ` hp::module::for_each_segment ` | ✅ | ✅ | |
6173
6274## Quick start
63- ### Pattern scanning
75+ ### Defining patterns
76+ libhat's signature syntax consists of space-delimited tokens and is backwards compatible with IDA syntax:
77+
78+ - 8-character sequences are interpreted as binary
79+ - 2-character sequences are interpreted as hex
80+ - A 1-character token must be a wildcard (`?`)
81+
82+ Any digit can be replaced with a wildcard, for example:
83+ - ` ????1111 ` is a binary sequence, and matches any byte with all ones in the lower nibble
84+ - ` A? ` is a hex sequence, and matches any byte of the form ` 1010???? `
85+ - Both ` ???????? ` and ` ?? ` are equivalent to ` ? ` , and will match any byte
86+
87+ A complete pattern might look like ` AB ? 12 ?3 ` . This matches any 4-byte
88+ subrange ` s ` for which all the following conditions are met:
89+ - ` s[0] == 0xAB `
90+ - ` s[2] == 0x12 `
91+ - `(s[3] & 0x0F) == 0x03`
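
The conditions above can be read as one value/mask pair per pattern byte. The following is a minimal sketch of that idea, not libhat's implementation: `masked_byte`, `example_pattern` and `matches` are hypothetical names introduced only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type for illustration; not part of libhat's API.
struct masked_byte {
    std::uint8_t value; // expected bits (already masked)
    std::uint8_t mask;  // 1-bits are compared, 0-bits are wildcards
};

// "AB ? 12 ?3" written out as value/mask pairs
constexpr std::array<masked_byte, 4> example_pattern{{
    {0xAB, 0xFF}, // AB: every bit must match
    {0x00, 0x00}, // ?:  any byte matches
    {0x12, 0xFF}, // 12: every bit must match
    {0x03, 0x0F}, // ?3: only the lower nibble must match
}};

// Returns true when every byte of `s` agrees with the pattern on the masked bits.
template <std::size_t N>
constexpr bool matches(const std::array<std::uint8_t, N>& s,
                       const std::array<masked_byte, N>& pattern) {
    for (std::size_t i = 0; i < N; ++i) {
        // Parentheses matter here: `==` binds tighter than `&` in C++
        if ((s[i] & pattern[i].mask) != pattern[i].value) {
            return false;
        }
    }
    return true;
}

// s[1] is unconstrained and only the lower nibble of s[3] is checked
static_assert(matches(std::array<std::uint8_t, 4>{{0xAB, 0x77, 0x12, 0xF3}}, example_pattern),
              "upper nibble of the last byte is a wildcard");
```

A wildcard is simply a zero mask, so a single comparison per byte covers full bytes, partial nibbles and wildcards alike.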
92+
93+ Due to how various scanning algorithms are implemented, there are some restrictions when defining a pattern:
94+
95+ 1) A pattern must contain at least one fully masked byte (e.g. `AB` or `10011001`)
96+ 2) The first byte with a non-zero mask must have a full mask
97+     - `?1 02` is disallowed
98+     - `01 02` is allowed
99+     - `?? 01` is allowed
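
To make the token rules and the two restrictions concrete, here is a rough sketch of a parser that follows the description above. It is an assumption-laden illustration rather than libhat's actual parser: `token_bits`, `parse_token`, `parse_pattern` and `satisfies_restrictions` are hypothetical names, and error reporting is reduced to an empty `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical types and functions for illustration; not libhat's API.
struct token_bits {
    std::uint8_t value;
    std::uint8_t mask; // 0xFF = fully masked, 0x00 = wildcard
};

// Converts one hex digit to its value, or -1 if it is not a hex digit.
constexpr int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses a single token: "?" (wildcard), 2 hex digits, or 8 binary digits.
// Any digit may itself be '?'.
inline std::optional<token_bits> parse_token(std::string_view tok) {
    if (tok.size() == 1) {
        if (tok[0] == '?') return token_bits{0x00, 0x00};
        return std::nullopt;
    }
    if (tok.size() != 2 && tok.size() != 8) return std::nullopt;
    const bool binary = tok.size() == 8;
    const unsigned bits = binary ? 1u : 4u;        // bits contributed per digit
    const unsigned digit_mask = (1u << bits) - 1u; // 0x1 or 0xF
    unsigned value = 0;
    unsigned mask = 0;
    for (char c : tok) {
        value <<= bits;
        mask <<= bits;
        if (c == '?') continue; // wildcard digit: leave its mask bits at zero
        const int d = binary ? ((c == '0' || c == '1') ? c - '0' : -1) : hex_digit(c);
        if (d < 0) return std::nullopt;
        value |= static_cast<unsigned>(d);
        mask |= digit_mask;
    }
    return token_bits{static_cast<std::uint8_t>(value), static_cast<std::uint8_t>(mask)};
}

// Splits on spaces and parses every token; empty optional on any invalid token.
inline std::optional<std::vector<token_bits>> parse_pattern(std::string_view text) {
    std::vector<token_bits> out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        if (text[pos] == ' ') { ++pos; continue; }
        std::size_t end = text.find(' ', pos);
        if (end == std::string_view::npos) end = text.size();
        const auto tok = parse_token(text.substr(pos, end - pos));
        if (!tok) return std::nullopt;
        out.push_back(*tok);
        pos = end;
    }
    return out;
}

// Checks both restrictions: the first byte with a non-zero mask must be fully
// masked, which also guarantees at least one fully masked byte exists.
inline bool satisfies_restrictions(const std::vector<token_bits>& pattern) {
    for (const token_bits& t : pattern) {
        if (t.mask == 0x00) continue; // skip wildcards
        return t.mask == 0xFF;
    }
    return false; // only wildcards (or empty)
}
```

Under these assumptions, `parse_token("A?")` yields value `0xA0` with mask `0xF0`, and `?1 02` is rejected because its first non-wildcard byte carries only a partial mask.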
100+
101+ In code, there are a few ways to initialize a signature from its string representation:
102+
64103``` cpp
65104#include <libhat/scanner.hpp>
66105
67106// Parse a pattern's string representation to an array of bytes at compile time
68107constexpr hat::fixed_signature pattern = hat::compile_signature<"48 8D 05 ? ? ? ? E8">();
69108
70- // ...or parse it at runtime
109+ // Parse using the UDLs at compile time
110+ using namespace hat::literals;
111+ constexpr hat::fixed_signature pattern = "48 8D 05 ? ? ? ? E8"_sig; // stack owned
112+ constexpr hat::signature_view pattern = "48 8D 05 ? ? ? ? E8"_sigv; // static lifetime (requires C++23)
113+
114+ // Parse it at runtime
71115using parsed_t = hat::result<hat::signature, hat::signature_parse_error>;
72116parsed_t runtime_pattern = hat::parse_signature("48 8D 05 ? ? ? ? E8");
117+ ```
118+
119+ ### Scanning patterns
120+ ```cpp
121+ #include <libhat/scanner.hpp>
73122
74123// Scan for this pattern using your CPU's vectorization features
75124auto begin = /* a contiguous iterator over std::byte */;
@@ -97,6 +146,21 @@ const std::byte* address = result.get();
97146const std::byte* relative_address = result.rel(3);
98147```
99148
149+ libhat has a few optimizations for searching for patterns in ` x86_64 ` machine code:
150+ ``` cpp
151+ #include <libhat/scanner.hpp>
152+
153+ // If a byte pattern matches at the start of a function, the result will be aligned to a 16-byte boundary.
154+ // This can be indicated via the defaulted `alignment` parameter (all overloads have this parameter):
155+ std::span<std::byte> range = /* ... */;
156+ hat::signature_view pattern = /* ... */;
157+ hat::scan_result result = hat::find_pattern(range, pattern, hat::scan_alignment::X16);
158+
159+ // Additionally, x86_64 machine code has a non-uniform distribution of byte pairs. By passing the `x86_64`
160+ // scan hint, the search can be based on the least common byte pair that is found in the pattern.
161+ hat::scan_result result = hat::find_pattern(range, pattern, hat::scan_alignment::X1, hat::scan_hint::x86_64);
162+ ```
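
To illustrate why an alignment hint helps, the sketch below only tests candidate positions whose address is a multiple of the requested alignment, so a 16-byte alignment skips 15 of every 16 positions. This is a deliberately naive, scalar illustration under the assumption that alignment refers to the memory address; `find_aligned` is a hypothetical function and does not reflect how libhat's vectorized scanners are written.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical naive scanner for illustration only; not libhat's implementation.
// Returns a pointer to the first match that starts on an `alignment` boundary,
// or nullptr if there is none.
inline const std::uint8_t* find_aligned(const std::uint8_t* data, std::size_t size,
                                        const std::uint8_t* needle, std::size_t needle_size,
                                        std::size_t alignment) {
    if (needle_size == 0 || needle_size > size || alignment == 0) {
        return nullptr;
    }
    // Advance to the first address that is a multiple of `alignment`
    const auto addr = reinterpret_cast<std::uintptr_t>(data);
    std::size_t offset = (alignment - addr % alignment) % alignment;
    // Only positions on the alignment boundary are compared
    for (; offset + needle_size <= size; offset += alignment) {
        if (std::memcmp(data + offset, needle, needle_size) == 0) {
            return data + offset;
        }
    }
    return nullptr;
}
```

With an alignment of 1 this degenerates to checking every position, which is why an unaligned occurrence is found in that mode but skipped when 16-byte alignment is requested.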
163+
100164### Accessing offsets
101165``` cpp
102166#include <libhat/access.hpp>