Optimize `Fw::StringUtils::string_length` with SWAR algorithm by vsoulgard · Pull Request #4789 · nasa/fprime

vsoulgard · 2026-03-02T13:17:26Z


*Related Issue(s)*	#4788
*Has Unit Tests (y/n)*	y
*Documentation Included (y/n)*	n
*Generative AI was used in this contribution (y/n)*	y

Change Description

This PR implements the performance optimization proposed in #4788. It replaces the naive byte-by-byte string length calculation with a SWAR (SIMD Within A Register) approach that uses word-sized loads and bitwise NUL detection.
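The bitwise NUL detection mentioned above is commonly written as a single word-level test. The following is a minimal sketch of that general technique, not the PR's code: `has_zero_byte` and the constant names are illustrative, and a 64-bit word is assumed.

```cpp
#include <cstdint>

// 0x01 and 0x80 repeated in every byte of a 64-bit word.
constexpr std::uint64_t kOnes  = 0x0101010101010101ULL;
constexpr std::uint64_t kHighs = 0x8080808080808080ULL;

// Returns true if at least one of the eight bytes in w is zero.
// Subtracting 1 from each byte borrows into bit 7 when the byte was zero;
// `& ~w` discards bytes whose bit 7 was already set in the input.
constexpr bool has_zero_byte(std::uint64_t w) {
    return ((w - kOnes) & ~w & kHighs) != 0;
}
```

The test answers only whether the word contains a zero byte. Because borrows can propagate into higher bytes, the exact position of the NUL still has to be found separately, for example with a short byte loop over that word.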

Details:

Processes the unaligned head byte-by-byte so that the main loop starts at a word-aligned address.
Uses standard bitwise operations to check all bytes of a word for NUL in parallel.
For short strings (one word or less), uses a simple byte-by-byte loop to avoid the setup overhead.
Adds a NO_ASAN macro that suppresses AddressSanitizer false positives in trusted functions.
Adds new edge-case tests and an alignment assert.
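The steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: `swar_string_length` is a hypothetical name, a 64-bit word is assumed, the function is assumed to return `buffer_size` when no NUL is found within the buffer, and the NO_ASAN macro is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the optimized function (illustrative only).
std::size_t swar_string_length(const char* s, std::size_t buffer_size) {
    constexpr std::size_t W = sizeof(std::uint64_t);
    constexpr std::uint64_t ones  = 0x0101010101010101ULL;
    constexpr std::uint64_t highs = 0x8080808080808080ULL;
    std::size_t i = 0;

    // Buffers of one word or less skip straight to the byte loop at the end.
    if (buffer_size > W) {
        // Head: advance byte-by-byte until s + i is word-aligned.
        while (i < buffer_size &&
               (reinterpret_cast<std::uintptr_t>(s + i) % W) != 0) {
            if (s[i] == '\0') {
                return i;
            }
            ++i;
        }
        // Body: test one whole word per iteration while a full word
        // still fits inside the buffer.
        while (buffer_size - i >= W) {
            std::uint64_t w;
            std::memcpy(&w, s + i, W);  // avoids aliasing issues
            if (((w - ones) & ~w & highs) != 0) {
                break;  // some byte of this word is NUL
            }
            i += W;
        }
    }
    // Tail: locate the NUL (or the end of the buffer) byte-by-byte.
    while (i < buffer_size && s[i] != '\0') {
        ++i;
    }
    return i;
}
```

In this sketch the word loads never extend past `buffer_size`, so the byte loop at the end handles both the final partial word and the word in which a NUL was detected.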

Rationale

SWAR reduces the number of loop iterations and memory load instructions, thus improving performance. This technique is widely used in high-performance standard libraries.

Benchmarks:

(Tested on Linux x86_64 with GCC 14.2.0)

Benchmark Results (aligned)

Times are old vs. new. The new version is slightly slower for very short strings (N ≤ 4) but much faster for long ones:
N=2 - 1.28 vs. 2.01 (ns)
N=4 - 1.76 vs. 2.27 (ns)
N=8 - 3.06 vs. 3.01 (ns)
N=16 - 6.05 vs. 3.43 (ns)
N=32 - 19.30 vs. 4.31 (ns)
N=64 - 27.97 vs. 6.37 (ns)
N=128 - 53.45 vs. 10.36 (ns) (about 5x faster)
N=256 - 100.27 vs. 22.29 (ns)
N=512 - 191.22 vs. 42.47 (ns)
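
The speedup implied by each row is the first time divided by the second. A small sketch of that arithmetic, using three of the aligned rows above (the helper name `speedup` is illustrative, and the first column is taken to be the old version):

```cpp
// Ratio of old to new time; values above 1 mean the new version is faster.
constexpr double speedup(double old_ns, double new_ns) {
    return old_ns / new_ns;
}

// Aligned rows copied from the results above.
constexpr double kSpeedupN8   = speedup(3.06, 3.01);     // about 1.0, break-even
constexpr double kSpeedupN128 = speedup(53.45, 10.36);   // about 5.2
constexpr double kSpeedupN512 = speedup(191.22, 42.47);  // about 4.5
```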

Benchmark Results (unaligned)

Unaligned inputs are slower than aligned ones because of the byte-by-byte head loop, but the new version is still faster than the old one from N=8 upward:
N=2 - 1.33 vs. 1.91 (ns)
N=4 - 2.03 vs. 2.26 (ns)
N=8 - 3.55 vs. 3.02 (ns)
N=16 - 6.65 vs. 5.18 (ns)
N=32 - 17.12 vs. 6.10 (ns)
N=64 - 30.34 vs. 8.44 (ns)
N=128 - 56.18 vs. 12.60 (ns)
N=256 - 103.61 vs. 24.40 (ns)
N=512 - 199.45 vs. 44.11 (ns)

Testing/Review Recommendations

Benchmark Source

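#include <benchmark/benchmark.h>

#include <string>

#include <Fw/Types/StringUtils.hpp>

// Baseline: the existing byte-by-byte implementation.
static void BM_string_length(benchmark::State& state)
{
    const size_t LEN = static_cast<size_t>(state.range(0));
    // LEN non-NUL characters; the terminating NUL sits at index LEN.
    std::string src_str = std::string(LEN, 'A');
    const char* src = src_str.c_str();

    for (auto _ : state)
    {
        // Buffer size LEN excludes the NUL, so the whole range is scanned.
        auto result = Fw::StringUtils::string_length(src, LEN);
        // Keep the compiler from optimizing the call away.
        benchmark::DoNotOptimize(result);
    }
}

// Candidate: string_lengthV2 is the benchmark-only name used here for the
// second implementation being compared.
static void BM_string_lengthV2(benchmark::State& state)
{
    const size_t LEN = static_cast<size_t>(state.range(0));
    std::string src_str = std::string(LEN, 'A');
    const char* src = src_str.c_str();

    for (auto _ : state)
    {
        auto result = Fw::StringUtils::string_lengthV2(src, LEN);
        benchmark::DoNotOptimize(result);
    }
}

// Run both benchmarks over lengths 0, 1, 2, 4, ..., 1024.
BENCHMARK(BM_string_length)->RangeMultiplier(2)->Range(0, 1024);
BENCHMARK(BM_string_lengthV2)->RangeMultiplier(2)->Range(0, 1024);

BENCHMARK_MAIN();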

AI Usage

AI was used for research and testing of ideas. All code implementation, algorithm design, benchmarking, and technical decisions were done by me.

vsoulgard added 2 commits March 2, 2026 15:04

Optimize Fw::StringUtils::string_length

0070054

Add unit tests for new cases

fb6d2f3

vsoulgard wants to merge 2 commits into nasa:devel from vsoulgard:fw-types-string-length-swar
