Masked lanes SIMD operations #66

@yurydelendik

(Per the conversation at the 2025-07-19 SIMD meeting, filing it here.)

Currently the LLVM toolchain generates shuffle operations that cannot be translated efficiently to native code. Each shuffle does the right thing, but it is really hard to produce an efficient sequence of native instructions for it. For example (1):

    i8x16.shuffle 0 0 0 0 1 0 0 0 2 0 0 0 3 0 0 0
    v128.const i32x4 0x00000001 0x00000001 0x00000001 0x00000001
    v128.and

The shuffle operation will be encoded as multiple instructions, including loads of additional constants. But it can be encoded more efficiently if it is known that some lanes will be discarded in the final value. E.g. if any lane may be placed in the 'x' positions of 0 x x x 1 x x x 2 x x x 3 x x x, a couple of zero-extend instructions can be used instead. I guess the code was generated by an auto-vectorizer, which chose 0 as an arbitrary lane.
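
To make the claim for example (1) concrete, here is a small Python sketch that models the lanes as lists of 16 bytes (little-endian). The helper names are made up for illustration and only mimic the Wasm instructions; this is not engine or toolchain code. It checks that the shuffle followed by `v128.and` yields the same bytes as two zero extends followed by the same `v128.and`, because the mask clears every 'x' lane anyway.

```python
# Illustrative model only: a v128 value is a list of 16 bytes, lane 0 first.

def i8x16_shuffle(a, b, idx):
    # Mimics i8x16.shuffle: indices 0-15 select from a, 16-31 from b.
    src = a + b
    return [src[i] for i in idx]

def extend_low_i8x16_u(v):
    # Mimics i16x8.extend_low_i8x16_u: low 8 bytes become 8 zero-extended u16.
    out = []
    for x in v[:8]:
        out += [x, 0]
    return out

def extend_low_i16x8_u(v):
    # Mimics i32x4.extend_low_i16x8_u: low 4 u16 become 4 zero-extended u32.
    out = []
    for i in range(0, 8, 2):
        out += [v[i], v[i + 1], 0, 0]
    return out

def v128_and(a, b):
    return [x & y for x, y in zip(a, b)]

# v128.const i32x4 0x00000001 0x00000001 0x00000001 0x00000001
MASK = [1, 0, 0, 0] * 4

v = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88,
     0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF, 0x01]

# Sequence as emitted today: shuffle with lane 0 in the unused positions.
as_emitted = v128_and(
    i8x16_shuffle(v, v, [0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]),
    MASK)

# Sequence an engine could pick if the 'x' lanes were known to be don't-care.
with_extends = v128_and(extend_low_i16x8_u(extend_low_i8x16_u(v)), MASK)

assert as_emitted == with_extends
```

The two sequences agree only because of the trailing mask; without the knowledge that the 'x' lanes are dead, an engine has to honor the lane-0 copies.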

More interesting operations:

    i8x16.shuffle 0 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    v128.store16_lane align=1 0
    i8x16.shuffle 12 13 14 15 0 0 0 0 0 0 0 0 0 0 0 0
    i16x8.extend_low_i8x16_u
    i32x4.extend_low_i16x8_u

or

    i8x16.shuffle 8 9 10 11 12 13 14 15 0 1 0 1 0 1 0 1
    i32x4.extend_low_i16x8_u
    local.get 3
    local.get 3
    local.get 3
    i8x16.shuffle 8 9 10 11 12 13 14 15 0 0 0 0 0 0 0 0
    i8x16.min_u
    local.tee 3
    local.get 3
    local.get 3
    i8x16.shuffle 4 5 6 7 0 0 0 0 0 0 0 0 0 0 0 0
    i8x16.min_u
    local.tee 3
    local.get 3
    local.get 3
    i8x16.shuffle 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    i8x16.min_u
    local.tee 3
    local.get 3
    local.get 3
    i8x16.shuffle 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    i8x16.min_u
    i8x16.extract_lane_u 0

The common thread: if it is known that some zero-lane references are not important, it is easier to select more performant instructions when compiling i8x16.shuffle. If this burden falls on the toolchain/auto-vectorizer, the selected shuffle may "prefer" one CPU.
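
As a rough illustration for the `i8x16.min_u` reduction above, the Python sketch below replays the four shuffle/min steps with every possible filler lane in the positions that currently reference lane 0. The helpers and the `None` marker for a don't-care lane are hypothetical and only model the Wasm semantics. The extracted lane 0 is the same for every filler, which suggests those positions are free for an engine to choose.

```python
# Illustrative model only: a v128 value is a list of 16 unsigned bytes.

def i8x16_shuffle(a, b, idx):
    # Mimics i8x16.shuffle: indices 0-15 select from a, 16-31 from b.
    src = a + b
    return [src[i] for i in idx]

def i8x16_min_u(a, b):
    return [min(x, y) for x, y in zip(a, b)]

# Shuffle patterns from the reduction; None marks a lane that never
# contributes to the final extract_lane_u 0 (hypothetical notation).
STEPS = [
    [8, 9, 10, 11, 12, 13, 14, 15] + [None] * 8,
    [4, 5, 6, 7] + [None] * 12,
    [2, 3] + [None] * 14,
    [1] + [None] * 15,
]

def horizontal_min(v, filler):
    # Replace each don't-care position with the chosen filler lane index.
    for pattern in STEPS:
        idx = [filler if i is None else i for i in pattern]
        v = i8x16_min_u(v, i8x16_shuffle(v, v, idx))
    return v[0]  # i8x16.extract_lane_u 0

data = [37, 200, 15, 99, 4, 250, 61, 8, 73, 3, 180, 42, 5, 90, 11, 27]

# The toolchain's choice is filler = 0; try every other lane as well.
results = {horizontal_min(data, filler) for filler in range(16)}
assert results == {min(data)}
```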

The snippets are taken from https://cdn.jsdelivr.net/npm/[email protected]/dist/ort-wasm-simd.jsep.wasm
