Proposed in #28 (originally #27). It differs from the existing splat in that it broadcasts a lane of the input vector rather than a scalar, and it takes an index that selects which lane to broadcast:
Gets a single lane from the input vector and broadcasts it to every lane of the result.
idx is interpreted modulo the number of lanes in the vector.
vec.v8.splat_lane(v: vec.v8, idx: i32) -> vec.v8
vec.v16.splat_lane(v: vec.v16, idx: i32) -> vec.v16
vec.v32.splat_lane(v: vec.v32, idx: i32) -> vec.v32
vec.v64.splat_lane(v: vec.v64, idx: i32) -> vec.v64
vec.v128.splat_lane(v: vec.v128, idx: i32) -> vec.v128
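The semantics above can be modeled with a short scalar sketch. This is not taken from the proposal: the function name `splat_lane`, the use of a Python list to stand in for a vector, and the choice to read the `i32` index as unsigned before taking the modulo are all assumptions made for illustration.

```python
def splat_lane(v, idx):
    """Scalar model of splat_lane: broadcast lane `idx` of `v` to every lane."""
    lanes = len(v)  # lane count; stands in for the runtime vector length
    if lanes == 0:
        raise ValueError("vector must have at least one lane")
    # Assumption: idx is an i32 whose bits are read as unsigned before the modulo.
    selected = (idx & 0xFFFFFFFF) % lanes
    return [v[selected]] * lanes


if __name__ == "__main__":
    v = [10, 20, 30, 40]
    print(splat_lane(v, 2))  # [30, 30, 30, 30]
    print(splat_lane(v, 6))  # 6 mod 4 == 2, so also [30, 30, 30, 30]
```

Because the index wraps, any idx value is valid and no out-of-range case needs separate handling.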
On x86, broadcast instructions first appear in AVX (32-bit floating-point elements; AVX2 adds integers). However, the x86 variants don't take an index and only broadcast the first element of the source. On SSE, a general-purpose shuffle would be needed to emulate this, which is not great (definitely slower than the specialized version). Also, taking an index would turn this into a general-purpose shuffle on AVX+ as well.
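To illustrate why a runtime index pushes the operation toward a general-purpose shuffle, here is a rough scalar sketch. The helpers `shuffle`, `splat_lane_via_shuffle`, and `broadcast_first` are hypothetical names that model the idea only; they do not correspond to specific x86 instructions or to any code in the proposal, and the unsigned reading of idx is an assumption.

```python
def shuffle(v, indices):
    """Hypothetical general-purpose shuffle: result lane i takes v[indices[i]]."""
    return [v[i] for i in indices]


def splat_lane_via_shuffle(v, idx):
    """Emulate an indexed broadcast with the general shuffle."""
    lanes = len(v)
    # Assumption: idx is read as an unsigned 32-bit value before the modulo.
    selected = (idx & 0xFFFFFFFF) % lanes
    # A runtime index needs a full index vector with every entry set to the same lane.
    return shuffle(v, [selected] * lanes)


def broadcast_first(v):
    """Models an index-free broadcast: always lane 0 of the source."""
    return [v[0]] * len(v)


if __name__ == "__main__":
    v = [1, 2, 3, 4]
    # Only idx == 0 matches what an index-free broadcast can produce directly.
    print(splat_lane_via_shuffle(v, 0) == broadcast_first(v))  # True
    print(splat_lane_via_shuffle(v, 3))  # [4, 4, 4, 4]
```

In this model the index-free broadcast covers only the idx == 0 case; every other index goes through the general shuffle path.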