SIMD types
cpc ships fixed-width SIMD as primitive types. The widths cover the 128-bit and 256-bit families that map directly to NEON, SSE, AVX2, and AVX:
- 128-bit floats: f32x4, f64x2. 256-bit floats: f32x8, f64x4.
- 128-bit ints (signed and unsigned): i8x16, i16x8, i32x4, i64x2, and the u siblings. 256-bit ints: i8x32 ... i64x4 and u siblings.
- 64-bit (sub-128) widths: i8x8, f32x2, and the rest of the NEON D-register family, mainly produced by .low() / .high() and consumed by .widen() / .combine().
- Mask types: mask8x16, mask32x4, and so on, distinct from integer SIMD (see below).
512-bit widths are deferred until those targets are tier-1.
Constructors
let v: f32x4 = f32x4::splat(1.0f32); // broadcast
let w: f32x4 = f32x4::new(1.0f32, 2.0f32, 3.0f32, 4.0f32); // per-lane
let v2: f32x4 = unsafe { f32x4::load(p as *f32) }; // unsafe, lane-aligned
unsafe { v.store(p as *f32); }
let arr: [f32; 4] = v.to_array(); // FFI escape
let v3: f32x4 = f32x4::from_array(arr);
Methods, by element type
- Arithmetic (all numeric widths): .add(b), .sub(b), .mul(b), .div(b).
- Float-only: .fma(b, c), .sqrt(), .abs().
- Signed-int-only: .abs() (rejected on unsigned with E0324).
- All numeric: .min(b), .max(b).
- Integer-only: .and(b), .or(b), .xor(b), .not(), .shl(count), .shr(count) (count is a literal u32).
Lane-type conversion and reinterpret
let i: i32x4 = i32x4::new(1, 2, 3, 4);
let f: f32x4 = f32x4::from_int(i); // int -> float, lane-wise
let j: i32x4 = i32x4::from_float(f); // float -> int, truncates toward zero
let bytes: u8x16 = u8x16::splat(255u8);
let signed: i8x16 = i8x16::reinterpret(bytes); // same bits, different lane type
from_int / from_float require the same lane count and width (i32x4 to/from f32x4). reinterpret is a bit-preserving cast requiring the same total width; lane count and type may differ. A mismatch is E0324.
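The rule that reinterpret only needs matching total width can be sketched as follows. This is a minimal illustration that uses only the constructors shown above; the commented-out line is an assumed example of a rejected conversion, not compiler output.

```
let bytes: u8x16 = u8x16::splat(255u8);
// 16 x 8 bits and 4 x 32 bits are both 128 bits, so the cast is allowed.
// Every bit is set, so each i32 lane reads as -1.
let words: i32x4 = i32x4::reinterpret(bytes);

// Assumed rejection: from_float needs matching lane count and width,
// and f32x8 has eight lanes where i32x4 has four (E0324).
// let bad: i32x4 = i32x4::from_float(f32x8::splat(1.0f32));
```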
Widening pipelines
These instance methods move between a full vector and its 64-bit halves, and between adjacent integer lane widths. They are the building blocks of integer widening (NEON vget_low / vcombine / vmovl / vmovn):
let v: i8x16 = i8x16::splat(3i8);
let lo: i8x8 = v.low(); // bottom 8 lanes
let hi: i8x8 = v.high(); // top 8 lanes
let back: i8x16 = lo.combine(hi); // join two halves
let wide: i16x8 = lo.widen(); // each lane to the next int size up
let narrow: i8x8 = wide.narrow(); // each lane to the next int size down
.widen() sign-extends signed lanes and zero-extends unsigned ones; .narrow() truncates. Float or 64-bit lanes have nothing wider/narrower (E0324). Together they make a widening integer dot product expressible without a dedicated builtin.
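As a sketch of that widening dot product, the function below combines .low() / .high(), .widen(), .mul(), .add(), and .sum(). The function name dot_i8x16 is hypothetical. The sketch also assumes that i16x4 exists as part of the 64-bit family, that .low() / .high() on an i16x8 return it, and that a plain fn uses the same syntax as the extern fn shown in the FFI section.

```
// Hypothetical helper: dot product of two i8x16 vectors, accumulated in i32.
fn dot_i8x16(a: i8x16, b: i8x16) -> i32 {
    // Widen each 8-lane half to i16 first: an i8 * i8 product is at most
    // 16384 in magnitude, which fits in an i16 lane.
    let p_lo: i16x8 = a.low().widen().mul(b.low().widen());
    let p_hi: i16x8 = a.high().widen().mul(b.high().widen());

    // Widen the products to i32 before accumulating, since the sum of
    // sixteen products can exceed the i16 range.
    let s0: i32x4 = p_lo.low().widen();
    let s1: i32x4 = p_lo.high().widen();
    let s2: i32x4 = p_hi.low().widen();
    let s3: i32x4 = p_hi.high().widen();

    // Lane-wise adds, then one horizontal sum that returns the i32 lane type.
    return s0.add(s1).add(s2.add(s3)).sum();
}
```

Doing both widening steps before any accumulation keeps every intermediate value within its lane type, so the result does not depend on wrapping behaviour.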
Lane access, shuffles, reductions
let v: f32x4 = f32x4::new(1.0f32, 2.0f32, 3.0f32, 4.0f32);
let x: f32 = v.lane(0 as u32); // 1.0; literal index only
let v2: f32x4 = v.with_lane(3 as u32, 9.0f32); // (1, 2, 3, 9)
let r: f32x4 = v.reverse(); // (4, 3, 2, 1)
let s: f32 = v.sum(); // 10.0
let p: f32x4 = v.swizzle([3 as u32, 2 as u32, 1 as u32, 0 as u32]); // literal indices
The lane index must be a literal u32 in 0..N (E0873 if not literal, E0874 if out of range). A horizontal sum() / product() returns the lane type, so on narrow integer lanes it can wrap; the compiler emits the non-fatal W0001 warning at that site. Fix it by widening first, or use simd/integer::dot_i32. For a runtime index vector, use .table(idx) on a 16-byte vector (NEON vqtbl1q).
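The "widen first" fix for a wrapping reduction might look like the sketch below. It assumes the u8x8 and u16x8 siblings follow the same .low() / .high() / .widen() pattern as the signed types; whether W0001 is still reported for a sum over u16 lanes is not stated here, so treat the last line as illustrative.

```
let v: u8x16 = u8x16::splat(200u8);
// v.sum() would return u8, and 16 * 200 = 3200 does not fit in a u8.

let lo: u16x8 = v.low().widen();    // zero-extends the bottom 8 lanes
let hi: u16x8 = v.high().widen();   // zero-extends the top 8 lanes
let total: u16 = lo.add(hi).sum();  // 8 lanes of 400 each: 3200, fits in u16
```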
Masks and select
Compare-and-blend is the branchless idiom:
let mask: mask32x4 = a.lt(b); // comparison yields a mask
let result: f32x4 = mask.select(a, b); // pick from a where true, else b
if mask.any() { /* at least one lane true */ }
if mask.all() { /* every lane true */ }
Mask types are distinct from integer SIMD. A comparison returns mask{N}x{M}, not i{N}x{M}. .select / .any / .all require a mask receiver (E0324 otherwise), arithmetic on masks is rejected, and there is no implicit mask-to-SIMD coercion (E0302). Masks are produced by comparisons, never lane-by-lane. Cross between the two explicitly with .to_bits() and .to_mask(), which are no-ops at the LLVM level.
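A small sketch of the compare-and-blend idiom applied to a ReLU-style clamp, using only .lt and .select as described above. The function name relu is hypothetical, and a plain fn is assumed to use the same syntax as the extern fn in the FFI section.

```
// Hypothetical helper: replace negative lanes with zero, without branching.
fn relu(v: f32x4) -> f32x4 {
    let zero: f32x4 = f32x4::splat(0.0f32);
    let neg: mask32x4 = v.lt(zero);  // true in lanes where v < 0
    return neg.select(zero, v);      // zero where true, else the original lane
}
```

The mask never leaves its mask32x4 type here, so no .to_bits() / .to_mask() crossing is needed.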
The FFI boundary
SIMD types have no portable C-ABI representation, so they do not cross an extern fn boundary by default (E0410). Use the array round-trip:
// ✅ FFI-safe shape
pub extern fn process(v: [f32; 4]) -> [f32; 4] {
    let s: f32x4 = f32x4::from_array(v);
    return s.mul(f32x4::splat(2.0f32)).to_array();
}
For higher-level 3D math and prebuilt integer lane kernels, see the simd package.