C+
Systems · v0.0.13

SIMD types

cpc ships fixed-width SIMD as primitive types. The widths cover the 128-bit and 256-bit families: 128-bit types map directly to NEON and SSE, 256-bit float types to AVX, and 256-bit integer types to AVX2:

  • 128-bit floats: f32x4, f64x2. 256-bit floats: f32x8, f64x4.
  • 128-bit ints (signed and unsigned): i8x16, i16x8, i32x4, i64x2, and the u siblings. 256-bit ints: i8x32 through i64x4, and the u siblings.
  • 64-bit (sub-128) widths: i8x8, f32x2, and the rest of the NEON D-register family, mainly produced by .low() / .high() and consumed by .widen() / .combine().
  • Mask types: mask8x16, mask32x4, and so on, distinct from integer SIMD (see below).

512-bit widths (AVX-512) are deferred until AVX-512 targets are tier-1.

Constructors

let v: f32x4 = f32x4::splat(1.0f32);                       // broadcast
let w: f32x4 = f32x4::new(1.0f32, 2.0f32, 3.0f32, 4.0f32); // per-lane

let v2: f32x4 = unsafe { f32x4::load(p as *f32) };         // p: 4 readable f32s, lane-aligned
unsafe { v.store(p as *f32); }

let arr: [f32; 4] = v.to_array();                          // FFI escape
let v3: f32x4     = f32x4::from_array(arr);

Methods, by element type

  • Arithmetic (all numeric widths): .add(b), .sub(b), .mul(b), .div(b).
  • Float-only: .fma(b, c), .sqrt(), .abs().
  • Signed-int-only: .abs() (rejected on unsigned with E0324).
  • All numeric: .min(b), .max(b).
  • Integer-only: .and(b), .or(b), .xor(b), .not(), .shl(count), .shr(count) (count is a literal u32).

Lane-type conversion and reinterpret

let i: i32x4 = i32x4::new(1, 2, 3, 4);
let f: f32x4 = f32x4::from_int(i);          // int -> float, lane-wise
let j: i32x4 = i32x4::from_float(f);        // float -> int, truncates toward zero

let bytes: u8x16  = u8x16::splat(255u8);
let signed: i8x16 = i8x16::reinterpret(bytes);   // same bits, different lane type

from_int / from_float require the same lane count and lane width (i32x4 to/from f32x4). reinterpret is a bit-preserving cast that requires only the same total width; lane count and lane type may differ (u8x16 to i32x4, for example). A mismatch in either case is E0324.
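As a cross-check on those rules, here is a scalar reference model in Rust (not cpc code). The function names are hypothetical, and it models only what this page states: float to int truncates toward zero, and reinterpret keeps the bits while changing the lane type. Rust's `as` additionally saturates out-of-range floats and maps NaN to 0; this page does not say what cpc does for those inputs.

```rust
// Scalar model (Rust, not cpc) of lane-wise conversion and reinterpret.
// Hypothetical names; only the documented semantics are modeled.

/// Models i32x4::from_float: every lane truncates toward zero.
/// Rust's `as` also saturates out-of-range values and maps NaN to 0;
/// that part is Rust behavior, not a statement about cpc.
fn from_float_model(f: [f32; 4]) -> [i32; 4] {
    f.map(|x| x as i32)
}

/// Models f32x4::from_int: every lane converts to f32
/// (Rust rounds to nearest; exact for |x| <= 2^24).
fn from_int_model(i: [i32; 4]) -> [f32; 4] {
    i.map(|x| x as f32)
}

/// Models i8x16::reinterpret(u8x16): same bits, new lane type.
/// 255u8 (0xFF) reads back as -1i8.
fn reinterpret_model(bytes: [u8; 16]) -> [i8; 16] {
    bytes.map(|b| b as i8)
}
```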

Widening pipelines

These instance methods move between a full vector and its 64-bit halves, and between adjacent integer lane widths. They are the building blocks of integer widening and correspond to NEON vget_low / vget_high / vcombine / vmovl / vmovn:

let v: i8x16 = i8x16::splat(3i8);
let lo: i8x8 = v.low();              // bottom 8 lanes
let hi: i8x8 = v.high();             // top 8 lanes
let back: i8x16 = lo.combine(hi);    // join two halves

let wide:   i16x8 = lo.widen();      // each lane to the next int size up
let narrow: i8x8  = wide.narrow();   // each lane to the next int size down

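.widen() sign-extends signed lanes and zero-extends unsigned ones; .narrow() truncates, keeping the low half of each lane's bits. Float lanes have no wider or narrower integer counterpart, and 64-bit lanes have nothing wider, so those calls are rejected (E0324). Together, these methods make a widening integer dot product expressible without a dedicated builtin: split into halves, widen, multiply, then widen again before summing.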
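The widening dot product can be sketched as a scalar model in Rust (not cpc code; every name is hypothetical). Each function mirrors one method above on plain arrays, assuming the semantics stated here: sign extension on .widen() and truncation on .narrow().

```rust
// Scalar model (Rust, not cpc) of .low()/.high(), .widen(), .narrow(),
// and a widening i8 dot product. Function names are hypothetical.

/// Models i8x16::low(): bottom 8 lanes.
fn low(v: [i8; 16]) -> [i8; 8] {
    std::array::from_fn(|k| v[k])
}

/// Models i8x16::high(): top 8 lanes.
fn high(v: [i8; 16]) -> [i8; 8] {
    std::array::from_fn(|k| v[k + 8])
}

/// Models i8x8::widen() -> i16x8: sign-extends each lane.
fn widen(v: [i8; 8]) -> [i16; 8] {
    v.map(i16::from)
}

/// Models i16x8::narrow() -> i8x8: truncates, keeping each lane's low 8 bits.
fn narrow(v: [i16; 8]) -> [i8; 8] {
    v.map(|x| x as i8)
}

/// Dot product of two 16-lane i8 vectors with no overflow:
/// split into halves, widen to i16, multiply, widen products to i32, sum.
fn dot_i8x16(a: [i8; 16], b: [i8; 16]) -> i32 {
    let halves = [(low(a), low(b)), (high(a), high(b))];
    let mut acc: i32 = 0;
    for &(ha, hb) in halves.iter() {
        let (wa, wb) = (widen(ha), widen(hb));
        for k in 0..8 {
            // |i8 * i8| <= 128 * 128 = 16384, which fits in i16.
            let prod: i16 = wa[k] * wb[k];
            // 16 such products sum to at most 262144, so i32 is enough.
            acc += i32::from(prod);
        }
    }
    acc
}
```

The two widening steps are what keep this correct: an i8 × i8 product can reach 16384, which fits in i16 but not i8, and sixteen of those products can exceed i16, so they are widened to i32 before the sum.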

Lane access, shuffles, reductions

let v: f32x4 = f32x4::new(1.0f32, 2.0f32, 3.0f32, 4.0f32);
let x: f32 = v.lane(0 as u32);                       // 1.0; literal index only
let v2: f32x4 = v.with_lane(3 as u32, 9.0f32);       // (1, 2, 3, 9)

let r: f32x4 = v.reverse();                          // (4, 3, 2, 1)
let s: f32   = v.sum();                              // 10.0
let p: f32x4 = v.swizzle([3 as u32, 2 as u32, 1 as u32, 0 as u32]);  // literal indices

The lane index must be a literal u32 in 0..N, where N is the lane count (E0873 if not a literal, E0874 if out of range). Swizzle indices must likewise be literals.

A horizontal sum() / product() returns the lane type, so on narrow integer lanes it can wrap; the compiler emits the non-fatal warning W0001 at that site. Fix it by widening first (.low() / .high(), then .widen()), or use simd/integer::dot_i32.

For a runtime index vector, use .table(idx) on a 16-byte vector (NEON vqtbl1q).
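Both the wrap and its fix show up in a scalar model in Rust (not cpc code; names are hypothetical). It assumes the lane-typed sum() on u8x16 wraps modulo 2^8, and that .table(idx) keeps NEON vqtbl1q's rule of returning 0 for indices of 16 or more; neither detail is spelled out on this page.

```rust
// Scalar model (Rust, not cpc) of a horizontal sum on u8x16 lanes,
// plus a NEON-style table lookup. Names are hypothetical.

/// Models u8x16::sum(): the result has the lane type, assumed to wrap
/// mod 256. This is the overflow W0001 warns about.
fn sum_lane_typed(v: [u8; 16]) -> u8 {
    v.iter().fold(0u8, |acc, &x| acc.wrapping_add(x))
}

/// Widen first, then sum: 16 lanes * 255 = 4080, which fits in u16.
fn sum_widened(v: [u8; 16]) -> u16 {
    v.iter().map(|&x| u16::from(x)).sum()
}

/// Models NEON vqtbl1q_u8: lane i = table[idx[i]], or 0 if idx[i] >= 16.
/// Assumption: .table(idx) keeps this out-of-range behavior.
fn table_model(table: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    idx.map(|i| table.get(i as usize).copied().unwrap_or(0))
}
```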

Masks and select

Compare-and-blend is the branchless idiom:

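let a: f32x4 = f32x4::new(1.0f32, 5.0f32, 3.0f32, 7.0f32);
let b: f32x4 = f32x4::splat(4.0f32);

let mask: mask32x4 = a.lt(b);                        // comparison yields a mask: (true, false, true, false)
let result: f32x4  = mask.select(a, b);              // a where true, else b: (1, 4, 3, 4)

if mask.any() { /* at least one lane true: taken here */ }
if mask.all() { /* every lane true: not taken here */ }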

Mask types are distinct from integer SIMD: a comparison returns mask{N}x{M}, not i{N}x{M}. .select / .any / .all require a mask receiver (E0324 otherwise), arithmetic on masks is rejected, and there is no implicit mask-to-SIMD coercion (E0302). Masks are never built lane by lane; they come from comparisons or from an explicit conversion. Cross between the two explicitly with .to_bits() (mask to integer SIMD) and .to_mask() (integer SIMD to mask), which are no-ops at the LLVM level.
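The compare-and-select pattern can be modeled lane by lane in Rust (not cpc code; names are hypothetical). The sketch assumes a mask behaves like one bool per lane and ignores its machine-level bit layout.

```rust
// Scalar model (Rust, not cpc) of compare-and-select on four f32 lanes.
// Assumption: a mask is modeled as [bool; 4]; the real mask32x4 bit layout
// is not represented. Names are hypothetical.

/// Models a.lt(b): lane-wise comparison producing a mask.
fn lt(a: [f32; 4], b: [f32; 4]) -> [bool; 4] {
    std::array::from_fn(|i| a[i] < b[i])
}

/// Models mask.select(a, b): a where the mask is true, else b.
fn select(mask: [bool; 4], a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    std::array::from_fn(|i| if mask[i] { a[i] } else { b[i] })
}

/// Models mask.any(): at least one lane true.
fn any(mask: [bool; 4]) -> bool {
    mask.iter().any(|&m| m)
}

/// Models mask.all(): every lane true.
fn all(mask: [bool; 4]) -> bool {
    mask.iter().all(|&m| m)
}
```

In this model, select(lt(a, b), a, b) is a branchless lane-wise minimum; when a lane holds NaN, the comparison is false and the lane from b is chosen.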

The FFI boundary

SIMD types have no portable C-ABI representation, so they do not cross an extern fn boundary by default (E0410). Use the array round-trip:

// ✅ FFI-safe shape
pub extern fn process(v: [f32; 4]) -> [f32; 4] {
    let s: f32x4 = f32x4::from_array(v);
    return s.mul(f32x4::splat(2.0f32)).to_array();
}

For higher-level 3D math and prebuilt integer lane kernels, see the simd package.