# SIMD types

`cpc` ships fixed-width SIMD vectors as primitive types. The widths cover the 128-bit family, which maps directly to NEON and SSE, and the 256-bit family, which maps to AVX and AVX2:

- **128-bit floats**: `f32x4`, `f64x2`. **256-bit floats**: `f32x8`, `f64x4`.
- **128-bit ints** (signed and unsigned): `i8x16`, `i16x8`, `i32x4`, `i64x2`, and the `u` siblings. **256-bit ints**: `i8x32` ... `i64x4` and `u` siblings.
- **64-bit (sub-128) widths**: `i8x8`, `f32x2`, and the rest of the NEON D-register family, mainly produced by `.low()` / `.high()` and consumed by `.widen()` / `.combine()`.
- **Mask types**: `mask8x16`, `mask32x4`, and so on, distinct from integer SIMD (see below).

512-bit widths are deferred until targets that support them are tier-1.

## Constructors

```cplus
let v: f32x4 = f32x4::splat(1.0f32);                       // broadcast
let w: f32x4 = f32x4::new(1.0f32, 2.0f32, 3.0f32, 4.0f32); // per-lane

// `p` is assumed to be a valid pointer to at least four `f32` values.
let v2: f32x4 = unsafe { f32x4::load(p as *f32) };         // unsafe, lane-aligned
unsafe { v.store(p as *f32); }

let arr: [f32; 4] = v.to_array();                          // FFI escape
let v3: f32x4     = f32x4::from_array(arr);
```

## Methods, by element type

- **Arithmetic** (all numeric types): `.add(b)`, `.sub(b)`, `.mul(b)`, `.div(b)`.
- **Float-only**: `.fma(b, c)`, `.sqrt()`.
- **Float and signed-int**: `.abs()` (rejected on unsigned with **E0324**).
- **All numeric types**: `.min(b)`, `.max(b)`.
- **Integer-only**: `.and(b)`, `.or(b)`, `.xor(b)`, `.not()`, `.shl(count)`, `.shr(count)` (`count` is a literal `u32`).

## Lane-type conversion and reinterpret

```cplus
let i: i32x4 = i32x4::new(1, 2, 3, 4);
let f: f32x4 = f32x4::from_int(i);          // int -> float, lane-wise
let j: i32x4 = i32x4::from_float(f);        // float -> int, truncates toward zero

let bytes: u8x16  = u8x16::splat(255u8);
let signed: i8x16 = i8x16::reinterpret(bytes);   // same bits, different lane type
```

`from_int` / `from_float` require the same lane count and width (`i32x4` to/from `f32x4`). `reinterpret` is a bit-preserving cast requiring the same total width; lane count and type may differ. A mismatch is **E0324**.
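The conversion rules above can be modelled lane by lane in scalar code. The following is a minimal sketch in Rust, not cplus: plain arrays stand in for the SIMD types, and every function name is hypothetical. It assumes little-endian lane order for the lane-count-changing reinterpret. Note that Rust's `as` saturates out-of-range floats and maps NaN to 0; this page does not say what `from_float` does in those cases.

```rust
// Scalar model of lane-wise conversion and reinterpret.
// Arrays stand in for SIMD vectors; all names here are hypothetical.

/// Models `f32x4::from_int`: each i32 lane is converted to f32.
fn from_int(v: [i32; 4]) -> [f32; 4] {
    v.map(|x| x as f32)
}

/// Models `i32x4::from_float`: each lane is truncated toward zero.
fn from_float(v: [f32; 4]) -> [i32; 4] {
    v.map(|x| x as i32)
}

/// Models `i8x16::reinterpret(u8x16)`: same bits, different lane type.
fn reinterpret_u8_as_i8(v: [u8; 16]) -> [i8; 16] {
    v.map(|x| x as i8)
}

/// Models a reinterpret that changes the lane count but not the total
/// width: 16 one-byte lanes viewed as 4 four-byte lanes.
/// Little-endian lane order is an assumption of this model.
fn reinterpret_u8x16_as_u32x4(v: [u8; 16]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for (i, chunk) in v.chunks_exact(4).enumerate() {
        out[i] = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    out
}
```

In this model, `from_float` maps `1.9` to `1` and `-1.9` to `-1`, while reinterpreting a `u8` lane holding `255` yields the `i8` value `-1` because no bits change.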

## Widening pipelines

These instance methods move between a full vector and its 64-bit halves, and between adjacent integer lane widths. They are the building blocks of integer widening (NEON `vget_low` / `vcombine` / `vmovl` / `vmovn`):

```cplus
let v: i8x16 = i8x16::splat(3i8);
let lo: i8x8 = v.low();              // bottom 8 lanes
let hi: i8x8 = v.high();             // top 8 lanes
let back: i8x16 = lo.combine(hi);    // join two halves

let wide:   i16x8 = lo.widen();      // each lane to the next int size up
let narrow: i8x8  = wide.narrow();   // each lane to the next int size down
```

`.widen()` sign-extends signed lanes and zero-extends unsigned ones; `.narrow()` truncates. Float or 64-bit lanes have nothing wider/narrower (**E0324**). Together they make a widening integer dot product expressible without a dedicated builtin.
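To make the extension and truncation rules concrete, here is a scalar sketch in Rust, not cplus. Arrays stand in for the SIMD types and all function names are hypothetical. The dot product at the end follows the low/high, widen, multiply pattern described above; accumulating the products in an `i32` is a choice made for this sketch, not something the page specifies.

```rust
// Scalar model of `.widen()` / `.narrow()` and a widening dot product.
// Arrays stand in for SIMD vectors; all names here are hypothetical.

/// Models `.widen()` on signed lanes (i8x8 -> i16x8): sign-extends.
fn widen_i8(v: [i8; 8]) -> [i16; 8] {
    v.map(|x| x as i16)
}

/// Models `.widen()` on unsigned lanes (u8x8 -> u16x8): zero-extends.
fn widen_u8(v: [u8; 8]) -> [u16; 8] {
    v.map(|x| x as u16)
}

/// Models `.narrow()` (i16x8 -> i8x8): keeps the low 8 bits of each lane.
fn narrow_i16(v: [i16; 8]) -> [i8; 8] {
    v.map(|x| x as i8)
}

/// Widening dot product of two 16-lane i8 vectors.
/// Each half is widened to i16 before multiplying, so no product
/// overflows: the largest magnitude is (-128) * (-128) = 16384.
fn dot_i8x16(a: [i8; 16], b: [i8; 16]) -> i32 {
    let mut acc = 0i32;
    // Process the low half (lanes 0..8), then the high half (lanes 8..16).
    for start in [0usize, 8] {
        let mut a_half = [0i8; 8];
        let mut b_half = [0i8; 8];
        a_half.copy_from_slice(&a[start..start + 8]);
        b_half.copy_from_slice(&b[start..start + 8]);
        let wa = widen_i8(a_half);
        let wb = widen_i8(b_half);
        for i in 0..8 {
            let prod: i16 = wa[i] * wb[i];
            // Widen again before accumulating so the sum cannot wrap.
            acc += i32::from(prod);
        }
    }
    acc
}
```

Multiplying the original `i8` lanes directly would overflow for most inputs, which is why the widen step comes before the multiply.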

## Lane access, shuffles, reductions

```cplus
let v: f32x4 = f32x4::new(1.0f32, 2.0f32, 3.0f32, 4.0f32);
let x: f32 = v.lane(0 as u32);                       // 1.0; literal index only
let v2: f32x4 = v.with_lane(3 as u32, 9.0f32);       // (1, 2, 3, 9)

let r: f32x4 = v.reverse();                          // (4, 3, 2, 1)
let s: f32   = v.sum();                              // 10.0
let p: f32x4 = v.swizzle([3 as u32, 2 as u32, 1 as u32, 0 as u32]);  // literal indices
```

The lane index must be a **literal** `u32` in `0..N` (**E0873** if not literal, **E0874** if out of range).

A horizontal `sum()` / `product()` returns the **lane** type, so on narrow integer lanes the result can wrap; the compiler emits the non-fatal **W0001** warning at that site. Avoid the wrap by widening the lanes first, or use [`simd/integer::dot_i32`](/docs/packages/simd).

For a runtime index vector, use `.table(idx)` on a 16-byte vector (NEON `vqtbl1q`).
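The wrapping behaviour of a lane-typed reduction is easiest to see with numbers. This is a scalar sketch in Rust, not cplus: a 16-element array stands in for an `i8x16`, the function names are hypothetical, and two's-complement wrapping is assumed as the meaning of "wrap".

```rust
// Scalar model of a horizontal sum on narrow integer lanes.
// The array stands in for an i8x16; all names here are hypothetical.

/// Models `.sum()` returning the lane type: the running total is an i8,
/// so it wraps once it leaves the range -128..=127.
fn sum_lane_type(v: [i8; 16]) -> i8 {
    v.iter().fold(0i8, |acc, &x| acc.wrapping_add(x))
}

/// Widen-first alternative: each lane is extended to i32 before the
/// reduction, so sixteen i8 values can never overflow the total.
fn sum_widened(v: [i8; 16]) -> i32 {
    v.iter().map(|&x| i32::from(x)).sum()
}
```

With sixteen lanes all set to `100`, the true total is 1600. The lane-typed model returns `64` (1600 modulo 256), while the widened model returns `1600`.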

## Masks and select

Compare-and-blend is the branchless idiom:

```cplus
// `a` and `b` are `f32x4` values assumed to be in scope.
let mask: mask32x4 = a.lt(b);                        // comparison yields a mask
let result: f32x4  = mask.select(a, b);              // pick from a where true, else b

if mask.any() { /* at least one lane true */ }
if mask.all() { /* every lane true */ }
```

**Mask types are distinct from integer SIMD.** A comparison returns `mask{N}x{M}`, not `i{N}x{M}`. `.select` / `.any` / `.all` require a mask receiver (**E0324** otherwise), arithmetic on masks is rejected, and there is no implicit mask-to-SIMD coercion (**E0302**). Masks are produced by comparisons and are never built lane by lane. Convert between the two explicitly with `.to_bits()` and `.to_mask()`, which are no-ops at the LLVM level.
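The compare-and-blend idiom can be modelled in scalar code as follows. This is a sketch in Rust, not cplus, with hypothetical function names. A `[bool; 4]` stands in for `mask32x4`, which mirrors the rule that a mask is not an integer vector and supports no arithmetic.

```rust
// Scalar model of compare-and-blend on four f32 lanes.
// `[bool; 4]` stands in for mask32x4; all names here are hypothetical.

/// Models `a.lt(b)`: one true/false result per lane.
fn lt(a: [f32; 4], b: [f32; 4]) -> [bool; 4] {
    let mut mask = [false; 4];
    for i in 0..4 {
        mask[i] = a[i] < b[i];
    }
    mask
}

/// Models `mask.select(a, b)`: take the lane from `a` where the mask is
/// true, otherwise from `b`.
fn select(mask: [bool; 4], a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = b;
    for i in 0..4 {
        if mask[i] {
            out[i] = a[i];
        }
    }
    out
}

/// Models `mask.any()`: at least one lane is true.
fn any(mask: [bool; 4]) -> bool {
    mask.iter().any(|&m| m)
}

/// Models `mask.all()`: every lane is true.
fn all(mask: [bool; 4]) -> bool {
    mask.iter().all(|&m| m)
}

/// Lane-wise minimum written as compare-and-blend, with no branch on
/// the data in the SIMD version this models.
fn min_via_select(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    select(lt(a, b), a, b)
}
```

For `a = (1, 5, 3, 8)` and `b = (4, 2, 3, 9)`, the mask is `(true, false, false, true)` and the blend yields `(1, 2, 3, 8)`.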

## The FFI boundary

SIMD types have no portable C-ABI representation, so they do **not** cross an `extern fn` boundary by default (**E0410**). Use the array round-trip:

```cplus
// ✅ FFI-safe shape
pub extern fn process(v: [f32; 4]) -> [f32; 4] {
    let s: f32x4 = f32x4::from_array(v);
    return s.mul(f32x4::splat(2.0f32)).to_array();
}
```

For higher-level 3D math and prebuilt integer lane kernels, see the [simd package](/docs/packages/simd).
