SIMD Within A Register

How can we use SIMD instructions to operate on integer data that appears in sizes the instructions don't understand. There are potential applications for this outside of just Parabix. SWAR could enable using the smallest possible bit-width to encode data with much less hassle. E.g. DNA base pairs can be encoded as a 2 bit number.

Considerations must be made for arithmetic operations because the size of their output can differ from that of their input. In the case of addition for example, care must be taken since overflows will corrupt other data in the register. Similar considerations must be made for other operations.

In its current state, LLVM treats unsupported vector widths atrociously.

Introductory Example

Let's examine the simplest problem: bit vector addition.

What we have:

define <128 x i1> @addi1 (<128 x i1> %v1, <128 x i1> %v2) {
entry:
    %out = add <128 x i1> %v1, %v2
    ret <128 x i1> %out
}

What we get:

53KB of ASM

What we want:

define <128 x i1> @addi1 (<128 x i1> %v1, <128 x i1> %v2) {
entry:
    %v3 = reinterpret_cast <128 x i1> %v1 to <16 x i8>
    %v4 = reinterpret_cast <128 x i1> %v2 to <16 x i8>
    %v5 = xor <16 x i8> %v3, <16 x i8> %v4
    %out = reinterpret_cast <16 x i8> %v5 to <128 x i1>
    ret <128 x i1> %out
}

End goal

Achieve similar results for all common operations (arithmetic and logic) on duos and nybbles.