Add vectorization support for offline tablegen. #73

Open
tpn wants to merge 3 commits into main from offline-vectorized

@tpn tpn commented Mar 6, 2026

No description provided.

tpn added 3 commits March 1, 2026 13:40
Create OFFLINE-VECTORIZED NOTES/LOG/TODO files under agents/ to track the
new offline vectorized-index porting work.

The notes capture scope and constraints for the first phase:
- Start with Mulshrolate1RX.
- Implement AVX2 Index32x8 and AVX-512 Index32x16.
- Preserve scalar fallback and correctness across varying TABLE_DATA widths.

The TODO defines concrete implementation, RawCString synchronization,
validation, and commit checkpoints. The log records initial repository audit
findings and the selected technical approach.

Introduce offline generated Index32x8 and Index32x16 entry points for
Mulshrolate1RX using x86 intrinsics in the non-inline C path:

- Add routine naming macros and capability defines so generated test paths can
  conditionally compile vector checks.
- Add AVX2 x8 and AVX-512 x16 arithmetic helpers (mul/rotate/shift), with
  scalar lane-wise table lookups retained for TABLE_DATA type safety.
- Add scalar fallback helpers and runtime CPU gating for GCC/Clang target
  attribute builds.
- Keep CPH_INLINE_ROUTINES behavior intact by emitting these routines only in
  the non-inline section.

Expand generated-test coverage:

- Add optional extern declarations for Index32x8/Index32x16 in
  CompiledPerfectHashTableTest.c.
- Validate first 8/16 keys against scalar INDEX_ROUTINE when the routines are
  available.

Fixes folded into this commit:

- Remove redundant 'static' from FORCEINLINE helpers to avoid duplicate
  storage-class errors in generated GCC builds.
- Correct IACA variant for Mulshrolate1RX by applying the missing
  Vertex1 >>= SEED3_BYTE1 stage before table lookup.

Sync RawCString payload headers with template updates so codegen output uses
the new routines and test validation.
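
For readers without the diff open, here is a minimal sketch of the shape the commit above describes: vector multiply/rotate/shift, scalar lane-wise table lookups, a runtime-gated scalar fallback, and a check of the first 8 keys against the scalar routine. It is not the generated code. Every `Sketch*` and `SKETCH_*` name, the seed, rotate, and shift constants, the 16-entry table, and the final `(T[V1] + T[V2]) & mask` combine step are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX2_PATH 1   /* hypothetical capability define */
#endif

/* Hypothetical stand-ins for the generated seeds, mask, and TABLE_DATA. */
typedef uint16_t SKETCH_TABLE_DATA;
#define SKETCH_SEED1      0x9E3779B1u
#define SKETCH_SEED2      0x85EBCA77u
#define SKETCH_ROTATE     7
#define SKETCH_SHIFT      28        /* keeps 4 bits: indexes a 16-entry table */
#define SKETCH_INDEX_MASK 0xFu

static const SKETCH_TABLE_DATA SketchTableData[16] = {
    3, 14, 7, 0, 9, 12, 1, 5, 10, 2, 15, 8, 4, 13, 6, 11
};

static uint32_t SketchRotateRight32(uint32_t Value, unsigned Count) {
    Count &= 31u;
    return (Value >> Count) | (Value << ((32u - Count) & 31u));
}

/* Scalar reference: multiply, rotate, shift, then two table lookups. */
static uint32_t SketchIndex(uint32_t Key) {
    uint32_t Vertex1 = Key * SKETCH_SEED1;
    uint32_t Vertex2 = Key * SKETCH_SEED2;
    Vertex1 = SketchRotateRight32(Vertex1, SKETCH_ROTATE);
    Vertex1 >>= SKETCH_SHIFT;   /* shift stage applied before the lookup */
    Vertex2 >>= SKETCH_SHIFT;
    return ((uint32_t)SketchTableData[Vertex1] +
            (uint32_t)SketchTableData[Vertex2]) & SKETCH_INDEX_MASK;
}

#ifdef SKETCH_HAVE_AVX2_PATH
/* AVX2 arithmetic for 8 keys; lookups stay scalar so the table element
 * width can change without touching the vector code. */
__attribute__((target("avx2")))
static void SketchIndex32x8Avx2(const uint32_t *Keys, uint32_t *Out) {
    uint32_t Lane1[8], Lane2[8];
    __m256i K  = _mm256_loadu_si256((const __m256i *)Keys);
    __m256i V1 = _mm256_mullo_epi32(K, _mm256_set1_epi32((int)SKETCH_SEED1));
    __m256i V2 = _mm256_mullo_epi32(K, _mm256_set1_epi32((int)SKETCH_SEED2));
    /* Rotate right = (x >> r) | (x << (32 - r)); AVX2 has no 32-bit rotate. */
    V1 = _mm256_or_si256(_mm256_srli_epi32(V1, SKETCH_ROTATE),
                         _mm256_slli_epi32(V1, 32 - SKETCH_ROTATE));
    V1 = _mm256_srli_epi32(V1, SKETCH_SHIFT);
    V2 = _mm256_srli_epi32(V2, SKETCH_SHIFT);
    _mm256_storeu_si256((__m256i *)Lane1, V1);
    _mm256_storeu_si256((__m256i *)Lane2, V2);
    for (size_t i = 0; i < 8; i++) {
        Out[i] = ((uint32_t)SketchTableData[Lane1[i]] +
                  (uint32_t)SketchTableData[Lane2[i]]) & SKETCH_INDEX_MASK;
    }
}
#endif

/* Entry point: use AVX2 when the CPU reports it, else fall back to scalar. */
static void SketchIndex32x8(const uint32_t *Keys, uint32_t *Out) {
#ifdef SKETCH_HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        SketchIndex32x8Avx2(Keys, Out);
        return;
    }
#endif
    for (size_t i = 0; i < 8; i++) {
        Out[i] = SketchIndex(Keys[i]);
    }
}

/* Returns the number of lanes that disagree with the scalar routine. */
static int SketchValidateFirst8(const uint32_t *Keys) {
    uint32_t Out[8];
    int Mismatches = 0;
    SketchIndex32x8(Keys, Out);
    for (size_t i = 0; i < 8; i++) {
        Mismatches += (Out[i] != SketchIndex(Keys[i]));
    }
    return Mismatches;
}
```

A caller would pass its first 8 keys to `SketchValidateFirst8` and treat a nonzero result as a failure. An x16 variant would presumably follow the same pattern with 512-bit intrinsics, but that is not sketched here.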

Update offline vectorization tracking docs after implementation and
end-to-end validation.

- LOG: append implementation details, codegen test failure/fix notes, rebuild
  status, GCC/Clang generated-project validation, and scalar/vector benchmark
  outputs.
- TODO: mark completed work items for Mulshrolate1RX vector routines, test
  integration, RawCString synchronization, and validation steps; keep remaining
  benchmark-template work open.
- NOTES: capture current-state summary and first benchmark observations for
  follow-on optimization/tuning.