Perf: invalidate autotune cache when the kernel version changes #33

The autotune benchmark writes a per-GPU profile to `bin/cuda_tune_sm<SM>_<N>SMs.cfg` and reuses it on every subsequent launch. That profile reflects the kernel as it existed when the benchmark ran. If the kernel changes significantly between releases (e.g. the v0.4.0 shared-memory S-box rewrite), the cached block count and chunk size can be wrong or suboptimal for the new code. Users who upgrade won't notice, because the stale file is silently reused.

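The fix is to add a version field to the first line of the `.cfg` file. On load, if the stored version doesn't match the current build's constant, delete the file and re-run autotune automatically. A simple integer is enough: either one tied to `DMRCRACK_VERSION_MINOR`, or a separate `CUDA_PROFILE_VERSION` constant bumped manually whenever a kernel change could affect tuning. The first option re-tunes on every minor release even when the kernel is unchanged; the second only invalidates when someone remembers to bump it.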
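A minimal sketch of the load-time check in C++17, assuming the separate `CUDA_PROFILE_VERSION` option (the value used here is arbitrary) and a caller-supplied `run_autotune` callback standing in for the existing benchmark. Neither the callback nor the helper names below come from `src/bruteforce.cu`; they are hypothetical. A profile with no version line at all, such as one written by a current build, is treated as stale, so existing caches are also discarded on the first upgrade.

```cpp
#include <cstdio>      // std::remove
#include <fstream>
#include <functional>
#include <string>

// Hypothetical constant: bump manually whenever a kernel change makes
// previously tuned block counts / chunk sizes unreliable.
constexpr int CUDA_PROFILE_VERSION = 2;

// True only if the file exists and its first line is exactly
// "version=<CUDA_PROFILE_VERSION>". Missing, empty, or unversioned
// (legacy) profiles all count as stale.
bool profile_version_matches(const std::string& path) {
    std::ifstream in(path);
    std::string first_line;
    if (!in || !std::getline(in, first_line)) return false;
    return first_line == "version=" + std::to_string(CUDA_PROFILE_VERSION);
}

// Hypothetical helper: if the cached profile is stale, delete it and call
// `run_autotune`, which is expected to benchmark the GPU and write a fresh
// profile. Returns true if autotune was re-run.
bool refresh_profile_if_stale(const std::string& path,
                              const std::function<void()>& run_autotune) {
    if (profile_version_matches(path)) return false;
    std::remove(path.c_str());  // failure is fine: the file may not exist yet
    run_autotune();
    return true;
}
```

Passing the autotune routine as a callback keeps the check testable without a GPU; in the real code the benchmark would more likely be called directly.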

**Where to look:** the `load_cuda_profile` and `save_cuda_profile` functions in `src/bruteforce.cu`. On save, write a `version=N` line at the top of the file; on load, check it before reading any other field.
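A sketch of the save/load pair with the version line written first and checked before any other field is parsed. The `CudaProfile` struct, its `blocks` and `chunk_size` keys, the `CUDA_PROFILE_VERSION` value, and both function signatures are hypothetical: the issue names `load_cuda_profile` and `save_cuda_profile` but not their current file format, so the field handling would need to match whatever the real `.cfg` stores.

```cpp
#include <exception>
#include <fstream>
#include <optional>
#include <string>

constexpr int CUDA_PROFILE_VERSION = 2;  // hypothetical value; bump on kernel changes

// Hypothetical profile contents; the real file may store different fields.
struct CudaProfile {
    int blocks = 0;
    long long chunk_size = 0;
};

// Writes the version line first so a loader can reject the file
// before parsing anything else.
bool save_cuda_profile(const std::string& path, const CudaProfile& p) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    out << "version=" << CUDA_PROFILE_VERSION << '\n'
        << "blocks=" << p.blocks << '\n'
        << "chunk_size=" << p.chunk_size << '\n';
    return static_cast<bool>(out);
}

// Returns std::nullopt if the file is missing, unversioned, from another
// profile version, malformed, or incomplete. The caller can then delete
// it and re-run autotune.
std::optional<CudaProfile> load_cuda_profile(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    if (!in || !std::getline(in, line)) return std::nullopt;
    // Version check happens before any other field is read.
    if (line != "version=" + std::to_string(CUDA_PROFILE_VERSION)) return std::nullopt;

    CudaProfile p;
    bool have_blocks = false, have_chunk = false;
    try {
        while (std::getline(in, line)) {
            const auto eq = line.find('=');
            if (eq == std::string::npos) continue;  // skip lines without key=value
            const std::string key = line.substr(0, eq);
            const std::string value = line.substr(eq + 1);
            if (key == "blocks") { p.blocks = std::stoi(value); have_blocks = true; }
            else if (key == "chunk_size") { p.chunk_size = std::stoll(value); have_chunk = true; }
        }
    } catch (const std::exception&) {  // std::stoi/stoll throw on bad numbers
        return std::nullopt;
    }
    if (!have_blocks || !have_chunk) return std::nullopt;
    return p;
}
```

Returning `std::nullopt` for any mismatch, rather than deleting the file inside the loader, keeps the delete-and-retune decision in one place in the caller.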
