Perf: invalidate autotune cache when the kernel version changes #33

@felixx-sp

The autotune benchmark writes a per-GPU profile to bin/cuda_tune_sm<SM>_<N>SMs.cfg and reuses it on every subsequent launch. The profile is only valid for the kernel as it existed when the benchmark ran. If the kernel changes significantly between releases (e.g. the v0.4.0 shared-memory S-box rewrite), the cached block count and chunk size can be wrong or suboptimal for the new code. Users who upgrade won't notice, because the stale file is silently reused.

The fix is to store a version field on the first line of the .cfg file. On load, if the stored version doesn't match the current build's constant, delete the file and re-run autotune automatically. A simple integer is enough: either one tied to DMRCRACK_VERSION_MINOR, or a separate CUDA_PROFILE_VERSION constant that is bumped manually on kernel ABI changes.

Where to look: the load_cuda_profile and save_cuda_profile functions in src/bruteforce.cu. On save, write a version=N line at the top of the file; on load, check it before reading any other field.
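
A rough host-side sketch of the versioned save/load, to make the proposal concrete. Everything here is an assumption rather than the existing code in src/bruteforce.cu: the CudaProfile struct, its blocks and chunk_size fields, the key=value layout, the function signatures, and the value of CUDA_PROFILE_VERSION are all hypothetical. A legacy file with no version line is treated the same as a mismatched one.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical constant: bump by hand whenever a kernel change
// makes previously cached tuning results unreliable.
constexpr int CUDA_PROFILE_VERSION = 2;

// Hypothetical profile layout; the real fields may differ.
struct CudaProfile {
    int blocks = 0;
    int chunk_size = 0;
};

// Writes the profile with the version on the first line.
bool save_cuda_profile(const std::string& path, const CudaProfile& p) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    out << "version=" << CUDA_PROFILE_VERSION << '\n'
        << "blocks=" << p.blocks << '\n'
        << "chunk_size=" << p.chunk_size << '\n';
    out.flush();
    return out.good();
}

// Returns true and fills `p` only if the file exists, carries the
// current version, and contains both fields. A stale, legacy, or
// incomplete file is deleted so the caller re-runs autotune.
bool load_cuda_profile(const std::string& path, CudaProfile& p) {
    bool stale = false;
    CudaProfile tmp;
    {
        std::ifstream in(path);
        if (!in) return false;  // no cached profile yet

        // Check the version before reading any other field.
        std::string line;
        std::getline(in, line);
        int version = -1;
        if (std::sscanf(line.c_str(), "version=%d", &version) != 1 ||
            version != CUDA_PROFILE_VERSION) {
            stale = true;
        } else {
            bool have_blocks = false, have_chunk = false;
            while (std::getline(in, line)) {
                if (std::sscanf(line.c_str(), "blocks=%d", &tmp.blocks) == 1)
                    have_blocks = true;
                else if (std::sscanf(line.c_str(), "chunk_size=%d",
                                     &tmp.chunk_size) == 1)
                    have_chunk = true;
            }
            stale = !(have_blocks && have_chunk);
        }
    }  // stream is closed here, before the file is removed

    if (stale) {
        std::remove(path.c_str());
        return false;
    }
    p = tmp;
    return true;
}
```

With this shape the caller needs no special upgrade path: when load_cuda_profile returns false, it runs the autotune benchmark as it would on a first launch and calls save_cuda_profile with the fresh result.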

Labels: enhancement