The autotune benchmark writes a per-GPU profile to bin/cuda_tune_sm<SM>_<N>SMs.cfg and reuses it on every subsequent launch. The problem is that the profile is tied to the kernel as it existed when the benchmark ran — if the kernel changes significantly between releases (e.g. the v0.4.0 shared-memory S-box rewrite), the cached block count and chunk size can be wrong or suboptimal for the new code, and users who upgrade won't notice because the stale file is silently reused.
The fix is to add a version field to the first line of the .cfg file. On load, if the stored version doesn't match the current build's constant, delete the file and re-run autotune automatically. A simple integer tied to DMRCRACK_VERSION_MINOR (or a separate CUDA_PROFILE_VERSION constant bumped manually on kernel ABI changes) is enough.
Where to look: src/bruteforce.cu, in the load_cuda_profile and save_cuda_profile functions. Write a version=N line at the top of the file on save, and check it on load before reading any other field.
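A rough sketch of what the versioned save/load could look like, in host-side C++. This is not the existing code: the function names save_profile_versioned and load_profile_checked, the CudaProfile struct, the blocks= and chunk_size= keys, and the value of CUDA_PROFILE_VERSION are all assumptions made for illustration, since the real on-disk format in src/bruteforce.cu may differ.

```cpp
#include <cstdio>    // std::sscanf, std::remove
#include <fstream>
#include <optional>
#include <string>

// Hypothetical constant: bump by hand whenever a kernel change invalidates old profiles.
constexpr int CUDA_PROFILE_VERSION = 2;

// Assumed profile contents; the real struct may carry more fields.
struct CudaProfile {
    int blocks;
    int chunk_size;
};

// Writes the version as the first line, followed by the tuned values.
bool save_profile_versioned(const std::string& path, const CudaProfile& p) {
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    out << "version=" << CUDA_PROFILE_VERSION << '\n'
        << "blocks=" << p.blocks << '\n'
        << "chunk_size=" << p.chunk_size << '\n';
    return static_cast<bool>(out);
}

// Returns the cached profile only if the version line matches this build.
// A missing, stale, or malformed file yields std::nullopt so the caller
// can re-run autotune; stale or malformed files are deleted first.
std::optional<CudaProfile> load_profile_checked(const std::string& path) {
    {
        std::ifstream in(path);
        if (!in) return std::nullopt;  // no cached profile yet

        std::string line;
        int version = 0;
        CudaProfile p{};
        // Check the version before reading any other field.
        if (std::getline(in, line) &&
            std::sscanf(line.c_str(), "version=%d", &version) == 1 &&
            version == CUDA_PROFILE_VERSION &&
            std::getline(in, line) &&
            std::sscanf(line.c_str(), "blocks=%d", &p.blocks) == 1 &&
            std::getline(in, line) &&
            std::sscanf(line.c_str(), "chunk_size=%d", &p.chunk_size) == 1) {
            return p;
        }
    }  // stream is closed here, before the file is removed

    std::remove(path.c_str());  // drop the stale or unreadable profile
    return std::nullopt;
}
```

With this shape, the launch path would call load_profile_checked first and fall back to running autotune followed by save_profile_versioned whenever it returns an empty optional. A profile written by an older release has no version= line, so it fails the first check and is regenerated on the next launch.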