Confidence: medium · Effort: trivial (<1 h)
Problem
The kernel signatures correctly mark output device pointers with __restrict__ (e.g. dev_keys_tested, dev_best_packed at bruteforce.cu:675, 826, 1009). But the __device__ scoring helpers take const unsigned char * input parameters without __restrict__, even though those buffers (cipher_packs, key5, payload33) are independent and never alias.
Files:
src/bruteforce.cu:485-489 (score_burst_correct_dev)
src/bruteforce.cu:555-559 (extract_voice_dibits_dev)
src/bruteforce.cu:589-593 (score_dmr_ambe_dev)
Suggested fix
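Add __restrict__ to the const unsigned char * parameters. With both const and __restrict__ the compiler can assume the buffers do not alias, so it may emit better loads and cache hints (LD.E.CG, or read-only-cache loads such as ld.global.nc) and reorder them more freely. The change is safe as long as the buffers never alias. Any upside on Turing and later should be confirmed by measurement rather than assumed.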
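A minimal sketch of the shape of the change, not the actual code from src/bruteforce.cu. The helper score_xor_dev, the kernel score_kernel, the host driver run_demo and the parameters n and dev_score are hypothetical stand-ins; only the buffer names cipher_packs and key5 come from this issue, and the real helpers take different parameter lists.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper standing in for the real scoring helpers.
// Before: const unsigned char *cipher_packs, const unsigned char *key5
// After:  both read-only inputs are const and __restrict__.
__device__ int score_xor_dev(const unsigned char *__restrict__ cipher_packs,
                             const unsigned char *__restrict__ key5,
                             int n)
{
    int score = 0;
    for (int i = 0; i < n; ++i) {
        // Count bytes that become zero after XOR with the repeating 5-byte key.
        score += ((cipher_packs[i] ^ key5[i % 5]) == 0);
    }
    return score;
}

// Hypothetical kernel: a single thread calls the helper and stores the result.
__global__ void score_kernel(const unsigned char *__restrict__ cipher_packs,
                             const unsigned char *__restrict__ key5,
                             int n,
                             int *__restrict__ dev_score)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *dev_score = score_xor_dev(cipher_packs, key5, n);
    }
}

// Host driver: returns the score, or -1 if any CUDA call fails.
int run_demo()
{
    const int n = 10;
    const unsigned char h_key[5] = {1, 2, 3, 4, 5};
    const unsigned char h_cipher[n] = {1, 2, 3, 4, 5, 1, 2, 0, 0, 0};
    unsigned char *d_cipher = nullptr;
    unsigned char *d_key = nullptr;
    int *d_score = nullptr;
    int h_score = -1;

    bool ok = cudaMalloc(&d_cipher, n) == cudaSuccess
           && cudaMalloc(&d_key, 5) == cudaSuccess
           && cudaMalloc(&d_score, sizeof(int)) == cudaSuccess
           && cudaMemcpy(d_cipher, h_cipher, n, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_key, h_key, 5, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        score_kernel<<<1, 1>>>(d_cipher, d_key, n, d_score);
        // The blocking copy waits for the kernel to finish.
        ok = cudaMemcpy(&h_score, d_score, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe on every path.
    cudaFree(d_cipher);
    cudaFree(d_key);
    cudaFree(d_score);
    return ok ? h_score : -1;
}
```

The qualifier is a promise to the compiler, not something it checks: if two __restrict__ pointers ever refer to overlapping memory, the behavior is undefined. The helpers may also be inlined into the kernels, in which case the generated code may not change at all.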
Why it matters
Tiny, low-risk performance win and a good first-PR task for someone learning CUDA. Verify before/after with nvcc --ptxas-options=-v (register and spill counts) or cuobjdump -sass (the load instructions actually emitted).