Dynamic threadgroup memory by christiangnrd · Pull Request #750 · JuliaGPU/Metal.jl

christiangnrd · 2026-03-01T20:17:15Z

Surprisingly, it's somewhat functional. This only works on macOS 15+, since global dynamic threadgroup memory is not available before then. This isn't the only way to get dynamic threadgroup memory, but I tried this approach first since it seemed the most similar to how static threadgroup memory is implemented.

The Metal interface takes an Integer or a Tuple of the size of the allocation, which is then aligned to the next multiple of 16.
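As a rough illustration of the rounding described above (not the Metal.jl implementation, and written in Python purely to show the arithmetic), the sketch below rounds a requested size up to a multiple of 16 bytes. The names `align_up` and `threadgroup_bytes` and the `elem_size` parameter are hypothetical. It also assumes that a Tuple is reduced to the product of its dimensions times an element size, and that a size already a multiple of 16 is left unchanged.

```python
from math import prod

ALIGNMENT = 16  # bytes, per the PR description


def align_up(nbytes: int, alignment: int = ALIGNMENT) -> int:
    """Round nbytes up to the nearest multiple of alignment (hypothetical helper)."""
    if nbytes < 0:
        raise ValueError("size must be non-negative")
    return (nbytes + alignment - 1) // alignment * alignment


def threadgroup_bytes(size, elem_size: int = 1) -> int:
    """Aligned byte count for an Integer or Tuple size.

    Assumption: a Tuple is treated as array dimensions, so the raw size is
    the product of the dimensions times elem_size (in bytes).
    """
    dims = (size,) if isinstance(size, int) else tuple(size)
    return align_up(prod(dims) * elem_size)


# A 3x5 array of 4-byte elements needs 60 bytes, which rounds up to 64.
print(threadgroup_bytes((3, 5), elem_size=4))
```

Under these assumptions a request for 100 bytes becomes 112, while an exact multiple such as 256 is unchanged.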

Kernels silently fail under shader validation (the failures are caught in the tests, since the output doesn't match the expected results).

Close #701

codecov · 2026-03-01T20:32:45Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.35%. Comparing base (356e7d2) to head (a76f582).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
- Coverage   81.74%   81.35%   -0.40%     
==========================================
  Files          66       63       -3     
  Lines        3145     3138       -7     
==========================================
- Hits         2571     2553      -18     
- Misses        574      585      +11

[only tests] [only special]

christiangnrd mentioned this pull request Mar 1, 2026

[metal] Dynamic Threadgroup memory JuliaGPU/GPUCompiler.jl#768

Draft

christiangnrd added the "help wanted" (Extra attention is needed) and "kernels" (Things about kernels and how they are compiled) labels Mar 14, 2026

christiangnrd force-pushed the dynmem branch from 8f00fe0 to e038d2a Compare April 11, 2026 20:19

christiangnrd force-pushed the dynmem branch 5 times, most recently from f5549c3 to c702088 Compare June 2, 2026 01:14

christiangnrd added 10 commits June 2, 2026 11:21

set_threadgroup_memory_length!

075da88

[temp] air 2.7

3c9dbdb

Fixup tests

877ac31

[to clean up] "working" AI-assisted dynamic shared memory

d65296d

SHMEM interface

7bf7373

Cleanup

1131d0b

Proof of concept mapreduce

0574d6c

Align dynamic threadgroup memory to 16 bytes

bb05fed

Tests

d7ac61f

Gpucompiler

a76f582

[only tests] [only special]

christiangnrd force-pushed the dynmem branch from c702088 to a76f582 Compare June 2, 2026 14:21

christiangnrd wants to merge 10 commits into main from dynmem
