migrate to Chase-Lev deque #235

Merged
tzcnt merged 3 commits into v2-dev from chase_lev_deque on Jun 3, 2026

tzcnt (Owner) commented Jun 2, 2026

There are 2 versions of the implementation, chosen based on the target platform.

  • 32-bit: the standard Chase-Lev implementation
  • 64-bit: packs both indexes into a single word, so the number of elements available in the queue can be counted accurately with 1 operation. try_pop() has a fast path that completes in 1 operation when it can confirm that the queue held more than 1 element.
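
To make the 64-bit idea concrete, here is a minimal sketch of one way to pack two 32-bit indexes into a single atomic word. It is not the code from this PR: the class name, member names, bit layout (bottom in the high half, top in the low half), fixed capacity, and default sequentially consistent ordering are all assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

// Illustrative sketch only; every name here is hypothetical, not TMC's code.
template <typename T, std::size_t Capacity>
class packed_deque_sketch {
  static_assert(Capacity != 0 && (Capacity & (Capacity - 1)) == 0,
                "Capacity must be a power of two");
  static constexpr std::uint64_t one_bottom = std::uint64_t{1} << 32;
  static constexpr std::size_t mask = Capacity - 1;

  // High 32 bits: bottom (owner end). Low 32 bits: top (thief end).
  std::atomic<std::uint64_t> word_{0};
  // Atomic slots avoid a data race when a stale thief reads a reused slot.
  std::atomic<T> buf_[Capacity];

  static std::uint32_t top_of(std::uint64_t w) {
    return static_cast<std::uint32_t>(w);
  }
  static std::uint32_t bottom_of(std::uint64_t w) {
    return static_cast<std::uint32_t>(w >> 32);
  }
  static std::uint64_t pack(std::uint32_t top, std::uint32_t bottom) {
    return (static_cast<std::uint64_t>(bottom) << 32) | top;
  }
  static std::int32_t count_of(std::uint64_t w) {
    return static_cast<std::int32_t>(bottom_of(w) - top_of(w));
  }

public:
  // One atomic load gives a consistent snapshot of both indexes.
  // May be transiently negative while the owner pops from an empty deque.
  std::int32_t size() const { return count_of(word_.load()); }

  // Owner thread only.
  bool try_push(T item) {
    std::uint64_t w = word_.load();
    if (count_of(w) >= static_cast<std::int32_t>(Capacity)) return false;
    buf_[bottom_of(w) & mask].store(item, std::memory_order_relaxed);
    word_.fetch_add(one_bottom); // publish: bottom + 1, carry is discarded
    return true;
  }

  // Owner thread only.
  std::optional<T> try_pop() {
    std::uint64_t old = word_.fetch_sub(one_bottom); // claim: bottom - 1
    std::int32_t n = count_of(old);
    std::uint32_t idx = bottom_of(old) - 1;
    if (n > 1) {
      // Fast path: thieves only take from top, which is below idx.
      return buf_[idx & mask].load(std::memory_order_relaxed);
    }
    if (n == 1) {
      // Last item: the deque now looks empty, so no steal can succeed.
      T item = buf_[idx & mask].load(std::memory_order_relaxed);
      // Advance top too, so a stale thief's CAS cannot match a later state.
      word_.store(pack(top_of(old) + 1, bottom_of(old)));
      return item;
    }
    word_.store(old); // the deque was empty: undo the decrement
    return std::nullopt;
  }

  // Any thread.
  std::optional<T> try_steal() {
    std::uint64_t w = word_.load();
    while (count_of(w) > 0) {
      T item = buf_[top_of(w) & mask].load(std::memory_order_relaxed);
      std::uint64_t next = pack(top_of(w) + 1, bottom_of(w));
      if (word_.compare_exchange_weak(w, next)) return item;
      // A failed CAS reloads w; retry with the fresh snapshot.
    }
    return std::nullopt;
  }
};
```

In this sketch the value returned by fetch_sub is a snapshot of both indexes at the same instant, so the owner knows the exact element count from that single operation. When the count was greater than 1, the claimed slot cannot be reached by a thief and the pop is done. The last-element and empty cases need a second store. A production version would likely relax the memory orderings and handle buffer growth, both of which are left out here.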

Removed the old ExplicitProducer-based work-stealing setup, which provided equivalent functionality. The moodycamel queue now contains only the ImplicitProducer-related code and is still used to accept work from external threads.


Breaking changes: TMC_WORK_ITEM=FUNC is no longer supported, since the Chase-Lev queue requires the work item type to be trivially copyable and trivially destructible. Users of this configuration should migrate to TMC_WORK_ITEM=FUNCORO instead.
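
The requirement above can be checked with standard type traits. The sketch below is only an illustration: is_valid_work_item_v, handle_like_item, and work_queue_sketch are hypothetical names, not TMC identifiers, and it assumes that the FUNC configuration corresponds to a std::function-style type-erased callable.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical helper, not part of TMC's API: the two properties the
// Chase-Lev queue is said to require of its element type.
template <typename T>
inline constexpr bool is_valid_work_item_v =
    std::is_trivially_copyable_v<T> && std::is_trivially_destructible_v<T>;

// Stand-in for a handle-like work item that only wraps a raw pointer.
struct handle_like_item {
  void* frame;
};

// Plain pointers and pointer wrappers satisfy the requirement.
static_assert(is_valid_work_item_v<handle_like_item>);
static_assert(is_valid_work_item_v<void (*)()>);

// std::function has a user-provided copy constructor and destructor,
// so it does not.
static_assert(!is_valid_work_item_v<std::function<void()>>);

// A queue template can reject unsupported item types at compile time.
template <typename T>
struct work_queue_sketch {
  static_assert(is_valid_work_item_v<T>,
                "work item must be trivially copyable and trivially "
                "destructible");
};
```

Instantiating work_queue_sketch<std::function<void()>> would then fail to compile with the message above, which is the kind of compile-time detection a static_assert makes possible.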

tzcnt force-pushed the chase_lev_deque branch from ea0e561 to df225ba on June 2, 2026 15:52

solbjorn (Contributor) commented Jun 2, 2026

Tested this PR on my engine:

  • Windows 11 x86_64 26220.8544
  • Clang/LLVM 22.1.7
  • libc++ and llvm-libc 22.1.7 (unstable ABI v2, hardening, hardened iterators)
  • -O3 -flto -fwhole-program-vtables -std=c++26 (+ no MSVC compat etc.)
  • TMC configuration:
    #define TMC_NODISCARD_AWAIT
    #define TMC_PRIORITY_COUNT 3
    #define TMC_STANDALONE_COMPILATION
    #define TMC_TRIVIAL_TASK
    #define TMC_USE_HWLOC // hwloc master branch
    // #define TMC_WORK_ITEM=CORO -- default
  • tmc::cpu_executor() + standalone tmc::ex_cpu_st

No compilation issues or new warnings, no runtime issues, no perf regressions.
Executable size decreased by 12-15 KB.

The diffstat is also nice.
Rocking as always \m/

tzcnt force-pushed the chase_lev_deque branch from 4e29a51 to 21ec906 on June 3, 2026 03:28
tzcnt changed the base branch from main to v2-dev on June 3, 2026 04:00

tzcnt (Owner, Author) commented Jun 3, 2026

Thanks. Performance in all benchmarks has been equivalent to the old queue, but the new queue should cause fewer cache misses and less TLB pressure overall, and the code is shorter and easier to understand.

I'm going to delay merging this to main because it's technically a breaking change (requires the removal of TMC_WORK_ITEM=FUNC), and I'm considering releasing it along with some other breaking changes.

tzcnt merged commit bf098c4 into v2-dev on Jun 3, 2026
46 checks passed
tzcnt deleted the chase_lev_deque branch on June 3, 2026 04:07
tzcnt added a commit that referenced this pull request on Jun 6, 2026

tzcnt (Owner, Author) commented Jun 6, 2026

I changed my mind; I think this can go with v1.6:

  • I doubt anyone's using TMC_WORK_ITEM=FUNC since the performance was poor anyway
  • It's a relatively small breaking change that is easy to fix (swap to using TMC_WORK_ITEM=FUNCORO instead)
  • It can be detected at compile time with a static_assert
