CI: switch ThreadSanitizer to a gcc-based, fully instrumented build#554
Conversation
|
Warning Review limit reached
More reviews will be available in 8 minutes and 47 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR switches FairMQ's ThreadSanitizer CI job from clang to gcc using fully instrumented dependencies, eliminates the suppressions file, and adds libstdc++ locale cache warmup to prevent false-positive races from lazy initialization. The infrastructure includes a new spack tsan environment with a custom tsan-instrumented libstdc++ package. ChangesThreadSanitizer Fully-Instrumented Build and Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Note This review was generated with Claude. Verdict: approve with minor comments. The PR is correct in everything it actively does — no critical or major findings survived verification. All confirmed findings are robustness gaps at the edges (silent-failure modes, a coverage boundary) plus one stale comment. What was verified correct (highlights)
FindingsShould fix before merge
Worth considering
Take or leave
One claimed finding ("LSAN + FAIRMQ_TEST_LD_LIBRARY_PATH combined breaks test discovery") was refuted during verification — the two-entry |
- concurrent execute() calls print captured subprocess lines to std::cout from multiple threads; the standard allows that, but libstdc++ maintains the formatted-output state (ios_base::width) with plain reads and writes -- a data race ThreadSanitizer reports once libstdc++ itself is instrumented - a mutex around the insertion also keeps whole lines from interleaving
- the subscriber threads captured the loop counter by reference while the spawning loop kept incrementing it: a genuine data race - depending on timing, threads could also end up with duplicate subscriber names; capture the counter by value instead
- the pattern is constant; compiling it on every Validate() call is wasted work and, when channels are validated from multiple threads, needlessly exercises libstdc++'s lazily-populated ctype caches
- std::ctype<char> caches narrow()/widen() results per character in plain char arrays of the global classic-locale facet, written without synchronization from header-inlined code (locale_facets.h); two threads exercising an uncached character concurrently (e.g. compiling a std::regex in Channel::Validate) constitute a true data race that ThreadSanitizer rightfully reports - the stores are real and unsynchronized, so a tsan-instrumented libstdc++ cannot help here; instead fill the caches before any thread is spawned, which turns every later access into a pure read - warm the lazily-installed num_put/num_get caches used by stream insertion/extraction as well, via a small format/parse round-trip - wire the warm-up into the gtest runner main() and, via a static initializer, into the test device runner
- CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS are command-line strings; list(APPEND) inserts a semicolon once the variable is non-empty, which splits the link command at the shell level - latent until now because nothing passed linker flags on the cmake command line
- introduce FAIRMQ_TEST_LD_LIBRARY_PATH, which prepends a directory to each test's environment via ctest, so the tests can run against an alternative runtime library (e.g. a tsan-instrumented libstdc++) - LD_LIBRARY_PATH rather than an injected rpath: an rpath added via the linker flags cannot precede the rpath spack's gcc adds through its specs file, so the compiler's own libstdc++ would keep winning the runtime search order - scoped per test on purpose: an instrumented library has unresolved __tsan_* symbols and must not be loaded into uninstrumented tools like cmake, ctest or ninja - fail the configuration instead of silently dropping the injection on CMake < 3.22 (ENVIRONMENT_MODIFICATION) - cover the example tests too; they share the instrumented runtime but not the locale-cache warmup (their main() is the installed public header). The custom-controller env block was dead before: it tested lsan_options, which only ever existed in the add_example() function scope, so the test also never received the LSan suppressions
- buildcache push expands its selection into the full dependency closure, build-time dependencies included; specs that were satisfied from the buildcache do not have those installed locally, and the push fails with PackageNotInstalledError - both push sites (the early gcc node push and the env-level push) only ever ran in fresh-build scenarios before, so the failure surfaced once the cache was warm - pass --allow-missing to skip what is not installed (a best-effort push of everything that is); a freshly built gcc thus still uploads its build-time dependencies, which a future gcc rebuild can then pull as binaries
- gcc ships no supported switch to build libstdc++ with -fsanitize=thread, and spack's gcc recipe filters all flags out of the target-library build (CXXFLAGS_FOR_TARGET is owned by its generated --with-build-config=spack makefile), so provide a dedicated libstdcxx-tsan package in a custom repo - build only the libstdc++-v3 subtree from the matching gcc release tarball, configured standalone against the already-installed toolchain (recipe modeled on https://iree.dev/developers/debugging/sanitizers/), instead of rebuilding all of gcc - the result is a drop-in runtime replacement for the compiler's libstdc++ (same soname and symbol versions), to be loaded only by the instrumented test executables - normalize the install layout after make install: the standalone build puts the runtime libraries into the multilib os dir (lib64 on x86_64) regardless of --libdir, and --with-toolexeclibdir only applies to cross builds - register the repo in the setup-deps action before creating the env
- mirror spack-latest.yaml, with -fsanitize=thread on the libzmq and libsodium nodes so tsan can observe the happens-before edges established inside libzmq's lock-free queues, plus the libstdcxx-tsan root spec - flags are applied per node instead of via the propagating '==' operator, which could reach the gcc node and trigger a compiler rebuild - unchanged roots (fairlogger, boost, ninja, cmake) keep their spec hashes, so they are shared with the regular buildcache entries; the instrumented nodes hash differently and coexist in the content-addressed cache - exclude libstdcxx-tsan from concretizer reuse so recipe changes always take effect; unchanged recipes still hit the buildcache because the spec hash is identical - add the tsan env to the buildcache matrix (rebuilding also on spack_repo changes) so the instrumented binaries are cached instead of rebuilt on every CI run
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Line 103: Replace tag-only GitHub Action refs with full commit SHAs: find each
occurrence of "hendrikmuhs/ccache-action@v1" and "threeal/cmake-action@v2" in
the workflow and pin them to the corresponding commit SHA (e.g.,
"hendrikmuhs/ccache-action@<full-sha>" and "threeal/cmake-action@<full-sha>");
update all instances (the three "hendrikmuhs/ccache-action@v1" entries and the
two "threeal/cmake-action@v2" entries) so the workflow uses the exact commit
SHAs instead of tag-only refs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 125301c7-b51b-433c-bce3-26c57eae3050
📒 Files selected for processing (18)
.github/actions/setup-deps/action.yml.github/workflows/buildcache.yml.github/workflows/ci.ymlCMakeLists.txtcmake/FairMQProjectSettings.cmakeexamples/CMakeLists.txtexamples/custom-controller/CMakeLists.txtfairmq/Channel.cxxfairmq/tools/Process.cxxtest/CMakeLists.txttest/ci/spack-tsan.yamltest/ci/spack_repo/fairmq_ci/packages/libstdcxx_tsan/package.pytest/ci/spack_repo/fairmq_ci/repo.yamltest/helper/LocaleWarmup.htest/helper/runTestDevice.cxxtest/plugin_services/_control.cxxtest/runner.cxx.intest/thread_sanitizer_suppressions.txt
💤 Files with no reviewable changes (1)
- test/thread_sanitizer_suppressions.txt
✅ Files skipped from review due to trivial changes (2)
- cmake/FairMQProjectSettings.cmake
- test/ci/spack_repo/fairmq_ci/repo.yaml
🚧 Files skipped from review as they are similar to previous changes (9)
- test/helper/runTestDevice.cxx
- test/CMakeLists.txt
- .github/actions/setup-deps/action.yml
- test/plugin_services/_control.cxx
- fairmq/tools/Process.cxx
- test/helper/LocaleWarmup.h
- fairmq/Channel.cxx
- test/runner.cxx.in
- test/ci/spack-tsan.yaml
- build with the spack gcc toolchain like every other job: no clang++, no -fuse-ld=lld and no lld install step (the GNU BFD failure on tsan objects was specific to clang's tsan runtime) - use gcc 15 for the freshest libtsan runtime; the asan entry stays on gcc 14, so the matrix now carries per-entry gcc and spack env names - consume the tsan spack env and load the instrumented libstdc++ into each test's environment via FAIRMQ_TEST_LD_LIBRARY_PATH (it shares the soname of the compiler's own, so it substitutes process-wide at load time) - use -fno-omit-frame-pointer for readable reports; optimization comes from the project's Debug -Og - verify the wiring: assert the test environment resolves libstdc++ to the instrumented copy and that libzmq is tsan-instrumented, since both failure modes are silent (the suite still passes, with reduced race coverage)
- every entry stood in for a library tsan could not see into; with libzmq, libsodium and libstdc++ now tsan-instrumented in the tsan CI job, the happens-before edges they establish are visible and nothing is left to suppress - suppressions were blunt (a race: entry matches any frame in the stack), so they could also mask real races passing through those frames
- ENABLE_SANITIZER_UNDEFINED_BEHAVIOUR never matched the CMake option ENABLE_SANITIZER_UNDEFINED_BEHAVIOR (cmake/FairMQProjectSettings.cmake), so the asan+lsan+ubsan job has never actually enabled UBSan
- hendrikmuhs/ccache-action: the v1 major tag also lagged behind the latest release (v1.2.20 instead of v1.2.23), so this is a bump too - threeal/cmake-action: v2 == v2.1.0, pinned as-is
Closes #553.
The tsan job now compiles and links with gcc 15 (
-fsanitize=thread) against a fullytsan-instrumented dependency tree, and the suppressions file is gone. The full suite (188 tests)
runs green with zero suppressions.
Infrastructure
libstdcxx-tsanspack package (test/ci/spack_repo): gcc offers no supported switch tobuild libstdc++ with tsan and spack's gcc recipe owns the target-library flags outright, so the
package builds only the libstdc++-v3 subtree from the matching gcc release tarball, configured
standalone against the installed toolchain (~3 min build, cached in the buildcache).
compiler's own and is selected via
LD_LIBRARY_PATHper test (newFAIRMQ_TEST_LD_LIBRARY_PATHCMake variable) — per test on purpose, since the library carries unresolved
__tsan_*symbolsand must not be loaded into uninstrumented tools (cmake, ctest, ninja). An rpath cannot work
here: spack-gcc's specs file injects its own rpath ahead of any user link flag.
test/ci/spack-tsan.yaml: mirrorsspack-latest.yamlwith-fsanitize=threadon thelibzmq and libsodium nodes (per-node flags, no
==propagation, so nothing can leak onto thegcc node). Unchanged roots keep their spec hashes and stay shared with the regular buildcache;
libstdcxx-tsanis excluded from concretizer reuse so recipe changes always take effect.ci.yml: tsan matrix entry dropsclang++,-fuse-ld=lldand the lld install step;per-entry gcc/env keys (asan unchanged on gcc 14); a small permanent step asserts the
instrumentation wiring, since its failure modes are silent.
PackageNotInstalledErrorwhenever specs are satisfied from the cache (--allow-missingonboth the gcc and the env push; a freshly built gcc still uploads its build-time deps for
future rebuilds) — pre-existing, surfaced by the warm cache.
Findings (previously masked)
Removing the suppressions surfaced and fixed real issues:
fair::mq::tools::executeprinted tostd::coutfrom concurrent threads; libstdc++ keeps theformatted-output state (
ios_base::width) in plain reads/writes, which only an instrumentedlibstdc++ can see — both accesses were invisible before.
SubscriptionThreadSafetytest captured its loop counter by reference while the spawning loopkept incrementing it.
list(APPEND CMAKE_EXE_LINKER_FLAGS ...)in the project settings corrupted the link commandwith a semicolon as soon as anything passed linker flags externally.
std::ctype<char>lazily cachesnarrow()/widen()per character withunsynchronized stores from header-inlined code — a true race under the memory model that no
instrumentation can absolve; the test binaries now pre-fill the caches before threads exist
(
test/helper/LocaleWarmup.h), andChannel::Validatecompiles its constant regex only once.Definition of done from #553
-fuse-ld=lld, no lld install stepreports implicate them; instrumentable the same way if that changes)
TSAN_OPTIONSwiring removed