Add opt-in Ractor-safe InstanceRegistry via RICE_RACTOR_SAFE #401

Open
javier-sy wants to merge 4 commits into ruby-rice:master from javier-sy:ractor-safe-registry

@javier-sy

Hi there,

I'd like to propose an opt-in mechanism for Ractor safety in Rice. I understand this touches a sensitive area — the docs explicitly state "Rice provides no mechanisms for dealing with thread safety" — so I want to explain the context that led here, and why I believe this is a minimal, safe change worth considering.

Motivation

I'm the author of MusaDSL, an algorithmic music composition framework for Ruby. I'm currently building a new module — a real-time counterpoint engine that generates multi-voice polyphony via constraint satisfaction. The engine uses or-tools-ruby (Google's OR-Tools CP-SAT solver) for finding valid voice-leading solutions, and or-tools-ruby is built on Rice. For real-time performance, the solver needs to run inside a Ractor so it doesn't block the sequencer thread — which is how I encountered the InstanceRegistry thread-safety issue.

I got it working by marking the extension as Ractor-safe (rb_ext_ractor_safe(true)), but concurrent access from multiple Ractors caused segfaults.

After investigation, I traced the issue to a single point: Rice's InstanceRegistry, the std::map that tracks C++ object wrappers. It's the only Rice registry that is written to at runtime: add, remove, and clear mutate the map while lookup reads it concurrently. The other registries (TypeRegistry, NativeRegistry, HandlerRegistry, ModuleRegistry) are all written once during Init_* and are read-only afterward.

The change

When RICE_RACTOR_SAFE is defined at compile time, a std::recursive_mutex is added to InstanceRegistry, protecting all four operations that touch the map (lookup, add, remove, clear). Two files changed, 21 lines added:

  • rice/detail/InstanceRegistry.hpp: conditional #include <mutex> + mutex member
  • rice/detail/InstanceRegistry.ipp: std::lock_guard in lookup, add, remove, clear

Without the define, the generated code is identical to the current Rice — no mutex, no overhead, no behavior change. Existing projects are completely unaffected.
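
To make the shape of the change concrete, here is a minimal sketch of a compile-time-optional mutex around a std::map. It is not the code from this PR: `MutexRegistry`, its `unsigned long` handle type, and `roundtrip_demo` are hypothetical stand-ins for illustration, and the real InstanceRegistry has different types and signatures.

```cpp
#include <cassert>
#include <map>
#ifdef RICE_RACTOR_SAFE
#include <mutex>
#endif

// Hypothetical stand-in for a registry mapping C++ pointers to handles.
// A handle of 0 means "not registered" in this sketch.
class MutexRegistry
{
public:
  unsigned long lookup(const void* ptr)
  {
#ifdef RICE_RACTOR_SAFE
    std::lock_guard<std::recursive_mutex> guard(mutex_);
#endif
    auto it = map_.find(ptr);
    return it == map_.end() ? 0UL : it->second;
  }

  void add(const void* ptr, unsigned long handle)
  {
#ifdef RICE_RACTOR_SAFE
    std::lock_guard<std::recursive_mutex> guard(mutex_);
#endif
    map_[ptr] = handle;
  }

  void remove(const void* ptr)
  {
#ifdef RICE_RACTOR_SAFE
    std::lock_guard<std::recursive_mutex> guard(mutex_);
#endif
    map_.erase(ptr);
  }

  void clear()
  {
#ifdef RICE_RACTOR_SAFE
    std::lock_guard<std::recursive_mutex> guard(mutex_);
#endif
    map_.clear();
  }

private:
  std::map<const void*, unsigned long> map_;
#ifdef RICE_RACTOR_SAFE
  std::recursive_mutex mutex_;  // only present when the define is set
#endif
};

// Usage: round-trips one entry; behaves the same with or without the define.
void roundtrip_demo()
{
  MutexRegistry registry;
  int object = 0;
  assert(registry.lookup(&object) == 0UL);
  registry.add(&object, 42UL);
  assert(registry.lookup(&object) == 42UL);
  registry.remove(&object);
  assert(registry.lookup(&object) == 0UL);
}
```

With the define absent, the preprocessor removes every mutex line, so the class compiles down to plain map operations.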

Consumers opt in via their extconf.rb:

$CXXFLAGS << " -DRICE_RACTOR_SAFE"

Tests

The PR includes two Minitest test suites in test/ruby/ (matching the existing Rake::TestTask glob):

Positive tests (test_ractor.rb) — a minimal Rice extension compiled with -DRICE_RACTOR_SAFE:

  • Single Ractor, sequential Ractors, concurrent Ractors
  • Many objects in a Ractor (50 objects stressing the registry)
  • Main thread + Ractor simultaneously
  • Calibrated heavy concurrency: constructor duration calibrated to ~1s per object, 2 Ractors × 3 objects overlapping
  • Calibrated fast concurrency: number of Counter.new calls calibrated to fill ~1s, 2 Ractors creating that many objects simultaneously

Negative test (test_ractor_unsafe.rb) — same extension compiled WITHOUT the define:

  • Runs in a subprocess (so a crash doesn't kill the test runner)
  • Concurrent Ractors with calibrated object count → hang or segfault confirmed
  • 30s timeout — corruption of the unprotected std::map manifests as an infinite loop in the red-black tree, not always as a segfault

Both calibration strategies ensure the tests produce meaningful contention on any machine, fast or slow.
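
The PR's tests are written in Ruby with Minitest; the sketch below only illustrates the calibration idea in C++: run an operation repeatedly for a fixed time budget and use the resulting count as the workload size. `calibrate` and `calibration_demo` are hypothetical names, not part of the test suite.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: counts how many times `op` completes within `budget`.
// The operation always runs at least once.
std::size_t calibrate(const std::function<void()>& op,
                      std::chrono::milliseconds budget)
{
  const auto deadline = std::chrono::steady_clock::now() + budget;
  std::size_t count = 0;
  do
  {
    op();
    ++count;
  } while (std::chrono::steady_clock::now() < deadline);
  return count;
}

// Usage: calibrate a trivial operation against a short 20 ms budget.
void calibration_demo()
{
  int calls = 0;
  std::size_t n = calibrate([&] { ++calls; }, std::chrono::milliseconds(20));
  assert(n >= 1);
  assert(n == static_cast<std::size_t>(calls));
}
```

Each concurrent worker would then be asked to perform the calibrated number of operations, so the overlap window is roughly the same on fast and slow machines.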

Why not a mutex by default?

I considered making the mutex unconditional, but:

  • It would change Rice's performance characteristics for all users
  • The overhead is small but nonzero (mutex acquire/release on every object wrap/unwrap)
  • Most Ruby C extensions don't use Ractors today

The opt-in approach lets Ractor-aware projects enable it explicitly while keeping the default behavior unchanged.

Compatibility

Tested with Ruby 3.4.x (where Ractors are experimental). Ruby 4.x compatibility is pending validation — the Ractor API may change, but the mutex mechanism itself is pure C++ and should remain valid.

Note on the #include

I'm aware of the project policy about not adding #include directives to arbitrary files. The #include <mutex> is inside InstanceRegistry.hpp itself (not in rice.hpp), and is conditional on #ifdef RICE_RACTOR_SAFE. I believe this is the least invasive placement, but I'm happy to move it if you prefer a different approach.


I realize this is a change in Rice's stance on thread safety, and I completely understand if you'd prefer to approach it differently. I'm open to any feedback — whether that's changes to the implementation, a different API surface, or even just keeping this in a fork until Ractors stabilize. The important thing for me was to identify the exact problem (InstanceRegistry is the only runtime-mutable registry) and demonstrate that a minimal fix works.

Thank you for maintaining Rice — it's a remarkable piece of engineering that makes the Ruby/C++ boundary feel natural.

InstanceRegistry is the only Rice registry that performs writes at
runtime (lookup, add, remove, clear). The other registries
(TypeRegistry, NativeRegistry, HandlerRegistry, ModuleRegistry) are
written once during Init and are read-only at runtime.

When RICE_RACTOR_SAFE is defined, a std::recursive_mutex protects
all mutable InstanceRegistry operations. This allows C extensions
built with Rice to be used safely from multiple Ruby Ractors.

Without the define, the generated code is identical to the current
Rice — zero impact on existing projects.

Usage in extconf.rb:
  $CXXFLAGS << " -DRICE_RACTOR_SAFE"

A minimal Rice extension (Counter class) compiled with
-DRICE_RACTOR_SAFE, plus 5 Minitest tests verifying:

- Single Ractor creating and using Rice-wrapped objects
- Sequential Ractors with rapid handoff
- Concurrent Ractors (would segfault without the mutex)
- Many objects in a Ractor (stresses InstanceRegistry add/lookup)
- Main thread and Ractor using Rice simultaneously

The tests auto-build the extension on first run.
Located in test/ruby/ matching the existing Rake::TestTask glob.

A second test extension compiled WITHOUT -DRICE_RACTOR_SAFE proves
that concurrent Ractor access to Rice-wrapped objects corrupts the
unprotected InstanceRegistry (std::map).

The corruption manifests as a hang (infinite loop in the corrupted
red-black tree), not always as a segfault. The test runs in a
subprocess with a 30s timeout — hang or crash both confirm the bug.

This is the counterpart to test_ractor.rb which proves the same
workload succeeds WITH RICE_RACTOR_SAFE.

- docs/why_rice.md: expand Thread Safety section with Ractor support
  subsection. Clarifies that without the define behavior is unchanged,
  documents the extconf.rb usage, and notes Ruby 3.4.x compatibility
  (Ruby 4.x pending validation).
- CHANGELOG.md: add unreleased entry for the feature.

The #include <mutex> is conditional (#ifdef RICE_RACTOR_SAFE) inside
InstanceRegistry.hpp itself — no changes to rice.hpp, respecting
the project's include management policy.
@javier-sy
Author

Just for more information: if the PR is accepted, my next step is to open a PR against or-tools-ruby that uses the new version of Rice; I have already prepared and tested it locally. If it's not accepted, I can write a custom adapter for my counterpoint engine gem, so this is not a blocker for my project, but I think the solution could be helpful to more users than just me. Thank you again!

@cfis
Collaborator

cfis commented Mar 28, 2026

Hi @javier-sy - thanks for the very thoughtful PR. I didn't actually realize Rice had that language about threading in the Why Rice document; that text is ancient. I think I'll just remove it. However, Rice today is likely not thread safe. But it should be!

The simplest solution is to turn off the instance registry - it is a global setting.

If we want to use it, I think the patch would stop crashes but doesn't fully fix the problem. If two Ractors wrap the same C++ pointer concurrently, they will both get Qnil from lookup, both allocate a Wrapper, and both add new instances. The second would silently overwrite the first, leaking a Wrapper and leaving a dangling Ruby VALUE. We would need to add a lookup_or_insert method or some such. Or allow the mutex to be held across lookup and insert, but that would require exposing it to calling code.
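
One way to picture the lookup_or_insert idea is the sketch below, which holds a single lock across the find and the insert so that the wrapper factory runs at most once per pointer. It is an illustration under stated assumptions, not Rice code: `AtomicWrapRegistry`, `Handle` (standing in for a Ruby VALUE, with 0 playing the role of Qnil), and `count_wrappers_built` are all hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical handle type standing in for a Ruby VALUE.
using Handle = unsigned long;

class AtomicWrapRegistry
{
public:
  // Returns the existing handle for ptr, or calls make_wrapper once and
  // stores its result. The lock is held across both steps, so two callers
  // racing on the same ptr cannot both insert.
  Handle lookup_or_insert(const void* ptr,
                          const std::function<Handle()>& make_wrapper)
  {
    // Recursive, on the assumption that make_wrapper may re-enter the
    // registry from the same thread.
    std::lock_guard<std::recursive_mutex> guard(mutex_);
    auto it = map_.find(ptr);
    if (it != map_.end())
      return it->second;
    Handle handle = make_wrapper();
    map_.emplace(ptr, handle);
    return handle;
  }

private:
  std::map<const void*, Handle> map_;
  std::recursive_mutex mutex_;
};

// Races `threads` threads on one pointer and returns how many wrappers
// the factory actually built.
int count_wrappers_built(int threads)
{
  AtomicWrapRegistry registry;
  int object = 0;
  std::atomic<int> built{0};
  std::vector<std::thread> pool;
  for (int i = 0; i < threads; ++i)
  {
    pool.emplace_back([&] {
      registry.lookup_or_insert(
          &object, [&] { return static_cast<Handle>(++built); });
    });
  }
  for (auto& t : pool)
    t.join();
  return built.load();
}
```

The trade-off is that the factory runs while the lock is held, which serializes wrapper creation across all callers; whether that is acceptable for Rice would need measuring.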

I also wonder if the complexity of RICE_RACTOR_SAFE is worth the bother. I don't know how much overhead the mutex will cause, but I'd lean towards either we have them or we don't, rather than making it a configurable setting.

Anyway, if you are good with just turning off the registry, problem solved. If you want it enabled, then I think we will need to think more about the best way to protect InstanceRegistry (and probably other parts of the code base also).
