fix(dab): mongo crash recovery + vendor common_scaffold for per-query validators #9
Merged
… validators

Two independent DAB trial-stability bugfixes:

1. compose.py: mongo:8 intermittently SIGSEGVs (exit 139) on startup, and its WiredTiger cache auto-sizes to ~half of host RAM, starving the agent. Cap the cache at 1GB and set restart: on-failure so a crashed mongod comes back up against the already-populated data dir and the healthcheck recovers. Without this, a single crash bricks the whole trial.

2. prepare.py: 9 of 54 DAB validators do `from common_scaffold.validate.levenshtein import levenshtein`. verify.py exec_module's validate.py inside the dab-agent container, which has no common_scaffold installed, so the import raised, no reward.json was written, and harbor reported RewardFileNotFoundError. Vendor common_scaffold next to verify.py on the per-query path (the batch path already did this) so the import resolves.

Both paths covered by new regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pull request overview
Improves DAB trial stability by hardening MongoDB service behavior in generated docker-compose output and ensuring common_scaffold is vendored for per-query validator execution inside the dab-agent container.
Changes:
- Add `restart: on-failure` and cap WiredTiger cache size for `dab-mongo` in generated compose output.
- Vendor `common_scaffold` into per-query task `/tests` so upstream validators importing it can run successfully.
- Add regression tests covering both the compose Mongo settings and the per-query `common_scaffold` materialization.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/razorback-plugin-dab/tests/unit/test_prepare_per_query.py | Adds regression test ensuring per-query tasks vendor common_scaffold and validators can import it. |
| packages/razorback-plugin-dab/tests/unit/test_compose_mongo.py | Adds regression test asserting Mongo restart policy and WiredTiger cache cap are emitted. |
| packages/razorback-plugin-dab/src/razorback_plugin_dab/generate/prepare.py | Ensures per-query task materialization also installs common_scaffold into /tests. |
| packages/razorback-plugin-dab/src/razorback_plugin_dab/generate/compose.py | Updates generated dab-mongo service to restart on failure and cap WiredTiger cache size. |
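
As a rough illustration of the compose.py row above, the sketch below builds a service mapping with the two settings this PR names (`command: ["--wiredTigerCacheSizeGB", "1"]` and `restart: on-failure`). The helper name `build_mongo_service` and the surrounding dictionary layout are assumptions made for this example; the real generator in compose.py is not shown on this page and almost certainly emits more fields (volumes, healthcheck, and so on).

```python
import json


def build_mongo_service(cache_gb: int = 1) -> dict:
    """Hypothetical helper: return a compose service mapping for dab-mongo.

    Only the image tag, the cache flag, and the restart policy come from the
    PR text; everything else about the real service definition is omitted.
    """
    return {
        "image": "mongo:8",
        # Cap the WiredTiger cache so mongod does not auto-size it to roughly
        # half of host RAM and starve the agent.
        "command": ["--wiredTigerCacheSizeGB", str(cache_gb)],
        # Restart a crashed mongod (for example exit 139) so it comes back up
        # against the already-populated data dir.
        "restart": "on-failure",
    }


if __name__ == "__main__":
    compose = {"services": {"dab-mongo": build_mongo_service()}}
    print(json.dumps(compose, indent=2))
```

Keeping the cache size as a parameter is a design choice of this sketch only; the PR itself fixes the cap at 1GB.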
Two independent DAB trial-stability bugfixes, each with a regression test.

1. Mongo crash recovery (`compose.py`)

`mongo:8` intermittently SIGSEGVs (exit 139) on startup, and its WiredTiger cache auto-sizes to ~half of host RAM (≈7GB on a 15GB box), starving the agent. We now set:

- `command: ["--wiredTigerCacheSizeGB", "1"]`
- `restart: on-failure`

Because the data dir is already populated, a restarted `mongod` comes straight back up with data and the healthcheck recovers. Without this, a single crash bricks the whole trial: `main`'s healthcheck fails for its entire retry window and the trial is cancelled.

2. Vendor `common_scaffold` on the per-query path (`prepare.py`)

9 of 54 DAB validators do `from common_scaffold.validate.levenshtein import levenshtein`. `verify.py` loads `validate.py` via `exec_module` inside the `dab-agent` container, which has no `common_scaffold` installed, so the import raised, no `reward.json` was written, and harbor reported `RewardFileNotFoundError` (the verifier appeared never to run).

The batch path already vendored `common_scaffold` next to `verify.py`; the per-query path now does too (`/tests` is `sys.path[0]`), so the import resolves.

Tests

- `test_mongo_has_restart_and_cache_cap`
- `test_per_query_materializes_common_scaffold_for_upstream_validators`

All 14 tests in the two affected files pass.
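
The vendoring fix described above can be reproduced in miniature. The sketch below copies a stand-in `common_scaffold` package next to a validator file, puts that directory at `sys.path[0]`, and then loads the validator with `exec_module`, so the validator's import resolves. It is not the code in prepare.py or verify.py: the helper names (`vendor_package`, `load_validator`, `demo`), the temporary directory layout, and the toy `levenshtein` body are all assumptions made for this example.

```python
import importlib.util
import shutil
import sys
import tempfile
from pathlib import Path

# Toy stand-in for the upstream module; the real implementation is not shown here.
LEVENSHTEIN_SRC = '''
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]
'''

# Hypothetical validator that imports common_scaffold the way the PR describes.
VALIDATE_SRC = '''
from common_scaffold.validate.levenshtein import levenshtein

def distance(a, b):
    return levenshtein(a, b)
'''


def vendor_package(package_src: Path, tests_dir: Path) -> Path:
    """Copy a package directory into tests_dir so it sits next to the verifier."""
    dest = tests_dir / package_src.name
    shutil.copytree(package_src, dest, dirs_exist_ok=True)
    return dest


def load_validator(validate_py: Path):
    """Load a validator file by path using exec_module."""
    spec = importlib.util.spec_from_file_location("dab_validate_demo", validate_py)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def demo() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Build the stand-in upstream package: common_scaffold/validate/levenshtein.py
        sub = root / "upstream" / "common_scaffold" / "validate"
        sub.mkdir(parents=True)
        (sub.parent / "__init__.py").write_text("")
        (sub / "__init__.py").write_text("")
        (sub / "levenshtein.py").write_text(LEVENSHTEIN_SRC)

        # Stand-in for the per-query /tests directory.
        tests_dir = root / "tests"
        tests_dir.mkdir()
        (tests_dir / "validate.py").write_text(VALIDATE_SRC)

        vendor_package(sub.parent, tests_dir)

        # Mimic "/tests is sys.path[0]" so the vendored copy is importable.
        sys.path.insert(0, str(tests_dir))
        try:
            validator = load_validator(tests_dir / "validate.py")
            return validator.distance("kitten", "sitting")
        finally:
            sys.path.remove(str(tests_dir))


if __name__ == "__main__":
    print(demo())
```

Skipping the `vendor_package` call in a fresh interpreter makes `load_validator` raise `ModuleNotFoundError`, which mirrors the failure mode the PR describes: the import raises before any reward file can be written.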
🤖 Generated with Claude Code