
feat(experimental): small model mode for local inference#884

Draft
ericksoa wants to merge 2 commits into main from fix/small-model-mode

Conversation

@ericksoa
Contributor

Summary

  • Adds small model mode for local inference providers (Ollama, vLLM) that reduces system prompt overhead so small local models have more context capacity for conversation
  • Adds explicit ollama-local/vllm-local cases to getSandboxInferenceConfig() (previously these relied on the default fallthrough)
  • New NEMOCLAW_SMALL_MODEL_MODE build arg lowers bootstrap token budgets (bootstrapMaxChars=4000, bootstrapTotalMaxChars=8000) and writes compact workspace files (SOUL.md, AGENTS.md) at image build time
  • Logs [experimental] during onboarding when active
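
A sketch of what the explicit provider cases might look like. Only the provider IDs, the function name, and the 4000/8000 budgets come from this PR; the config shape and the cloud-side default numbers are assumptions for illustration:

```typescript
// Hypothetical shape of the sandbox inference config (field names assumed
// except the two bootstrap budgets, which are from the PR description).
type Provider = "ollama-local" | "vllm-local" | "nvidia" | "openai" | "anthropic";

interface SandboxInferenceConfig {
  smallModelMode: boolean;
  bootstrapMaxChars: number;
  bootstrapTotalMaxChars: number;
}

function getSandboxInferenceConfig(provider: Provider): SandboxInferenceConfig {
  switch (provider) {
    case "ollama-local":
    case "vllm-local":
      // Small model mode: shrink bootstrap budgets so a small local model
      // keeps more of its context window for conversation.
      return { smallModelMode: true, bootstrapMaxChars: 4000, bootstrapTotalMaxChars: 8000 };
    default:
      // Cloud providers keep the full defaults (numbers here are placeholders,
      // not the project's actual values).
      return { smallModelMode: false, bootstrapMaxChars: 20000, bootstrapTotalMaxChars: 40000 };
  }
}
```

Making the local cases explicit (rather than letting them hit the default branch) is what lets the cloud providers remain verifiably unaffected.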

Context

NVBUG 6018719: Ollama with qwen2.5:0.5b produces valid inference output but garbage answers, because OpenClaw's ~14KB+ default system prompt overwhelms the model's capacity. A/B testing showed the compact prompt saves ~1700 prompt tokens per turn — the difference between ~18 and ~30+ conversation turns before context exhaustion.

This is experimental — needs design review before graduating from draft.
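
As a rough illustration of why prompt overhead caps turn count: with a fixed context window, every token spent on the system prompt is unavailable for conversation. All numbers below are assumptions for illustration; the PR's measured figures are the ~1700-token saving and ~18 vs ~30+ turns.

```typescript
// Back-of-envelope sketch (all numbers assumed, not measured):
// turns available = (context window - system prompt) / avg tokens per turn.
function turnsBeforeExhaustion(
  contextTokens: number,      // model's context window
  systemPromptTokens: number, // fixed prompt overhead
  tokensPerTurn: number       // average user + assistant tokens per turn
): number {
  return Math.floor((contextTokens - systemPromptTokens) / tokensPerTurn);
}

// E.g. with an assumed 32K window and ~400 tokens/turn, shrinking the
// system prompt from ~3500 to ~1800 tokens frees several extra turns.
const withDefault = turnsBeforeExhaustion(32768, 3500, 400);
const withCompact = turnsBeforeExhaustion(32768, 1800, 400);
```

The effect compounds for tiny models, where the system prompt is also competing with the model's limited attention budget, not just its context length.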

Test plan

  • Unit tests pass (479/479, 1 pre-existing timeout on main)
  • A/B tested qwen2.5:0.5b locally: compact prompt produces equivalent quality answers with 97% fewer prompt tokens
  • End-to-end: build sandbox with Ollama + small model, verify workspace files are written
  • End-to-end: verify cloud providers (NVIDIA, OpenAI, Anthropic) are NOT affected
  • Design review on compact workspace file content

When onboarding with a local inference provider (Ollama, vLLM), enable
small model mode which reduces system prompt overhead so small models
have more context capacity for actual conversation.

Changes:
- Add explicit ollama-local/vllm-local cases to getSandboxInferenceConfig
- New NEMOCLAW_SMALL_MODEL_MODE build arg sets bootstrapMaxChars=4000
  and bootstrapTotalMaxChars=8000 in openclaw.json
- Write compact SOUL.md and AGENTS.md workspace files at build time
- Log "[experimental]" during onboarding when small model mode is active

Ref: NVBUG 6018719
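
The resulting openclaw.json fragment might look like this (only the two budget fields and their values come from the commit message; any surrounding keys are assumed):

```json
{
  "bootstrapMaxChars": 4000,
  "bootstrapTotalMaxChars": 8000
}
```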
@coderabbitai

coderabbitai bot commented Mar 25, 2026

Review skipped: draft detected. To trigger a single review, comment @coderabbitai review.

Docker's parser interprets heredoc end markers as the end of the RUN
instruction, causing lines after the first heredoc to be parsed as
unknown Dockerfile instructions. Switch to printf with escaped newlines.
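
The printf workaround might look roughly like this in the Dockerfile (the build-arg name and the SOUL.md/AGENTS.md file names are from the PR; the file contents and the workspace path are illustrative, not the actual compact prompts):

```dockerfile
ARG NEMOCLAW_SMALL_MODEL_MODE=0

# printf with \n escapes stays inside a single RUN instruction, unlike a
# heredoc whose end marker the parser treated as the end of the RUN.
RUN if [ "$NEMOCLAW_SMALL_MODEL_MODE" = "1" ]; then \
      printf 'You are a concise assistant.\nKeep answers short.\n' > /workspace/SOUL.md && \
      printf 'Answer directly; avoid long preambles.\n' > /workspace/AGENTS.md ; \
    fi
```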
ericksoa self-assigned this Mar 25, 2026
