Description
What happened:
Running `nemoclaw onboard` on Fedora hangs indefinitely after the sandbox image is successfully pushed to the gateway. The CLI prints `Image is available in the gateway` but never proceeds to steps 6/7, never prints a success message, and never writes the sandbox entry to `~/.nemoclaw/sandboxes.json`. The sandbox itself is fully healthy (`kubectl get pods` shows it as `1/1 Running`), but the NemoClaw CLI wrapper is unaware of it.
A secondary issue: Docker on Fedora defaults to the `overlayfs` storage driver instead of `overlay2`, which causes slow and eventually stalled image pushes during sandbox creation.
What was expected:
Onboarding should complete all 7 steps, print a success message, and register the sandbox in `~/.nemoclaw/sandboxes.json` so that `nemoclaw list` and `nemoclaw connect` work correctly.
Reproduction Steps
- Install NemoClaw on Fedora via `curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash`
- Run `nemoclaw onboard`, select NVIDIA Endpoints, select Nemotron 3 Super 120B, name the sandbox
- Observe that after `Image is available in the gateway` is printed, the CLI hangs indefinitely (verified for 1.5+ hours)
- In a second terminal, run `docker exec openshell-cluster-nemoclaw kubectl get pods --all-namespaces`: the sandbox shows `1/1 Running`
- Run `cat ~/.nemoclaw/sandboxes.json`: it shows `{"sandboxes": {}, "defaultSandbox": null}`
- Run `nemoclaw list`: it shows `No sandboxes registered`
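The mismatch in the last steps (pod running, registry empty) can also be checked from a script. The following is a minimal Python sketch, not part of NemoClaw: `is_registered` is a hypothetical helper, and it assumes that `sandboxes` in the registry file is a mapping keyed by sandbox name, which is inferred only from the empty registry shown above.

```python
import json
from pathlib import Path

# Registry path taken from this report.
REGISTRY = Path.home() / ".nemoclaw" / "sandboxes.json"


def is_registered(name: str, registry: Path = REGISTRY) -> bool:
    """Return True if `name` appears under "sandboxes" in the registry file.

    Assumption: "sandboxes" is a JSON object keyed by sandbox name.
    A missing, empty, or malformed file counts as "not registered".
    """
    try:
        data = json.loads(registry.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    if not isinstance(data, dict):
        return False
    sandboxes = data.get("sandboxes")
    return isinstance(sandboxes, dict) and name in sandboxes
```

With the registry content from this report, `is_registered("fedora-claw")` would return `False` even though the `fedora-claw` pod is running.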
Workaround:
- Switch the Docker storage driver from `overlayfs` to `overlay2` by editing `/etc/docker/daemon.json`
- Manually write the sandbox entry to `~/.nemoclaw/sandboxes.json` with the correct name, gateway, provider, model, and policies
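Both workaround steps are JSON edits, so they can be sketched in a few lines of Python. This is an illustration under assumptions, not NemoClaw's own format or tooling: `storage-driver` is the documented Docker daemon key, but the helper names are hypothetical and the layout of a sandbox entry is a guess based on the empty registry and the field list above. Writing `/etc/docker/daemon.json` requires root, and Docker has to be restarted afterwards (for example with `sudo systemctl restart docker`) before the new driver takes effect.

```python
import json
from pathlib import Path


def load_json(path: Path, default: dict) -> dict:
    """Read a JSON object from `path`; return `default` if the file is missing or empty."""
    if path.exists():
        text = path.read_text().strip()
        if text:
            return json.loads(text)
    return default


def set_storage_driver(daemon_json: Path, driver: str = "overlay2") -> dict:
    """Set "storage-driver" in Docker's daemon.json, keeping all other keys."""
    config = load_json(daemon_json, {})
    config["storage-driver"] = driver
    daemon_json.write_text(json.dumps(config, indent=2) + "\n")
    return config


def register_sandbox(registry: Path, name: str, entry: dict) -> dict:
    """Store `entry` under "sandboxes"[name]; make it the default if none is set.

    Assumption: entries are keyed by sandbox name. The real schema of an
    entry is not documented in this report.
    """
    data = load_json(registry, {"sandboxes": {}, "defaultSandbox": None})
    data.setdefault("sandboxes", {})[name] = entry
    if data.get("defaultSandbox") is None:
        data["defaultSandbox"] = name
    registry.parent.mkdir(parents=True, exist_ok=True)
    registry.write_text(json.dumps(data, indent=2) + "\n")
    return data
```

A call such as `set_storage_driver(Path("/etc/docker/daemon.json"))` covers the first step. For the second, `register_sandbox` would be given the sandbox name and a dict holding the gateway, provider, model, and policies; the exact key names have to be taken from a registry written by a successful onboarding run.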
Environment
- OS: Fedora Linux (kernel confirmed running, SELinux active)
- NemoClaw version: v0.1.0
- OpenShell CLI version: openshell 0.0.10
- Docker version: Docker version 29.3.0, build 5927d80
- Node.js version: v22.22.1
- GPU: NVIDIA GPU, 8151 MB VRAM
- RAM: 30.73 GB
- Docker storage driver (default): overlayfs — had to manually switch to overlay2
Debug Output
The output of `nemoclaw debug --output nemoclaw-debug.tar.gz` is attached to this issue.
---
### Logs
Key finding from `docker logs openshell-cluster-nemoclaw` while hung — only etcd compaction loop entries, no sandbox startup activity:
time="..." level=info msg="COMPACT compactRev=276 targetCompactRev=367 currentRev=1276"
time="..." level=info msg="COMPACT deleted 18 rows from 91 revisions in 1.298802ms"
Key finding from `docker exec openshell-cluster-nemoclaw kubectl describe pod fedora-claw -n openshell` while CLI was hung:
Status: Running
Ready: True
Events: Container started successfully
Checklist
- I confirmed this bug is reproducible
- I searched existing issues and this is not a duplicate