
Remove stale KVM children and keep IOLoop responsive during connect #38

Open
nesvet wants to merge 1 commit into sciapp:develop from nesvet:feat/stale-child-cleanup


nesvet commented Jun 28, 2026


Summary

When the server stops without a clean WebSocket close, ephemeral nojava-ipmi-kvmrc-* child containers can keep their noVNC ports bound. The next docker run on that port then fails with exit code 125 ("port is already allocated").
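docker run reports failures of the Docker daemon itself, including "port is already allocated", with exit status 125 (as opposed to 126/127 for problems invoking the container's command). Because 125 is not specific to port conflicts, a user-facing hint can only say that a stale child may be the cause. A minimal sketch of such a message helper, assuming docker_terminated_message() takes just the return code; the signature, constant name, and wording below are hypothetical:

```python
# Hypothetical sketch: the PR's real docker_terminated_message() may take
# different arguments and word the message differently.
DOCKER_DAEMON_ERROR = 125  # docker run: the error came from the daemon, not the container


def docker_terminated_message(return_code: int) -> str:
    """Build a user-facing message for a KVM container that exited early."""
    message = f"KVM container exited with return code {return_code}."
    if return_code == DOCKER_DAEMON_ERROR:
        # 125 covers any daemon-side failure, so the port conflict is phrased as a hint.
        message += (
            " Docker could not start the container; a stale nojava-ipmi-kvmrc-*"
            " child may still be holding the noVNC port."
        )
    return message
```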

This PR unifies stale-child cleanup with non-blocking Docker I/O on the connect path.

Design

| Piece | Role |
|---|---|
| stale_children.py | Find and remove nojava-ipmi-kvmrc-* containers that are exited/dead or hold a host port in [WEB_PORT_START, WEB_PORT_END) |
| start_kvm_container() | Calls cleanup_stale_kvm_children() via _run_blocking before docker run |
| _run_blocking | run_in_executor wrapper for subprocess.call / Popen on the connect path (check_docker, cleanup, container launch, docker port poll) |
| log_factory | Uses loop.call_soon_threadsafe when forwarding additional_logging from executor threads |
| docker_terminated_message() | On return code 125, hints that a stale child may be holding the port |
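The selection rule in the stale_children.py row can be sketched as follows. This is a minimal illustration, not the PR's module: it assumes containers are listed with docker ps -a and a tab-separated --format template, applies the row's criteria literally, and the helper names (Child, parse_ps, is_stale) and the regex for the Ports column are hypothetical. The port bounds are passed in rather than read from WEB_PORT_START / WEB_PORT_END:

```python
from __future__ import annotations

import re
import subprocess
from dataclasses import dataclass

# Assumed values for illustration; the PR's module may match differently.
CHILD_PREFIX = "nojava-ipmi-kvmrc-"
DEAD_STATES = {"exited", "dead"}
# Host side of a published port, e.g. "0.0.0.0:8800->8080/tcp" or ":::8800->8080/tcp".
# Port ranges such as "8800-8801->..." are not handled by this sketch.
HOST_PORT_RE = re.compile(r":(\d+)->")
PS_FORMAT = "{{.Names}}\t{{.State}}\t{{.Ports}}"


@dataclass
class Child:
    name: str
    state: str  # "running", "exited", "dead", ...
    ports: str  # raw Ports column


def parse_ps(output: str) -> list[Child]:
    """Parse tab-separated Names/State/Ports lines produced with PS_FORMAT."""
    children = []
    for line in output.splitlines():
        if not line.strip():
            continue
        name, state, ports = (line.split("\t") + ["", ""])[:3]
        children.append(Child(name, state, ports))
    return children


def is_stale(child: Child, port_start: int, port_end: int) -> bool:
    """Exited/dead children, or children holding a host port in [port_start, port_end)."""
    if not child.name.startswith(CHILD_PREFIX):
        return False  # never touch unrelated containers
    if child.state in DEAD_STATES:
        return True
    host_ports = {int(p) for p in HOST_PORT_RE.findall(child.ports)}
    return any(port_start <= port < port_end for port in host_ports)


def cleanup_stale_kvm_children(port_start: int, port_end: int) -> list[str]:
    """List KVM children via the docker CLI and force-remove the stale ones."""
    output = subprocess.run(
        ["docker", "ps", "-a", "--filter", f"name={CHILD_PREFIX}", "--format", PS_FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    removed = []
    for child in parse_ps(output):
        if is_stale(child, port_start, port_end):
            subprocess.call(["docker", "rm", "-f", child.name])
            removed.append(child.name)
    return removed
```

Keeping parsing and selection separate from the docker calls makes the "port outside the range is not removed" case from the test plan checkable without a Docker daemon.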

Test plan

  • Start a session, kill the server without closing the browser tab, then reconnect: the session starts without a manual docker rm
  • Container on a port outside [WEB_PORT_START, WEB_PORT_END) is not removed
  • macOS / Linux Docker availability check still works
  • KVM log lines still reach the browser during connect

Tested on ASUS ASMB8-iKVM firmware 1.14.2. After a server restart with an orphaned child, cleanup runs on the next connect and the session starts without exit 125.

Requires sciapp/nojava-ipmi-kvm-server#9 for cleanup at server startup, on SIGTERM, and via atexit.

Related to sciapp/nojava-ipmi-kvm-server#7

Add cleanup_stale_kvm_children() and call it from start_kvm_container.
Run blocking docker subprocess work in a thread pool so Tornado's IOLoop
stays responsive during connect. Schedule additional_logging on the IOLoop
from worker threads. Improve exit 125 error message when the noVNC port
is already in use.
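Scheduling additional_logging on the IOLoop comes down to one rule: executor threads must not call the browser-facing callback directly, because Tornado objects such as WebSocket handlers are not thread-safe. They hand the call to the loop instead (Tornado's thread-safe entry point is IOLoop.add_callback; asyncio's is loop.call_soon_threadsafe). A minimal sketch, in which make_thread_safe_logger and demo are hypothetical names and additional_logging stands in for the per-session callback:

```python
import asyncio
import threading


def make_thread_safe_logger(loop, additional_logging):
    """Return a callback that worker threads may call with a log line."""

    def log(line: str) -> None:
        # Never run additional_logging on the worker thread; queue it on the loop.
        loop.call_soon_threadsafe(additional_logging, line)

    return log


async def demo():
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    received = []

    def additional_logging(line):
        # Record each line and whether it ran on the event-loop thread.
        received.append((line, threading.get_ident() == loop_thread))

    log = make_thread_safe_logger(loop, additional_logging)

    def worker():  # stands in for reading docker output in an executor thread
        for i in range(3):
            log(f"line {i}")

    await loop.run_in_executor(None, worker)
    await asyncio.sleep(0)  # give any still-queued callbacks a turn
    return received
```

Callbacks queued with call_soon_threadsafe run in FIFO order, so log lines reach the browser in the order the worker produced them.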
nesvet force-pushed the feat/stale-child-cleanup branch from 431b878 to ec3d945 on June 28, 2026 21:47
nesvet changed the title from "Remove stale nojava-ipmi-kvmrc containers on port conflict and server stop" to "Remove stale KVM children and keep IOLoop responsive during connect" on Jun 28, 2026