
feat(core): worker resource monitor #1378

Open
leshy wants to merge 23 commits into dev from feat/resource-monitor

leshy (Contributor) commented Feb 28, 2026

Problem

Since dropping Dask, we have had no visibility into the resource usage of modules.

Misc

  • Tagged the expensive voxel mapper test as tool (it was used during development)
  • Removed the unitree_webrtc/__init__.py shim, which loaded pygame on every unpickle

Solution

  • Adds psutil-based resource monitoring to ModuleCoordinator.loop(), collecting system stats and sending them to pluggable publishers (structlog, LCM)
  • Stats are published over pickle LCM to /dimos/resource_stats
  • New dtop CLI tool that subscribes to this LCM topic and displays the stats
  • Bumps the smart blueprint to 7 workers
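The pluggable-publisher idea can be sketched with a typing.Protocol. This is a minimal illustration and not the PR's code. The method name log_stats, the stats shape (a list of plain dicts), and both concrete classes are hypothetical. Stdlib logging stands in for structlog, and the real LCM publisher is not shown.

```python
import logging
from typing import Any, Protocol


class ResourceLogger(Protocol):
    """Anything that can publish one snapshot of per-worker stats (hypothetical shape)."""

    def log_stats(self, stats: list[dict[str, Any]]) -> None: ...


class StdlibResourceLogger:
    """Hypothetical publisher: one log record per worker (stand-in for structlog)."""

    def __init__(self, name: str = "resource_stats") -> None:
        self._log = logging.getLogger(name)

    def log_stats(self, stats: list[dict[str, Any]]) -> None:
        for entry in stats:
            self._log.info("worker stats %s", entry)


class MemoryResourceLogger:
    """Hypothetical publisher that keeps snapshots in memory, e.g. for tests."""

    def __init__(self) -> None:
        self.snapshots: list[list[dict[str, Any]]] = []

    def log_stats(self, stats: list[dict[str, Any]]) -> None:
        self.snapshots.append(list(stats))


def publish(stats: list[dict[str, Any]], loggers: list[ResourceLogger]) -> None:
    # The monitor depends only on the protocol, so publishers can be swapped or combined.
    for logger in loggers:
        logger.log_stats(stats)
```

Because the monitor only sees the protocol, adding another sink (a file, or an LCM topic) does not touch the collection code.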

TODO

With a diff-stats output (perhaps printed on exit), this can now be used to profile tests and compare them against dev!
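A diff between two runs could be as simple as subtracting per-worker metrics. This sketch is hypothetical. It assumes each run has already been summarized as a mapping from worker name to numeric metrics, which is not a format the PR defines.

```python
def diff_stats(
    baseline: dict[str, dict[str, float]],
    candidate: dict[str, dict[str, float]],
) -> dict[str, dict[str, float]]:
    """Return candidate minus baseline for every metric present in both runs."""
    diff: dict[str, dict[str, float]] = {}
    for worker, metrics in candidate.items():
        base = baseline.get(worker)
        if base is None:
            continue  # worker exists only in one run; nothing to compare
        diff[worker] = {
            key: value - base[key] for key, value in metrics.items() if key in base
        }
    return diff
```

For example, a test run could dump its summarized stats on exit, and diff_stats(dev_run, branch_run) would show which workers got heavier. Positive values mean the branch uses more of that resource.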

Breaking Changes

None

How to Test

uv sync --all-extras
dtop

Then run some blueprint:

dimos --dtop run ...

Contributor License Agreement

  • I have read and approved the CLA.
[Screenshot: 2026-03-01_14-50]


Adds psutil-based resource monitoring to ModuleCoordinator.loop(),
collecting CPU, memory (PSS/USS/RSS/VMS), threads, children, FDs,
and IO stats per worker process every 2s. Stats are published over
LCM and viewable with the new `dtop` CLI tool.

- WorkerStats dataclass and collect_stats() on Worker/WorkerManager
- ResourceLogger protocol with LCM and structlog implementations
- dtop: live Textual TUI subscribing to /dimos/resource_stats
- psutil added as explicit dependency
- Bump smart blueprint to 7 workers
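The per-process fields listed above map onto standard psutil Process methods. The sketch below is an assumption about the shape and not the PR's actual WorkerStats; all field names are illustrative. collect_stats accepts any psutil-like object, so it can be exercised without psutil installed.

A few platform differences drive the guards in the code:
- PSS is reported by memory_full_info() only on Linux.
- num_fds() and io_counters() are not defined on every platform, so the code checks for them before calling.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class WorkerStats:
    """Illustrative per-worker snapshot; field names are assumptions."""

    pid: int
    cpu_percent: float
    pss: int  # bytes; 0 where the platform does not report PSS
    uss: int
    rss: int
    vms: int
    num_threads: int
    num_children: int
    num_fds: int  # 0 where num_fds() is unavailable (e.g. Windows)
    io_read_bytes: int
    io_write_bytes: int


def collect_stats(proc: Any) -> WorkerStats:
    """Snapshot one process; `proc` is normally a psutil.Process."""
    mem = proc.memory_full_info()  # rss/vms plus uss (and pss on Linux)
    io = proc.io_counters() if hasattr(proc, "io_counters") else None
    return WorkerStats(
        pid=proc.pid,
        # interval=None returns immediately and diffs against the previous
        # call on the same Process object (the first call returns 0.0).
        cpu_percent=proc.cpu_percent(interval=None),
        pss=getattr(mem, "pss", 0),
        uss=getattr(mem, "uss", 0),
        rss=mem.rss,
        vms=mem.vms,
        num_threads=proc.num_threads(),
        num_children=len(proc.children(recursive=True)),
        num_fds=proc.num_fds() if hasattr(proc, "num_fds") else 0,
        io_read_bytes=io.read_bytes if io else 0,
        io_write_bytes=io.write_bytes if io else 0,
    )
```

In use, this would be called as collect_stats(psutil.Process(worker_pid)) for each worker on every monitor tick.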
Cache psutil.Process objects across snapshots so cpu_percent(interval=None)
has a previous sample to diff against. Fix wrong module name in docstring,
remove dead _snap_count state, and extend color gradient to cyan→yellow→red.
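A cyan→yellow→red gradient can be sketched as piecewise linear interpolation between three color stops:
- cyan (0, 255, 255)
- yellow (255, 255, 0)
- red (255, 0, 0)

This is an illustration and not dtop's code. The function name, the 0-to-1 input scale, and the hex-string output (a form Rich styles accept) are assumptions.

```python
def load_color(fraction: float) -> str:
    """Map a load fraction (0.0 = idle, 1.0 = saturated) to a hex color.

    0.0 -> cyan, 0.5 -> yellow, 1.0 -> red, interpolating linearly in between.
    """
    t = min(max(fraction, 0.0), 1.0)  # clamp out-of-range inputs
    if t <= 0.5:
        # cyan -> yellow: raise red, drop blue
        u = t / 0.5
        r, g, b = 255 * u, 255.0, 255 * (1 - u)
    else:
        # yellow -> red: drop green
        u = (t - 0.5) / 0.5
        r, g, b = 255.0, 255 * (1 - u), 0.0
    return "#{:02x}{:02x}{:02x}".format(round(r), round(g), round(b))
```

Clamping matters for CPU: a multi-threaded process can exceed 100% in psutil's cpu_percent. A call like load_color(cpu_percent / 100) therefore saturates at red instead of producing an invalid color.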
leshy changed the title from "feat(core): per-worker resource monitor + dtop TUI" to "feat(core): worker resource monitor" Feb 28, 2026
leshy marked this pull request as ready for review March 1, 2026 06:54

greptile-apps bot left a comment


9 files reviewed, 1 comment


resource_logger: ResourceLogger | None = None,
monitor_interval: float = 1.0,
) -> None:
_logger: ResourceLogger = resource_logger or LCMResourceLogger()

I think we should have a flag so that statistics are only gathered when requested.
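One way to make collection opt-in is to gate the monitor thread behind a flag and do nothing when it is off. In this sketch the enabled flag, the start_monitor helper, and the callback signatures are all hypothetical. Only the 1.0 s default interval mirrors the monitor_interval default in the snippet under review.

```python
import threading
from typing import Any, Callable, Optional


def start_monitor(
    collect: Callable[[], Any],
    publish: Callable[[Any], None],
    enabled: bool = False,
    interval: float = 1.0,
) -> Optional[threading.Event]:
    """Publish collect() every `interval` seconds on a daemon thread.

    Returns an Event that stops the loop when set, or None when monitoring
    is disabled, in which case nothing is ever collected.
    """
    if not enabled:
        return None  # opt-in: no thread and no psutil calls when off

    stop = threading.Event()

    def loop() -> None:
        # Event.wait is an interruptible sleep: it returns True as soon as
        # stop is set, so shutdown does not wait for a full interval.
        while not stop.wait(interval):
            publish(collect())

    threading.Thread(target=loop, name="resource-monitor", daemon=True).start()
    return stop
```

Defaulting enabled to False keeps normal runs free of monitoring overhead. A CLI switch could then turn it on explicitly.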

dimensionalOS deleted a comment from greptile-apps bot Mar 1, 2026
leshy (Contributor, Author) commented Mar 1, 2026

I think the StatsMonitor class could be refactored: it is too coupled to workers rather than processes, and a per-process monitor class would allow better caching. But the feature is now isolated, so refactoring will be easy once we know more.
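The refactor floated here could look like one small class per process that owns its own process handle, tracked by PID instead of by worker. This is a hypothetical sketch. ProcessMonitor, MonitorRegistry, and the injectable process_factory are illustrative names, and the factory would normally be psutil.Process. Keeping each handle alive between snapshots is what lets cpu_percent(interval=None) diff against the previous sample.

```python
from typing import Any, Callable, Iterable


class ProcessMonitor:
    """Owns one long-lived process handle (normally a psutil.Process)."""

    def __init__(self, pid: int, process_factory: Callable[[int], Any]) -> None:
        self.pid = pid
        self._proc = process_factory(pid)

    def cpu_percent(self) -> float:
        # interval=None diffs against the previous call on this same handle,
        # so in psutil the first call for a new process returns 0.0.
        return self._proc.cpu_percent(interval=None)


class MonitorRegistry:
    """Keeps one ProcessMonitor per PID, independent of worker bookkeeping."""

    def __init__(self, process_factory: Callable[[int], Any]) -> None:
        self._factory = process_factory
        self._monitors: dict[int, ProcessMonitor] = {}

    def sample(self, pids: Iterable[int]) -> dict[int, float]:
        live = set(pids)
        # Forget processes that are no longer being tracked.
        for pid in list(self._monitors):
            if pid not in live:
                del self._monitors[pid]
        result: dict[int, float] = {}
        for pid in live:
            if pid not in self._monitors:
                self._monitors[pid] = ProcessMonitor(pid, self._factory)
            result[pid] = self._monitors[pid].cpu_percent()
        return result
```

Because the registry only sees PIDs, any process can be monitored the same way, whether it is a worker, a worker's child, or the coordinator itself.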
