|
Hi, thank you for the PR! I haven't checked the code in detail yet, but at a glance it looks fine. I'll answer the questions from #1016 (comment) here:
There is also kind of a second level of utilization that we could show visually: which CPUs are assigned to the given worker. Right now we show all CPUs of the given PC, and the CPUs assigned to tasks, but in theory the worker might be managing only a subset of the CPUs (though that is probably relatively rare? hard to tell). In fact, maybe we should just get rid of "global" CPU utilization and always deal only with the assigned resources (at least for CPUs; for memory it's more complicated). A useful follow-up could be to only show the worker-assigned CPUs here, and also to only gather utilization for those CPUs in the worker collection loop. Another follow-up could be to separate memory utilization across tasks (based on the memory utilization of their processes), but that can be very tricky to pull off, as tasks can spawn subprocesses, etc. CC @spirali about the color scheme (blue vs graying out unassigned CPUs) and whether you think that we should show all CPUs on the worker, or only those managed by the worker. The latter would be more consistent with how we treat non-CPU resources. |
|
The majority of our users probably have a whole node, but I am aware of some users that do sub-node SLURM allocations. AFAIK they do not usually have more workers on a single node; they just have a single worker on a node that manages only a subset of the node. Graying out non-managed resources seems good to me. But it should always be possible to see non-managed CPUs (or ideally all node resources). There are usually two use cases:
Let us say that we will eventually have some "top"-like utility. For the first question, we want to see only processes spawned by our task. For the second question, we want to see all processes. So we should probably support both views. I have no strong opinion on what is a good default. |
|
Well, the thing is that all resources of the node are not really something that we can robustly know. Currently, we sorta try to guess it for CPUs, but even there it might not be accurate. And for GPUs and other resources, we often might only have access to a subset of the resources. I think that from our point of view, we cannot reliably say what the "whole node" is, and should mostly talk only about resources managed by HQ workers. What we could do though is to say something like "we detected N additional CPUs on the node". |
|
I wrote it as the "ideal case"; I know that we cannot provide a "generic HW monitoring tool", but other CPUs are quite easy, so if we can provide them then I think that we should show them. |
I guess that depends on how SLURM is configured; it is definitely possible to hide some CPUs from HyperQueue. But yeah, I guess that we can show all CPUs (that are visible to us), especially since we already do this today 😆 I would just treat it more as auxiliary information rather than the main thing that we want to present. |
|
I do agree that not showing the worker-assigned CPU utilization by default is not ideal, but I wanted to keep the original behavior and add this as an extra feature. As you mention above, though, showing it by default makes more sense, and opting out to see "all CPUs" could be an alternative. Also, thinking about it now, if users want to see how the whole node (all CPUs in the list) is doing, they can surely do that without the fancy colors. So keeping it simple, without the switch, might also be a good option. I will change the color scheme to the classic green->red for assigned CPUs, gray out the other CPUs, and set this as the default. Once you come to a final decision about switching between all CPUs <> assigned CPUs, I can remove the switch or keep it as is. I can try to brainstorm a bit to come up with a new layout to represent the usage statistics. Alternatively, we can discuss it when your schedule frees up. |
|
👍 on all you said. Regarding
you can keep it, but please make the assigned CPUs view be the default. |
|
I vote for keeping the "assigned only" view, or at least sorting assigned CPUs at the beginning. The original motivation for all this was that I needed to find the utilization of "my" 4 CPUs on a machine with 256+ CPUs. |
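For illustration, the "sorting assigned CPUs at the beginning" idea could look roughly like the sketch below. This is not code from the PR or from HyperQueue: `CpuSample` and `assigned_first` are hypothetical names, and the sketch assumes that the set of assigned CPU ids is already known (for example, from the pinned tasks running on the worker).

```rust
/// Hypothetical per-CPU sample; not an actual HyperQueue type.
#[derive(Debug, Clone, PartialEq)]
struct CpuSample {
    id: u32,
    /// Utilization in percent, 0.0..=100.0.
    utilization: f32,
}

/// Returns a copy of `samples` with the assigned CPUs first.
/// Within each group (assigned / not assigned), CPUs are ordered by id.
fn assigned_first(samples: &[CpuSample], assigned: &[u32]) -> Vec<CpuSample> {
    let mut sorted = samples.to_vec();
    // `false < true`, so CPUs that are assigned (key `false`) sort to the front.
    sorted.sort_by_key(|s| (!assigned.contains(&s.id), s.id));
    sorted
}
```

With this ordering, the 4 assigned CPUs on a 256+ CPU machine would end up in the first row of the panel regardless of their ids. The linear `contains` lookup is fine for a few hundred CPUs; a set would be the obvious replacement if that ever mattered.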
This PR adds an option to show utilization in a way that distinguishes the CPUs used by running pinned tasks assigned to a worker. Without pinning, the feature doesn't work well for now.
Currently, a change of color scheme is used to distinguish the CPUs, but graying out the other CPUs might be better.
The added keybinding is 'c', which lets the user toggle between global and worker-specific CPU usage.
Features that might be connected with this:
This is part of the feature requested in #1016
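
As a rough sketch of the behavior discussed in this thread (assigned-CPUs view as the default, 'c' toggling between the two views, unassigned CPUs grayed out in the assigned view), the state handling could be modeled as below. None of this is taken from the PR's code: `CpuView`, `CpuStyle`, `cpu_style`, and `handle_key` are hypothetical names, and the 50% / 80% thresholds for the green->red scale are assumptions chosen only for the example.

```rust
/// Which CPUs are colored by load in the utilization panel (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CpuView {
    /// Every visible CPU is colored by its utilization.
    Global,
    /// Only CPUs assigned to the worker are colored; the rest are grayed out.
    WorkerAssigned,
}

impl Default for CpuView {
    /// The thread settles on the assigned-CPUs view as the default.
    fn default() -> Self {
        CpuView::WorkerAssigned
    }
}

impl CpuView {
    /// Flips between the two views.
    fn toggled(self) -> Self {
        match self {
            CpuView::Global => CpuView::WorkerAssigned,
            CpuView::WorkerAssigned => CpuView::Global,
        }
    }
}

/// Abstract style for one CPU cell; a real UI would map these to colors.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CpuStyle {
    Low,    // e.g. green
    Medium, // e.g. yellow
    High,   // e.g. red
    Grayed, // CPU not assigned to the worker
}

/// Picks a style for a CPU. The thresholds are assumed, not from the PR.
fn cpu_style(view: CpuView, is_assigned: bool, utilization: f32) -> CpuStyle {
    if view == CpuView::WorkerAssigned && !is_assigned {
        return CpuStyle::Grayed;
    }
    if utilization < 50.0 {
        CpuStyle::Low
    } else if utilization < 80.0 {
        CpuStyle::Medium
    } else {
        CpuStyle::High
    }
}

/// Handles a key press: 'c' toggles the view, any other key leaves it unchanged.
fn handle_key(view: CpuView, key: char) -> CpuView {
    if key == 'c' {
        view.toggled()
    } else {
        view
    }
}
```

Keeping the view as a small enum rather than a boolean leaves room for a third mode later (for example, an "assigned only" view that hides unassigned CPUs entirely) without changing the key handler's shape.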