feat: multi Agent OS server support in AskUiControllerClient#276
feat: multi Agent OS server support in AskUiControllerClient#276mlikasam-askui wants to merge 17 commits into
AskUiControllerClient#276Conversation
Replace single-controller config with a TargetComputer / TargetComputerManager abstraction. AskUiControllerClient now manages a list of local and remote controller servers, opens a gRPC connection per target on connect(), and routes agent-os actions to a single active target. Add tools for the agent to list, switch, and inspect the active target computer.
… target session GUID
| from .set_active_display_tool import ComputerSetActiveDisplayTool | ||
| from .switch_agent_os_server_tool import ComputerSwitchAgentOsServerTool |
There was a problem hiding this comment.
shouldn't these 3 new tools also be in experimental?
There was a problem hiding this comment.
I see them as part of the default tool list, similar to the Android agent.
| cursor_position = self.agent_os.get_mouse_position() | ||
| return f"Mouse is at position ({cursor_position.x}, {cursor_position.y})." | ||
| return ( | ||
| f"[Server with id '{server.computer_id}']: Mouse is at position " |
There was a problem hiding this comment.
I am not happy with the name "Server in general". So far, I was using the term "machine" when telling the agent that it is operating multiple devices.
The name of one machine could then be server. If we use the term server here, this will introduce ambiguity that might cause the currently working stuff to break
| return str(self.agent_os.get_system_info().model_dump_json()) | ||
| server = self.agent_os.get_active_agent_os_server(report=False) | ||
| system_info_json = self.agent_os.get_system_info().model_dump_json() | ||
| return f"[Server with id '{server.computer_id}']: {system_info_json}" |
There was a problem hiding this comment.
same here (Server ->machine)
| agent_os=agent_os, | ||
| ) | ||
|
|
||
| def __call__(self, computer_id: str) -> str: |
There was a problem hiding this comment.
how is this computer_id determined? By the user, e.g. in the init of the agent? Currently, they can set local_machine_name and remote_machine_name
There was a problem hiding this comment.
It can be set by the user, or it defaults to the session ID, which is a UUID4.
|
tbh: at the moment I really like the concept with 2 dedicated toolsets to operate 2 machines. Still, I see that this here will scale also to even more machines. Have you checked if the agent understands that it can operate multiple machines and that it calls the switch tool before executing operations if needed? Further: we need to automatically adjust the system capabilities and device information prompts if multiple agentOS are added. Also: we should definitely update the docs with this PR to explain these changes here |
| self, required_tags: list[str] | ||
| ) -> AgentOs | AndroidAgentOs: | ||
| """ | ||
| Find the first registered agent OS whose tags are a superset of |
There was a problem hiding this comment.
| Find the first registered agent OS whose tags are a superset of | |
| Find the first registered AgentOS whose tags are a superset of |
Please find all occurance. in Strings
|
|
||
| def temporary_select(self, device_sn: str) -> AbstractContextManager[Self]: |
There was a problem hiding this comment.
| def temporary_select(self, device_sn: str) -> AbstractContextManager[Self]: | |
| @abstractmethod | |
| def temporary_select(self, device_sn: str) -> AbstractContextManager[Self]: |
There was a problem hiding this comment.
Ok.
Why is this function not implmented?
| raise AndroidAgentOsError(msg) | ||
|
|
||
| @contextmanager | ||
| def temporary_select(self, device_sn: str) -> Iterator[Self]: |
There was a problem hiding this comment.
But in general. Why do we need the concept of temporary_select? How does the agent should call it?
There was a problem hiding this comment.
temporary_select is not intended to be called directly by the agent. It is designed to allow users to execute commands on a specific device without needing to restore the previous device context afterward.
It can also be used in custom tools to restrict a tool to a specific device. For example, an enableWifiOnCarEmulator tool could internally use temporary_select, ensuring it only operates on the car emulator. This differs from a generic enableWifi tool, which works across all devices, where the agent is responsible for switching to the appropriate device before invoking the tool.
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| def _generate_session_guid() -> str: |
There was a problem hiding this comment.
Do we have already such a function?
| return server | ||
| return None | ||
|
|
||
| def switch(self, computer_id: str) -> AgentOsServer: |
There was a problem hiding this comment.
We can switch to the same computer? Should this not failing?
| Raises: | ||
| KeyError: If no server with the given computer id is registered. | ||
| """ | ||
| server = self.get(computer_id) |
There was a problem hiding this comment.
Please wrap a proper AgentOSServerManager Error around.
| def agent_os_server_manager(self) -> AgentOsServerManager: | ||
| """The underlying Agent-OS-server manager.""" | ||
| return self._manager |
There was a problem hiding this comment.
Why do we have the server_manager inside the client?
There was a problem hiding this comment.
THis should not be here
| self.is_cacheable = True | ||
|
|
||
| def __call__(self) -> str: | ||
| server = self.agent_os.get_active_agent_os_server(report=False) |
| assert _generate_session_guid() != _generate_session_guid() | ||
|
|
||
|
|
||
| class TestReplacePort: |
c20d744 to
55cdac0
Compare
…skui-controller-multi-target
programminx-askui
left a comment
There was a problem hiding this comment.
Let's have a talk.
| OS-level operations (screenshot, mouse, keyboard, ...), and is identified by | ||
| a unique session GUID. Each server also tracks which display it is currently | ||
| operating against. | ||
| A target computer runs the server-side counterpart of the `AgentOs` client |
| "sc", | ||
| "query", | ||
| LocalAgentOsTargetComputer._ASKUI_CORE_SERVICE_NAME, |
There was a problem hiding this comment.
Which user rights do you need to perfrom the query? Did you tested it without admin rights?
| if self.is_connected: | ||
| self.connect(target) |
There was a problem hiding this comment.
Why? When you have already on connected target in the list and then you add another unconnected new target. Then it does not connect the new target.
Why do we automaticaly connect to the target? This is only an add-function. Then I would prefere a param auto_connect = True. And replace the buggy implementation, when one target is connected, that the next one get's not connected.
| def add_remote( | ||
| self, | ||
| address: str, | ||
| description: str, | ||
| ) -> RemoteAgentOsTargetComputer: | ||
| """ | ||
| Convenience method to construct and register a remote Agent OS target | ||
| computer. Auto-connects when the manager already has at least one open | ||
| connection. | ||
|
|
||
| Args: | ||
| address (str): gRPC address of the remote Agent OS target computer. | ||
| description (str): Human-readable description. | ||
|
|
||
| Returns: | ||
| RemoteAgentOsTargetComputer: The newly registered target. | ||
| """ | ||
| target = RemoteAgentOsTargetComputer(address=address, description=description) | ||
| self.add(target) | ||
| return target |
There was a problem hiding this comment.
Can we delete this function? It does not make sense.
There was a problem hiding this comment.
I still see no benefit of this funciton. -> please remove
| if self.is_connected and computer_id not in self._connections: | ||
| self.connect(target) |
There was a problem hiding this comment.
Here you trying to fix the logic from above.
There was a problem hiding this comment.
What should be the Connection Livecycle of an Target? Is the Manager Responsible to connect to the targets?
| for target in self._by_computer_id.values(): | ||
| self.connect(target) | ||
| except Exception: | ||
| self.disconnect_all() |
There was a problem hiding this comment.
Whats happend when an exception is happing on the disconnect function?
| if hasattr(e, "add_note"): | ||
| e.add_note( | ||
| f"While connecting to Agent OS target computer " | ||
| f"{target.description!r} " | ||
| f"(computer_id={target.computer_id!r}, " | ||
| f"session_guid={target.session_guid}, " | ||
| f"display={target.display}, " | ||
| f"address={target.address})" | ||
| ) | ||
| raise |
There was a problem hiding this comment.
Why aer you not creating a new Exception? You should not manipulate a old one.
| target.start() | ||
| started_process = True | ||
| channel = grpc.insecure_channel( | ||
| target.address, | ||
| options=[ | ||
| ("grpc.max_send_message_length", 2**30), | ||
| ("grpc.max_receive_message_length", 2**30), | ||
| ("grpc.default_deadline", 300000), | ||
| ], | ||
| ) | ||
| stub = controller_v1.ControllerAPIStub(channel) |
There was a problem hiding this comment.
Why do we have inside the Manager GRPC Code? This should be moved to the Client. We need to get rid of this.
| started_process: bool | ||
|
|
||
|
|
||
| class AskUiControllerClient(AgentOs): |
There was a problem hiding this comment.
Why does a Client has a Manager with multiple Targets? This should be another way around: Manager has Multiple Target and the Target should have the Client.
There was a problem hiding this comment.
Rename to: MultiComputerTargetAgentOS
programminx-askui
left a comment
There was a problem hiding this comment.
Hello @mlikasam-askui ,
I added some changes. But we have to do after this more work.
| started_process: bool | ||
|
|
||
|
|
||
| class AskUiControllerClient(AgentOs): |
There was a problem hiding this comment.
Rename to: MultiComputerTargetAgentOS
| started_process: bool | ||
|
|
||
|
|
||
| class AgentOsTargetComputerManager: |
There was a problem hiding this comment.
Rename to `ComputerTargetPool``
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class AgentOsTargetComputer: |
There was a problem hiding this comment.
| class AgentOsTargetComputer: | |
| class ComputerTarget |
| ) | ||
|
|
||
|
|
||
| class LocalAgentOsTargetComputer(AgentOsTargetComputer): |
There was a problem hiding this comment.
| class LocalAgentOsTargetComputer(AgentOsTargetComputer): | |
| class LocalComputerTarget(ComputerTarget): |
| self._process = None | ||
|
|
||
|
|
||
| class RemoteAgentOsTargetComputer(AgentOsTargetComputer): |
There was a problem hiding this comment.
| class RemoteAgentOsTargetComputer(AgentOsTargetComputer): | |
| class RemoteComputerTarget(ComputerTarget): |
| started_process=started_process, | ||
| ) | ||
|
|
||
| def disconnect_all(self) -> None: |
|
|
||
| @telemetry.record_call() | ||
| @override | ||
| def add_remote_agent_os_target_computer( |
There was a problem hiding this comment.
Can we not have a add function with accepts
| def add_remote_agent_os_target_computer( | |
| def add(computer_target: RemoteComputerTargetOptions | LocalComputerTargetOptions |
| def reset_agent_os_target_computers( | ||
| self, | ||
| agent_os_target_computers: list[AgentOsTargetComputer] | None = None, | ||
| ) -> None: |
There was a problem hiding this comment.
This function should not accept a list. It should use the current pool and reset it
| def reset_agent_os_target_computers( | |
| self, | |
| agent_os_target_computers: list[AgentOsTargetComputer] | None = None, | |
| ) -> None: | |
| def reset_computer_targets( | |
| self, | |
| ) -> None: |
| def add_agent_os_target_computer( | ||
| self, agent_os_target_computer: "AgentOsTargetComputer" | ||
| ) -> "AgentOsTargetComputer": | ||
| """Register an additional target computer. Auto-connects if connected.""" | ||
| raise NotImplementedError | ||
|
|
||
| def add_remote_agent_os_target_computer( | ||
| self, | ||
| address: str, | ||
| description: str, | ||
| ) -> "RemoteAgentOsTargetComputer": | ||
| """Register an additional remote target computer.""" | ||
| raise NotImplementedError | ||
|
|
||
| def reset_agent_os_target_computers( | ||
| self, | ||
| agent_os_target_computers: "list[AgentOsTargetComputer] | None" = None, | ||
| ) -> None: | ||
| """Disconnect (if connected) and replace the target computer list.""" | ||
| raise NotImplementedError | ||
|
|
||
| def list_agent_os_target_computers(self) -> "list[AgentOsTargetComputer]": | ||
| """Return all registered target computers.""" | ||
| raise NotImplementedError | ||
|
|
||
| def get_current_computer_target_id(self, report: bool = True) -> str: | ||
| """Return the `computer_id` of the currently active target computer.""" | ||
| raise NotImplementedError | ||
|
|
||
| def switch_agent_os_target_computer( | ||
| self, computer_id: str | ||
| ) -> "AgentOsTargetComputer": | ||
| """Switch the active target computer by its `computer_id`.""" | ||
| raise NotImplementedError | ||
|
|
||
| def temporary_select(self, computer_id: str) -> AbstractContextManager[Self]: | ||
| """ | ||
| Temporarily switch the active target computer for the duration of a `with` | ||
| block, then restore the previously-active target on exit (even if the | ||
| block raises). | ||
|
|
||
| Args: | ||
| computer_id (str): Computer id of the target to activate inside the | ||
| block. | ||
|
|
||
| Returns: | ||
| AbstractContextManager[Self]: Context manager that yields this | ||
| `AgentOs` with the selected target active. | ||
|
|
||
| Example: | ||
| ```python | ||
| with agent_os.temporary_select('Remote-Machine') as remote_machine: | ||
| img = remote_machine.screenshot() | ||
| img.save("remote_machine.png") | ||
| # previous active target restored here | ||
| ``` | ||
| """ | ||
| raise NotImplementedError | ||
|
|
There was a problem hiding this comment.
This should not be part of the default AgentOS
-> Please Remove
There was a problem hiding this comment.
-> This should be part of an ComputerAgentOS
|
|
||
|
|
||
| class ComputerSwitchAgentOsTargetComputerTool(ComputerBaseTool): | ||
| def __init__(self, agent_os: AgentOs | None = None) -> None: |
There was a problem hiding this comment.
The init should either accept only MultiComputerTargetAgentOS or check if it's an instance of,
There was a problem hiding this comment.
So all ComputerBaseTools should only accept an ComputerAgentOS
Summary
AskUiControllerClientcan now manage multiple Agent OS target computers(local and/or remote) at once. One is active at a time and receives the agent's
actions; callers can switch at runtime or scope a switch to a
withblock.New API:
AgentOsTargetComputer(base),LocalAgentOsTargetComputer(replaces the oldAskUiControllerServer— owns the local controller subprocess; auto-detectsthe Windows
AskuiCoreServiceand switches to its port),RemoteAgentOsTargetComputer(already-running remote controller, no processmanagement).
AgentOsTargetComputerManager— enforces invariants (at most one local,unique session GUIDs /
computer_ids / remote addresses) and owns the gRPCconnection lifecycle. Targets are addressed exclusively by
computer_id(single
dictlookup; no secondary indices); the connection dict is keyed bycomputer_idtoo.AskUiControllerClientand exposed throughAgentOs/ComputerAgentOsFacade:add_agent_os_target_computer,add_remote_agent_os_target_computer,list_agent_os_target_computers,get_current_computer_target_id(returns thecomputer_idstring of theactive target),
switch_agent_os_target_computer,reset_agent_os_target_computers,temporary_select.AndroidAgentOs/PpadbAgentOsget a siblingtemporary_select(device_sn).ComputerAgentauto-registers three newact()tools so the LLM can drivemulti-target flows:
ComputerListAgentOsTargetComputersTool,ComputerSwitchAgentOsTargetComputerTool,ComputerGetCurrentComputerTargetIdTool.LocalAgentOsTargetComputerandRemoteAgentOsTargetComputerare re-exportedfrom the top-level
askuipackage.New unit tests cover the target-computer classes, the manager, the multi-target
client, and the new computer tools. The e2e test was updated to the new
constructor.