microsoft/foundry-agent-voice-mode-sample

foundry-agent-voice-mode-sample

A runnable sample that wires the browser microphone to Agent voice mode in Microsoft Foundry. A small FastAPI broker holds the credentials, opens a WebSocket to the Voice Live realtime endpoint, and binds the session to a hosted Foundry agent. The agent answers travel questions using tool calls and replies in natural speech.

A three-part workshop in labs/ walks you from a basic voice loop to a fully bound hosted agent.

What this repository is

  • A sample. Clone it, fill in .env, run scripts/start-local.ps1, and talk to the agent in your browser.
  • A workshop. Three progressive labs under labs/ teach the pattern step by step.
  • A reference. The exact Voice Live URL contract that the Foundry portal uses is encoded in voicelive/server/voicelive_session.py and probed by scripts/test-session.ps1. Run that script as a regression test whenever the API changes.

It is not a production library. The broker is intentionally small so the pattern is easy to fork.

What is in this repository

  • voicelive/server is a FastAPI broker that holds credentials, builds the upstream WebSocket URL, and relays audio frames in both directions.
  • voicelive/client is a small static page that captures microphone audio, ships PCM16 frames over WebSocket, and renders transcripts with markdown.
  • voicelive/config/session.json holds the session configuration that the browser sends as its first frame after the socket opens. It pins the voice, noise reduction, echo cancellation, and semantic VAD settings.
  • agent/ contains the Foundry agent definition, the system prompt, and three sample tools (weather, flight status, hotel details).
  • infra/ contains a Bicep template that provisions the Foundry resource, the project, and the model deployment.
  • labs/ is the three-part workshop.
  • docs/blog/ contains a stand-alone HTML blog post that summarises the architecture for an external audience.
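
To illustrate the first frame described for voicelive/config/session.json, here is a minimal sketch of how such a frame could be built in Python. The field names follow the publicly documented Voice Live session.update shape as best understood here; the actual contents of session.json in this repository may differ, and build_session_update and the voice name in the usage note are hypothetical.

```python
import json


def build_session_update(voice_name: str) -> str:
    """Return a JSON session.update frame (illustrative values only)."""
    frame = {
        "type": "session.update",
        "session": {
            # Assumed field names; check session.json for the real values.
            "voice": {"name": voice_name, "type": "azure-standard"},
            "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
            "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
            "turn_detection": {"type": "azure_semantic_vad"},
        },
    }
    return json.dumps(frame)
```

A caller would send the returned string as the first text frame on the socket, for example build_session_update("en-US-Ava:DragonHDLatestNeural"), before streaming any audio.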

Quick start (sample path)

  1. Copy .env.sample to .env and fill in the values from your Foundry resource.
  2. Create a Python virtual environment and install requirements.txt.
  3. Run scripts/start-local.ps1 to launch the broker on http://127.0.0.1:8000.
  4. Open the URL in a browser, allow microphone access, and start talking.
  5. Run scripts/test-session.ps1 for a non-interactive smoke test.

Workshop path

If you would rather build up to the full sample one step at a time, work through the labs in order.

Each lab is self-contained and ends with a working checkpoint, so you can stop after any lab and still have something that runs.

Architecture

The browser never sees an Azure key or token. The broker performs the upstream handshake with either a bearer token from DefaultAzureCredential or an API key, then pipes frames in both directions.
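
The two-way piping can be sketched with plain asyncio. This is not the broker's actual code: relay and pump are hypothetical names, and the frame sources and send callables stand in for the browser-facing WebSocket and the upstream Voice Live connection. The sketch copies frames in both directions and shuts down the other direction as soon as one side closes.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

Send = Callable[[bytes], Awaitable[None]]


async def pump(source: AsyncIterator[bytes], send: Send) -> None:
    """Forward every frame from one side to the other until the source ends."""
    async for frame in source:
        await send(frame)


async def relay(
    browser_frames: AsyncIterator[bytes],
    send_to_upstream: Send,
    upstream_frames: AsyncIterator[bytes],
    send_to_browser: Send,
) -> None:
    """Run both directions concurrently; stop both when either side closes."""
    tasks = [
        asyncio.create_task(pump(browser_frames, send_to_upstream)),
        asyncio.create_task(pump(upstream_frames, send_to_browser)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancelled direction to unwind before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raise any error from the side that finished
```

Because the credentials are used only on the upstream side, nothing in this loop ever needs to forward a key or token to the browser.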

The Voice Live WebSocket URL is built as follows.

wss://<resource>.cognitiveservices.azure.com/voice-live/realtime
  ?api-version=2025-10-01
  &model=<agent display name>
  &agent-project-name=<project name>
  &agent-id=<agent id>
  &agent-access-token=<aad token for cognitive services>
  &authorization=Bearer%20<same aad token>

The model query parameter must match the agent display name. The authorization value must be URL-encoded, so the space between Bearer and the token is sent as %20.
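
As a rough illustration of the URL contract above, the following sketch assembles the same query string with the Python standard library. The function name build_voice_live_url and its parameter names are hypothetical; the real implementation lives in voicelive/server/voicelive_session.py and may differ. Note that urlencode defaults to quote_plus, which would turn the space into +, so the sketch passes quote_via=quote to obtain %20.

```python
from urllib.parse import quote, urlencode


def build_voice_live_url(
    resource: str,
    agent_display_name: str,
    project_name: str,
    agent_id: str,
    aad_token: str,
    api_version: str = "2025-10-01",
) -> str:
    """Assemble the Voice Live WebSocket URL in the documented parameter order."""
    params = [
        ("api-version", api_version),
        ("model", agent_display_name),  # must match the agent display name
        ("agent-project-name", project_name),
        ("agent-id", agent_id),
        ("agent-access-token", aad_token),
        ("authorization", f"Bearer {aad_token}"),
    ]
    # quote (not quote_plus) encodes the space as %20 instead of "+".
    query = urlencode(params, quote_via=quote, safe="")
    return f"wss://{resource}.cognitiveservices.azure.com/voice-live/realtime?{query}"
```

Since the token appears in the query string, a broker built this way should avoid writing the full URL to logs.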

Deploying to Azure

The end-to-end deployment is documented in docs/deployment.md. The summary is as follows.

  1. Provision the Foundry resource, project, and model with the Bicep template in infra/.
  2. Create the agent in the Foundry portal. The SDK path in scripts/deploy-agent.ps1 produces an Assistants-style agent that lacks the microsoft.voice-live.enabled metadata required by the working URL shape, so the portal is currently the only reliable route.
  3. Assign the Cognitive Services User role to the identity that will run the broker.
  4. Set the broker environment variables and run it locally, or deploy it to Azure Container Apps with a managed identity.

Contributing

See CONTRIBUTING.md for the contributor guide and CODE_OF_CONDUCT.md for the community standards. Security issues should be reported privately as described in SECURITY.md. Questions and help requests are covered in SUPPORT.md.

License

This project is released under the MIT License. See LICENSE for the full text.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorised use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark and Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

