feat: Microphone button, audio recording, and voice playback #18

@rdwj

Summary

Add voice interaction controls: a microphone button for recording speech input, and audio playback for TTS responses. This enables hands-free, accessibility-friendly interaction with agents.

UX Requirements

  • Microphone button in the input toolbar (toggle to start/stop recording)
  • Visual recording indicator (waveform animation or pulsing dot) while recording
  • Automatic send on recording stop, or manual confirmation (configurable via settings)
  • Audio playback controls on assistant messages when TTS is enabled (play/pause, scrubber, progress bar)
  • Voice activity detection indicator during recording
  • Browser permission prompt handling with helpful error states if mic access is denied
  • Clear visual distinction between recording state and idle state
  • Keyboard shortcut for start/stop recording
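The recording-related requirements above (toggle, permission handling, voice activity, and configurable send-on-stop) could be modeled as a small state machine. The following is only a sketch under assumptions: every name in it (`RecorderState`, `RecorderEvent`, `VoiceSettings`, `autoSendOnStop`, `nextRecorderState`) and the error message text are hypothetical and do not come from this issue or any existing code.

```typescript
// Hypothetical UI states; "recording" vs. "idle" drives the visual distinction.
type RecorderState =
  | { kind: "idle" }
  | { kind: "requestingPermission" }
  | { kind: "recording"; voiceActive: boolean }
  | { kind: "awaitingConfirmation" }
  | { kind: "sending" }
  | { kind: "permissionDenied"; message: string };

// Hypothetical events; "toggle" covers both the mic button and the keyboard shortcut.
type RecorderEvent =
  | { type: "toggle" }
  | { type: "permissionGranted" }
  | { type: "permissionDenied" }
  | { type: "voiceActivity"; active: boolean }
  | { type: "confirm" }
  | { type: "discard" }
  | { type: "sent" };

// Assumed settings shape for "automatic send on stop, or manual confirmation".
interface VoiceSettings {
  autoSendOnStop: boolean;
}

function nextRecorderState(
  state: RecorderState,
  event: RecorderEvent,
  settings: VoiceSettings,
): RecorderState {
  switch (state.kind) {
    case "idle":
    case "permissionDenied":
      // A denied state can be retried by toggling again.
      return event.type === "toggle" ? { kind: "requestingPermission" } : state;
    case "requestingPermission":
      if (event.type === "permissionGranted") {
        return { kind: "recording", voiceActive: false };
      }
      if (event.type === "permissionDenied") {
        return {
          kind: "permissionDenied",
          message: "Microphone access was denied. Allow it in the browser's site settings and try again.",
        };
      }
      return state;
    case "recording":
      if (event.type === "voiceActivity") {
        return { kind: "recording", voiceActive: event.active };
      }
      if (event.type === "toggle") {
        // Stopping either sends immediately or waits for the user to confirm.
        return settings.autoSendOnStop ? { kind: "sending" } : { kind: "awaitingConfirmation" };
      }
      return state;
    case "awaitingConfirmation":
      if (event.type === "confirm") return { kind: "sending" };
      if (event.type === "discard") return { kind: "idle" };
      return state;
    case "sending":
      return event.type === "sent" ? { kind: "idle" } : state;
  }
}
```

Keeping the transitions in a pure function like this would let the toggle, permission, and auto-send behavior be unit-tested without a browser or a microphone; the component would only translate button clicks and `getUserMedia` results into events.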

Implementation Notes

Record audio via Web Audio API / MediaRecorder. Send the recorded audio to /v1/audio/transcriptions for speech-to-text (STT), or include it as an audio content block in the chat request if the agent supports direct audio input. For TTS playback, fetch audio from /v1/audio/speech. Audio format negotiation (opus/mp3/wav) should be handled by the gateway: the UI sends whatever the browser produces and plays whatever the gateway returns.
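As a rough illustration of "the UI sends what the browser produces", the sketch below picks the first recording format the browser reports as supported and packages the recording for the transcription endpoint. Several things here are assumptions and not confirmed by this issue: the candidate MIME list and its order, the multipart field names `file` and `model` (borrowed from OpenAI-style APIs), and all function names.

```typescript
// Assumed preference order; the gateway is expected to accept whichever one is used.
const CANDIDATE_MIME_TYPES: string[] = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4",
  "audio/wav",
];

// `isSupported` is injected so the choice can be tested outside a browser.
function pickRecordingMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: string[] = CANDIDATE_MIME_TYPES,
): string | undefined {
  return candidates.find((mimeType) => isSupported(mimeType));
}

// Derive an upload file name from the MIME type, e.g. "audio/webm;codecs=opus" -> "recording.webm".
function fileNameForRecording(mimeType: string): string {
  const [essence = ""] = mimeType.split(";");
  const [, subtype = "bin"] = essence.trim().split("/");
  return `recording.${subtype}`;
}

interface TranscriptionRequest {
  url: string;
  method: "POST";
  body: FormData;
}

// Build (but do not send) the STT upload. Field names are assumptions, see above.
function buildTranscriptionRequest(recording: Blob, model?: string): TranscriptionRequest {
  const body = new FormData();
  body.append("file", recording, fileNameForRecording(recording.type));
  if (model !== undefined) {
    body.append("model", model);
  }
  return { url: "/v1/audio/transcriptions", method: "POST", body };
}
```

In the browser, the support check would typically be `pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t))`, with the result passed as the `mimeType` option to `new MediaRecorder(stream, { mimeType })`. The request would then be sent with `fetch(req.url, { method: req.method, body: req.body })`, leaving the `Content-Type` header unset so the browser can add the multipart boundary itself.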

Companion Issues

Companion issues filed on fips-agents/agent-template, fips-agents/gateway-template, fips-agents/fips-agents-cli, and fips-agents/examples.

Size

M

Labels

enhancement (New feature or request)
