Skip to content

feat: Voice input/output via STT and TTS sidecar services #102

@rdwj

Description

@rdwj

Summary

Add speech-to-text and text-to-speech support through OpenAI-compatible sidecar services. The agent server routes audio requests to sidecars, keeping the core text pipeline unchanged. This enables voice-driven agent interactions without modifying BaseAgent.

Requirements

  • Audio content block support in ChatCompletionRequest: {"type": "input_audio", "input_audio": {"data": "base64...", "format": "wav"}}
  • STT sidecar configuration in agent.yaml (endpoint URL for /v1/audio/transcriptions)
  • TTS sidecar configuration in agent.yaml (endpoint URL for /v1/audio/speech)
  • Server-layer MediaPreprocessorMiddleware that converts audio content blocks to text (via STT) before reaching BaseAgent
  • Optional TTS post-processing that converts agent text response to audio
  • Streaming support for real-time voice via WebSocket (future phase)

FIPS Considerations

No blockers. AI inference is unaffected by FIPS. Audio codecs (WAV, MP3, OPUS) are not cryptographic. TLS for sidecar communication uses system OpenSSL. Recommended STT: Granite Speech 3.3-8B on vLLM (Apache 2.0, Red Hat ecosystem) or Faster-Whisper. Recommended TTS: Kokoro-FastAPI (OpenAI-compatible, Apache 2.0).

Implementation Notes

MediaPreprocessorMiddleware is server-layer only — BaseAgent works exclusively with text. Sidecar deployment is infrastructure (Helm subchart or separate Deployment), not framework code. The middleware pattern keeps the audio concern cleanly separated from agent logic. Part of the multimodal initiative.

Companion Issues

Companion issues will be filed on fips-agents/gateway-template, fips-agents/ui-template, fips-agents/fips-agents-cli, and fips-agents/examples.

Size

M

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions