feat: Voice input/output via STT and TTS sidecar services #102

## Summary

Add speech-to-text (STT) and text-to-speech (TTS) support through OpenAI-compatible sidecar services. The agent server routes audio requests to the sidecars, so the core text pipeline stays unchanged. This enables voice-driven agent interactions without modifying BaseAgent.

## Requirements

- Audio content block support in `ChatCompletionRequest`: `{"type": "input_audio", "input_audio": {"data": "base64...", "format": "wav"}}`
- STT sidecar configuration in `agent.yaml` (endpoint URL for `/v1/audio/transcriptions`)
- TTS sidecar configuration in `agent.yaml` (endpoint URL for `/v1/audio/speech`)
- Server-layer `MediaPreprocessorMiddleware` that converts audio content blocks to text (via STT) before the request reaches BaseAgent
- Optional TTS post-processing that converts the agent's text response to audio
- Streaming support for real-time voice via WebSocket (future phase)
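
The audio-to-text conversion described above can be sketched as a pure function. This is a minimal sketch, not the planned implementation: `replace_audio_blocks` and the `Transcriber` callable are hypothetical names, the STT call is injected rather than wired to a real sidecar, and the `{"type": "text", "text": ...}` output shape is an assumption based on the OpenAI content-part convention.

```python
import base64
from typing import Any, Callable

# Hypothetical signature: raw audio bytes plus a format hint in, transcript out.
# In a real deployment this would wrap a call to the STT sidecar.
Transcriber = Callable[[bytes, str], str]


def replace_audio_blocks(
    content: list[dict[str, Any]], transcribe: Transcriber
) -> list[dict[str, Any]]:
    """Return a new content list with each input_audio block replaced by a text block."""
    converted = []
    for block in content:
        if block.get("type") == "input_audio":
            audio = block["input_audio"]
            raw = base64.b64decode(audio["data"])
            # Assumption: fall back to "wav" when no format is given.
            text = transcribe(raw, audio.get("format", "wav"))
            converted.append({"type": "text", "text": text})
        else:
            # Non-audio blocks pass through untouched.
            converted.append(block)
    return converted
```

Because the transcriber is passed in, the function can be unit-tested with a stub and the input list is never mutated.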

## FIPS Considerations

No blockers. FIPS applies to cryptographic modules, so AI inference is unaffected, and audio codecs (WAV, MP3, Opus) are not cryptographic. TLS for sidecar communication uses the system OpenSSL. Recommended STT: Granite Speech 3.3-8B on vLLM (Apache 2.0, Red Hat ecosystem) or Faster-Whisper. Recommended TTS: Kokoro-FastAPI (OpenAI-compatible, Apache 2.0).
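
As a quick sanity check on the TLS point, a Python-based server can report which OpenSSL build its interpreter is linked against. This is only a sketch: whether that build is the system's FIPS-validated OpenSSL depends on how Python was packaged, and the `sidecar_tls` name is illustrative.

```python
import ssl

# Report the OpenSSL build this interpreter is linked against.
print(ssl.OPENSSL_VERSION)

# Default client context: certificate verification and hostname checking are enabled.
sidecar_tls = ssl.create_default_context()
```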

## Implementation Notes

`MediaPreprocessorMiddleware` lives only in the server layer; BaseAgent works exclusively with text. Sidecar deployment is an infrastructure concern (Helm subchart or separate Deployment), not framework code. The middleware pattern keeps audio handling cleanly separated from agent logic. This work is part of the multimodal initiative.
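
The separation can be illustrated with a small wrapper around a text-only agent. This is a hedged sketch under stated assumptions: `with_voice`, `AgentReply`, and the three callable types are hypothetical stand-ins, not the actual `MediaPreprocessorMiddleware` or BaseAgent interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

AgentFn = Callable[[str], str]         # stand-in for the text-only agent
SpeechToText = Callable[[bytes], str]  # stand-in for the STT sidecar client
TextToSpeech = Callable[[str], bytes]  # stand-in for the TTS sidecar client


@dataclass
class AgentReply:
    text: str
    audio: Optional[bytes] = None


def with_voice(
    agent: AgentFn, stt: SpeechToText, tts: Optional[TextToSpeech] = None
) -> Callable[[Union[str, bytes]], AgentReply]:
    """Wrap a text-only agent so it accepts audio and can optionally reply with audio."""

    def handler(user_input: Union[str, bytes]) -> AgentReply:
        # Pre-processing: audio becomes text before the agent sees it.
        prompt = stt(user_input) if isinstance(user_input, bytes) else user_input
        reply_text = agent(prompt)
        # Post-processing: synthesize speech only when a TTS client is configured.
        audio = tts(reply_text) if tts is not None else None
        return AgentReply(text=reply_text, audio=audio)

    return handler
```

The agent callable only ever receives and returns strings, which mirrors the goal of leaving BaseAgent untouched.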

## Companion Issues

Companion issues will be filed on fips-agents/gateway-template, fips-agents/ui-template, fips-agents/fips-agents-cli, and fips-agents/examples.

## Size

M
