LLM-powered agent for building and validating multimodal infrastructure using Ray.
RayAgent is an intelligent orchestrator that captures user intent, analyzes sample data, and automatically scaffolds infrastructure for multimodal AI applications. It builds buckets, feature extractors, taxonomies, retrievers, and test queries—entirely agentically, using Ray for distributed execution and job orchestration.
```
┌───────────────────────────┐
│    User Input (CLI/UI)    │
│    - Sample files         │
│    - Example queries      │
│    - Success criteria     │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│    LLM Planning Agent     │
│  - Chooses next action    │
│  - Builds tool call plans │
└─────────────┬─────────────┘
              │
┌─────────────▼─────────────────────┐
│    Ray Job Submission (runner)    │
│  - Submits entire agent loop      │
│  - Executes tool actions via Ray  │
└─────────────┬─────────────────────┘
              │
┌─────────────┴──────────────────────┐
│     Mixpeek Infrastructure API     │
│  - /buckets       - /collections   │
│  - /extractors    - /retrievers    │
│  - /taxonomies    - /validators    │
└─────────────┬──────────────────────┘
              │
              ▼
┌──────────────────────────────┐
│  Streaming + Logging Module  │
│  - Redis pubsub / stdout     │
│  - Optional WebSocket UI     │
└─────────────┬────────────────┘
              │
              ▼
┌───────────────────────────┐
│    Real-Time Feedback     │
│  - Logs of agent steps    │
│  - Success/failure markers│
│  - Suggestions            │
└───────────────────────────┘
```
- 🔁 Agentic Planning Loop – Powered by an LLM that recursively calls your infrastructure APIs.
- 🧠 Intent-to-Infrastructure – Converts user goals and sample queries into fully configured Mixpeek (or custom) resources.
- 🧰 Supports Multimodal Inputs – Works with video, image, audio, PDFs, and structured metadata.
- 🚀 Ray-Native Execution – Each step (create, test, enrich, evaluate) runs as a task inside a Ray Job (see the sketch after this list).
- 📡 Streaming Logs – All API calls, results, and failures are streamed back to the client in real time.
- ✅ Validation Suite – Asserts search quality, enrichment coverage, latency, and taxonomy performance.
- 🔌 Pluggable Backends – Built for Mixpeek, but easily extendable to other infra APIs.
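To make the Ray-native model concrete, here is a minimal sketch of one step running as a Ray task. The `run_tool` stub and its payload are illustrative, not RayAgent's actual tool layer:

```python
import ray

ray.init()  # or connect to an existing cluster with ray.init(address="auto")

@ray.remote
def run_tool(tool_name: str, payload: dict) -> dict:
    # Each agent action (create, test, enrich, evaluate) executes as an
    # isolated Ray task and returns its result to the planner.
    return {"tool": tool_name, "status": "ok", "payload": payload}

# Submit one step and block on its result.
ref = run_tool.remote("create_bucket", {"name": "video-inputs"})
print(ray.get(ref))
```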
Typical use cases:
- Auto-generate infrastructure for RAG or video understanding pipelines
- Test retriever quality on real user queries
- Build taxonomies from scratch based on uploaded data
- Evaluate enrichment coverage and clustering utility
- Serve as an agent sandbox template for infrastructure code execution
```
rayagent/
├── agent/          # LLM planner + tool definitions
│   ├── planner.py
│   └── tools.py
├── executor/       # Ray job logic + task execution
│   ├── runner.py
│   └── logger.py
├── validators/     # Infra success criteria tests
│   ├── recall.py
│   └── enrichment.py
├── ui/             # Optional WebSocket or Streamlit interface
├── main.py         # CLI entrypoint
└── config.yaml     # Infra API config + prompt templates
```
```bash
git clone https://github.com/your-org/RayAgent.git
cd RayAgent
pip install -r requirements.txt
```

Edit `config.yaml` to point to your infrastructure APIs (e.g. Mixpeek SDK or local dev server):
```yaml
mixpeek:
  api_base: "http://localhost:8000"
  api_key: "YOUR_KEY"
llm:
  provider: "openai"
  model: "gpt-4o"
```

Then run the agent from the CLI:

```bash
python main.py init --files path/to/data --queries path/to/intents.json
```

Or submit it as a Ray Job:
```bash
ray job submit --working-dir . -- python main.py init --files ./sample/ --queries ./intents.json
```
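Equivalently, you can submit the job programmatically with Ray's Jobs API; this sketch assumes a local cluster with the dashboard on the default port:

```python
from ray.job_submission import JobSubmissionClient

# Points at the Ray dashboard; 8265 is the default port on a local cluster.
client = JobSubmissionClient("http://127.0.0.1:8265")

job_id = client.submit_job(
    entrypoint="python main.py init --files ./sample/ --queries ./intents.json",
    runtime_env={"working_dir": "."},
)
print(f"Submitted job: {job_id}")
```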
The agent uses an LLM to:
- Inspect file types and metadata
- Parse example queries
- Plan infrastructure actions (e.g., create bucket, add extractor, validate retriever)
- Recursively execute actions via Ray tasks
- Stream results to the frontend or CLI
All actions are tool calls backed by your API (e.g. `POST /buckets`, `POST /retrievers`, etc.).
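In outline, the planning loop looks roughly like the sketch below; the `Action` shape and the injected `plan_next_action` and `execute_tool` callables are stand-ins for the real planner and tool layer in `agent/`:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Action:
    name: str        # tool to call, e.g. "create_bucket"
    arguments: dict  # tool payload produced by the LLM

def run_agent(
    goal: str,
    plan_next_action: Callable[[str, list], Optional[Action]],  # LLM planner
    execute_tool: Callable[[str, dict], dict],                  # runs as a Ray task
) -> list:
    """Hypothetical sketch of the recursive plan -> act -> observe loop."""
    history: list = []
    while True:
        # Ask the LLM for the next tool call, given the goal and prior results.
        action = plan_next_action(goal, history)
        if action is None:  # planner decides the infrastructure is complete
            break
        result = execute_tool(action.name, action.arguments)
        history.append((action, result))  # observations feed the next planning step
    return history
```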
After setup, the agent runs assertions on:
- Query match rate – % of sample queries returning relevant docs
- Taxonomy coverage – % of content tagged by labels
- Clustering coherence – Number and entropy of clusters
- Latency – Cold/warm search timing
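As an illustration, the query match rate assertion could look like the sketch below; `search_fn` and the query dict shape are assumptions, not the validators' actual API:

```python
from typing import Callable

def query_match_rate(
    queries: list[dict],
    search_fn: Callable[[str], list[dict]],  # the retriever under test
    threshold: float = 0.8,
) -> float:
    """Share of sample queries that return at least one relevant document.

    Each query dict is assumed to hold the query text and the IDs of
    documents considered relevant (an illustrative schema, not the real one).
    """
    hits = 0
    for q in queries:
        results = search_fn(q["text"])
        if any(doc["id"] in q["relevant_ids"] for doc in results):
            hits += 1
    rate = hits / len(queries)
    assert rate >= threshold, f"Match rate {rate:.2%} is below {threshold:.0%}"
    return rate
```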
Results are output as a Markdown or JSON report:
```json
{
  "retriever_success_rate": 0.88,
  "taxonomy_coverage": 92.3,
  "average_latency_ms": 712,
  "suggestions": [
    "Add a fallback text retriever for PDF-only docs",
    "Cluster entropy > 1.5 – consider pruning noisy labels"
  ]
}
```

RayAgent streams execution logs using:
- `stdout` (CLI)
- Redis PubSub or WebSocket (UI integration)
- Optional: Ray Dashboard Task Logs
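For reference, a minimal sketch of the Redis PubSub path using redis-py; the `rayagent:logs` channel name is illustrative:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def publish_log(event: dict) -> None:
    # Producer side: fan one agent step out to any connected client.
    r.publish("rayagent:logs", json.dumps(event))

def follow_logs() -> None:
    # Consumer side (e.g. a CLI tail or the WebSocket bridge).
    sub = r.pubsub()
    sub.subscribe("rayagent:logs")
    for message in sub.listen():
        if message["type"] == "message":
            print(json.loads(message["data"]))
```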
You’ll see output like:
```
[Agent] ✅ Created bucket: video-inputs
[Agent] ✅ Added CLIP extractor
[Agent] ❌ Retriever returned 0 results for: 'Find people walking in snow'
[Agent] ✅ Added reranker
[Validator] ✅ 91.5% of queries returned relevant documents
```
```bash
python main.py init \
  --files ./sample-data \
  --queries ./example-queries.json \
  --goal "Build a retriever that supports reverse video search and tags all PDFs with relevant clusters"
```

You can add your own tools:
@tool(name="CreateCustomIndex")
def create_index(name: str, config: dict) -> dict:
return requests.post(f"{api_base}/indexes", json={...}).json()Then register it with the planner.
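For orientation, here is one minimal way a `@tool` decorator like the one above could be implemented; the actual registration mechanism lives in `agent/tools.py` and may differ:

```python
from typing import Any, Callable

# Hypothetical global registry the planner reads when building tool call plans.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function so the LLM planner can surface it as a callable tool."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator
```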
PRs welcome! Please open issues or feature requests before submitting large changes.
License: MIT
Roadmap:
- Add Hugging Face agent mode
- Streamlit-based demo frontend
- Auto-doc feedback loop
- Plugin system for other vector DBs (Qdrant, Pinecone, etc.)
