RayAgent


LLM-powered agent for building and validating multimodal infrastructure using Ray.

RayAgent is an intelligent orchestrator that captures user intent, analyzes sample data, and automatically scaffolds infrastructure for multimodal AI applications. It builds buckets, feature extractors, taxonomies, retrievers, and test queries—entirely agentically, using Ray for distributed execution and job orchestration.

     ┌───────────────────────────────────┐
     │        User Input (CLI/UI)        │
     │  - Sample files                   │
     │  - Example queries                │
     │  - Success criteria               │
     └─────────────────┬─────────────────┘
                       │
                       ▼
     ┌───────────────────────────────────┐
     │        LLM Planning Agent         │
     │  - Chooses next action            │
     │  - Builds tool call plans         │
     └─────────────────┬─────────────────┘
                       │
                       ▼
     ┌───────────────────────────────────┐
     │    Ray Job Submission (runner)    │
     │  - Submits entire agent loop      │
     │  - Executes tool actions via Ray  │
     └─────────────────┬─────────────────┘
                       │
                       ▼
     ┌───────────────────────────────────┐
     │    Mixpeek Infrastructure API     │
     │  - /buckets      - /collections   │
     │  - /extractors   - /retrievers    │
     │  - /taxonomies   - /validators    │
     └─────────────────┬─────────────────┘
                       │
                       ▼
     ┌───────────────────────────────────┐
     │    Streaming + Logging Module     │
     │  - Redis pubsub / stdout          │
     │  - Optional WebSocket UI          │
     └─────────────────┬─────────────────┘
                       │
                       ▼
     ┌───────────────────────────────────┐
     │        Real-Time Feedback         │
     │  - Logs of agent steps            │
     │  - Success/failure markers        │
     │  - Suggestions                    │
     └───────────────────────────────────┘

✨ Features

  • 🔁 Agentic Planning Loop – Powered by an LLM that recursively calls your infrastructure APIs.
  • 🧠 Intent-to-Infrastructure – Converts user goals and sample queries into fully configured Mixpeek (or custom) resources.
  • 🧰 Supports Multimodal Inputs – Works with video, image, audio, PDFs, and structured metadata.
  • 🚀 Ray-Native Execution – Each step (create, test, enrich, evaluate) runs as a task inside a Ray Job.
  • 📡 Streaming Logs – All API calls, results, and failures are streamed back to the client in real time.
  • ✅ Validation Suite – Asserts search quality, enrichment coverage, latency, and taxonomy performance.
  • 🔌 Pluggable Backends – Built for Mixpeek, but easily extendable to other infra APIs.

📦 Use Cases

  • Auto-generate infrastructure for RAG or video understanding pipelines
  • Test retriever quality on real user queries
  • Build taxonomies from scratch based on uploaded data
  • Evaluate enrichment coverage and clustering utility
  • Serve as an agent sandbox template for infrastructure code execution

📁 Project Structure


rayagent/
├── agent/                # LLM planner + tool definitions
│   ├── planner.py
│   └── tools.py
├── executor/             # Ray job logic + task execution
│   ├── runner.py
│   └── logger.py
├── validators/           # Infra success criteria tests
│   ├── recall.py
│   └── enrichment.py
├── ui/                   # Optional WebSocket or Streamlit interface
├── main.py               # CLI entrypoint
└── config.yaml           # Infra API config + prompt templates


🚀 Quickstart

1. Install dependencies

git clone https://github.com/mixpeek/RayAgent.git
cd RayAgent
pip install -r requirements.txt

2. Configure

Edit config.yaml to point to your infrastructure APIs (e.g., the Mixpeek API or a local dev server).

mixpeek:
  api_base: "http://localhost:8000"
  api_key: "YOUR_KEY"
llm:
  provider: "openai"
  model: "gpt-4o"
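
At startup the agent can load this file with PyYAML. A minimal sketch (the load_config helper is illustrative; the keys mirror the example above):

import yaml

def load_config(path: str = "config.yaml") -> dict:
    # Parse the YAML config; keys follow the example above.
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config()
api_base = config["mixpeek"]["api_base"]  # e.g. "http://localhost:8000"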

3. Run Agent

python main.py init --files path/to/data --queries path/to/intents.json

Or submit as a Ray Job:

ray job submit --working-dir . -- python main.py init --files ./sample/ --queries ./intents.json
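
You can also submit the same job programmatically through Ray's job submission SDK. A sketch, assuming a running Ray cluster with the dashboard on its default port 8265:

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")
job_id = client.submit_job(
    entrypoint="python main.py init --files ./sample/ --queries ./intents.json",
    runtime_env={"working_dir": "."},
)
print(f"Submitted: {job_id}")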

🧠 Agent Planning

The agent uses an LLM to:

  1. Inspect file types and metadata
  2. Parse example queries
  3. Plan infrastructure actions (e.g., create bucket, add extractor, validate retriever)
  4. Recursively execute actions via Ray tasks
  5. Stream results to the frontend or CLI

All actions are tool calls backed by your API (e.g., POST /buckets, POST /retrievers).
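
In outline, the loop looks roughly like this (an illustrative sketch, not the actual planner.py; llm.plan, the action shape, and the tools registry are stand-ins):

def run_agent(files, queries, goal, llm, tools, max_steps=20):
    # Illustrative sketch of the agentic planning loop.
    state = {"files": files, "queries": queries, "goal": goal, "history": []}
    for _ in range(max_steps):
        action = llm.plan(state)                     # LLM picks the next tool call
        if action.name == "done":
            break
        result = tools[action.name](**action.args)   # e.g. POST /buckets
        state["history"].append((action, result))    # results feed the next plan
    return state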


📊 Validation Reports

After setup, the agent runs assertions on:

  • Query match rate – % of sample queries returning relevant docs
  • Taxonomy coverage – % of content tagged by labels
  • Clustering coherence – Number and entropy of clusters
  • Latency – Cold/warm search timing

Results are output as a markdown or JSON report:

{
  "retriever_success_rate": 0.88,
  "taxonomy_coverage": 92.3,
  "average_latency_ms": 712,
  "suggestions": [
    "Add a fallback text retriever for PDF-only docs",
    "Cluster entropy > 1.5 – consider pruning noisy labels"
  ]
}
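
The query match rate, for example, reduces to a check like this sketch (retriever.search and the labeled query format are assumptions, not the actual validators/recall.py; "relevant" here means at least one expected document ID comes back):

def query_match_rate(retriever, labeled_queries):
    # labeled_queries: list of (query, expected_doc_ids) pairs
    hits = 0
    for query, expected in labeled_queries:
        results = retriever.search(query, top_k=10)
        if any(doc.id in expected for doc in results):
            hits += 1
    return hits / len(labeled_queries)  # e.g. 0.88 in the report above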

📡 Streaming Output

RayAgent streams execution logs using:

  • stdout (CLI)
  • Redis PubSub or WebSocket (UI integration)
  • Optional: Ray Dashboard Task Logs

You’ll see output like:

[Agent] ✅ Created bucket: video-inputs
[Agent] ✅ Added CLIP extractor
[Agent] ❌ Retriever returned 0 results for: 'Find people walking in snow'
[Agent] ✅ Added reranker
[Validator] ✅ 91.5% of queries returned relevant documents
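
The Redis path can be as small as this sketch (assumes redis-py and a local Redis; the channel name is illustrative):

import redis

r = redis.Redis()  # defaults to localhost:6379

def stream_log(message: str, channel: str = "rayagent:logs") -> None:
    print(message)               # stdout (CLI)
    r.publish(channel, message)  # Redis PubSub (UI integration)

stream_log("[Agent] ✅ Created bucket: video-inputs")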

🧪 Example Usage

python main.py init \
  --files ./sample-data \
  --queries ./example-queries.json \
  --goal "Build a retriever that supports reverse video search and tags all PDFs with relevant clusters"

🔌 Extending

You can add your own tools:

import requests

@tool(name="CreateCustomIndex")
def create_index(name: str, config: dict) -> dict:
    # Posts the index config to a hypothetical /indexes endpoint.
    return requests.post(f"{api_base}/indexes", json={"name": name, **config}).json()

Then register it with the planner.
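
Registration might look like this (illustrative only; Planner and register_tool are assumed names, and the real interface lives in agent/planner.py):

from agent.planner import Planner  # assumed import path

planner = Planner(config)
planner.register_tool(create_index)  # exposes CreateCustomIndex to the LLM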


🤝 Contributing

PRs welcome! Please open issues or feature requests before submitting large changes.


📄 License

MIT


🧭 Roadmap

  • Add Hugging Face agent mode
  • Streamlit-based demo frontend
  • Auto-doc feedback loop
  • Plugin system for other vector DBs (Qdrant, Pinecone, etc.)
