
feat(examples): add Ray on AKS example #5768

Draft
alyssa1303 wants to merge 4 commits into Azure:master from alyssa1303:ray-example


@alyssa1303
Contributor

Summary

Add an example for running Ray applications on AKS using the KubeRay operator and Kueue for job queueing.

What's included

  • setup.sh – CLI script with commands for provisioning infrastructure and for installing the Kueue and KubeRay operators from MCR
  • aks-classic/ – alternative AKS deployment based on Terraform
  • inference-cpu/ – CPU inference example with Kueue integration
  • README.md – Prerequisites, deployment steps, and cleanup instructions
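The command layout of setup.sh can be pictured as a small subcommand dispatcher. This is a sketch, not the PR's script: install_operators and run_inference_cpu are the names used in the commit messages, while create_infra, the all target and the echo-only function bodies are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton of a subcommand-style setup.sh. Only the
# install_operators and run_inference_cpu names come from the PR's commits;
# create_infra, "all" and every function body are placeholders.
set -euo pipefail

# Region default mirrors the PR (centralus); override via LOCATION.
LOCATION="${LOCATION:-centralus}"
NAMESPACE="${NAMESPACE:-ray}"

create_infra() {
  # Placeholder: the real script provisions the AKS cluster here.
  echo "create_infra: provisioning AKS in ${LOCATION}"
}

install_operators() {
  # Placeholder: the real script installs Kueue and the KubeRay operator.
  echo "install_operators: Kueue + KubeRay operator"
}

run_inference_cpu() {
  # Placeholder: the real script submits the cpu-inference job.
  echo "run_inference_cpu: submitting cpu-inference to namespace ${NAMESPACE}"
}

usage() {
  echo "usage: $0 {create_infra|install_operators|run_inference_cpu|all}"
}

main() {
  # Default to "help" so running without arguments just prints usage.
  case "${1:-help}" in
    create_infra|install_operators|run_inference_cpu) "$1" ;;
    all) create_infra; install_operators; run_inference_cpu ;;
    help|-h|--help) usage ;;
    *) usage >&2; return 1 ;;
  esac
}

main "$@"
```

For example, `./setup.sh install_operators` would run only the operator step; an unknown command prints usage to stderr and fails.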

Helm charts

Uses internal AKS AI Runtime charts from oci://mcr.microsoft.com/aks/ai-runtime/helm:

  • Kueue v0.17.1
  • KubeRay Operator v1.6.1
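Installing the two pinned charts from the OCI registry could look like the sketch below. The registry path and versions come from the list above; the chart names, release names and namespaces are assumptions, as is the chart version being published without the leading "v". The script only prints the helm commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: print `helm upgrade --install` commands for the two charts.
# Registry path and versions are from the PR description; chart names,
# release names and namespaces are ASSUMPTIONS.
set -euo pipefail

REGISTRY="oci://mcr.microsoft.com/aks/ai-runtime/helm"

helm_install_cmd() {
  local release="$1" chart="$2" version="$3" namespace="$4"
  # An oci:// reference can be passed directly as the chart argument.
  printf 'helm upgrade --install %s %s/%s --version %s --namespace %s --create-namespace --wait\n' \
    "$release" "$REGISTRY" "$chart" "$version" "$namespace"
}

helm_install_cmd kueue kueue 0.17.1 kueue-system
helm_install_cmd kuberay-operator kuberay-operator 1.6.1 kuberay-system
```

Once the chart names are confirmed against the registry, the output can be piped to bash. Helm 3.8 or later is needed for oci:// chart references.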

alyssa1303 and others added 4 commits May 13, 2026 00:03
Add example for running Ray applications on AKS, including:
- Setup script with commands for infra, Kueue, and KubeRay operator
- RayCluster and RayJob manifests
- Terraform-based AKS deployment (aks-classic)
- CPU inference example with Kueue integration

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
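A RayJob manifest with Kueue integration could be rendered as in the sketch below. The job name cpu-inference, the ray namespace, the 8Gi head and Ray 2.54.0 come from the commits; the local queue name, worker group, CPU values, worker memory and entrypoint are hypothetical. Kueue's RayJob integration picks up jobs by the kueue.x-k8s.io/queue-name label. The MCR repository path for the ray:py3.12-ray2.54.0 image is not given here, so the full image reference is passed in.

```shell
#!/usr/bin/env bash
# Sketch: render a RayJob that Kueue admits through a LocalQueue.
# Worker group, CPU values, worker memory and entrypoint are HYPOTHETICAL.
set -euo pipefail

render_rayjob() {
  local name="$1" namespace="$2" queue="$3" image="$4"
  cat <<EOF
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: ${name}
  namespace: ${namespace}
  labels:
    # Kueue's RayJob integration selects jobs carrying this label.
    kueue.x-k8s.io/queue-name: ${queue}
spec:
  # Hypothetical entrypoint; the PR's inference script is not shown here.
  entrypoint: python -c "import ray; ray.init(); print(ray.cluster_resources())"
  shutdownAfterJobFinishes: true
  rayClusterSpec:
    rayVersion: "2.54.0"
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: ${image}
              resources:
                requests: {cpu: "2", memory: 8Gi}
                limits: {cpu: "2", memory: 8Gi}
    workerGroupSpecs:
      - groupName: cpu-workers
        replicas: 2
        minReplicas: 2
        maxReplicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: ${image}
                resources:
                  requests: {cpu: "2", memory: 4Gi}
                  limits: {cpu: "2", memory: 4Gi}
EOF
}
```

Usage would be along the lines of `render_rayjob cpu-inference ray <local-queue> <full-image-ref> | kubectl apply -f -`.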
- Add run_inference_cpu command to setup.sh
- Use MCR Ray image (ray:py3.12-ray2.54.0)
- Rename namespace from e2e-stack to ray
- Rename job from e2e-inference to cpu-inference
- Consolidate install_kueue/install_kuberay into install_operators

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The head node, with 4Gi of memory, was at 96% memory usage from Ray
system processes alone. Increased it to 8Gi and raised the Kueue
memory quota from 20Gi to 24Gi to accommodate the extra 4Gi.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
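The sizing change in this commit can be checked with shell arithmetic. The sketch assumes the Ray system processes keep using the same absolute amount of memory after the resize, and that the Kueue quota only needs to grow by the head's extra memory.

```shell
#!/usr/bin/env bash
# Worked check of the head-node resize. ASSUMES constant absolute usage by
# Ray system processes and a quota that grows by exactly the head's increase.
set -euo pipefail

OLD_HEAD_MIB=$(( 4 * 1024 ))   # 4Gi head
NEW_HEAD_MIB=$(( 8 * 1024 ))   # 8Gi head
OLD_USAGE_PCT=96               # observed on the 4Gi head
OLD_QUOTA_GI=20                # previous Kueue memory quota

# Same absolute usage expressed as a share of the larger head.
new_usage_pct=$(( OLD_HEAD_MIB * OLD_USAGE_PCT / NEW_HEAD_MIB ))
# The quota absorbs the additional memory requested by the head pod.
new_quota_gi=$(( OLD_QUOTA_GI + (NEW_HEAD_MIB - OLD_HEAD_MIB) / 1024 ))

echo "head usage after resize: ~${new_usage_pct}%"
echo "Kueue memory quota: ${new_quota_gi}Gi"
```

Under that assumption the same load sits at about 48% of the 8Gi head, and 20Gi plus the extra 4Gi gives the new 24Gi quota.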
- Add Helm provider to deploy Kueue and KubeRay via Terraform
- Align Terraform defaults with setup.sh (VM sizes, node counts)
- Update region to centralus in both setup.sh and Terraform
- Simplify deploy.sh now that Terraform handles Helm releases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
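With Terraform managing the Helm releases, deploy.sh reduces to driving Terraform and fetching cluster credentials. A minimal sketch follows; TF_DIR, RESOURCE_GROUP, CLUSTER_NAME, their defaults and the location Terraform variable name are hypothetical, and DRY_RUN defaults to 1 so the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch of a slimmed-down deploy.sh. TF_DIR, RESOURCE_GROUP, CLUSTER_NAME,
# their defaults and the "location" Terraform variable are HYPOTHETICAL.
set -euo pipefail

TF_DIR="${TF_DIR:-aks-classic}"
LOCATION="${LOCATION:-centralus}"
RESOURCE_GROUP="${RESOURCE_GROUP:-ray-aks-rg}"
CLUSTER_NAME="${CLUSTER_NAME:-ray-aks}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy() {
  run terraform -chdir="$TF_DIR" init -input=false
  # No helm calls here: Kueue and KubeRay are installed by Terraform.
  run terraform -chdir="$TF_DIR" apply -auto-approve -var "location=${LOCATION}"
  run az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"
}

deploy
```

Setting DRY_RUN=0 runs the commands; `terraform -chdir` requires Terraform 0.14 or later.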