Support platform-managed Kubernetes sandboxes with namespace, Secret, and SandboxClaim controls #1678

@rohancmr

Problem Statement

A Kubernetes platform that hosts multiple customers and workloads may need to run AI-agent tool execution, generated code, MCP clients, model clients, and workflow-specific automation in secure, isolated sandbox environments.

The platform can own the enterprise control-plane responsibilities for those customers:

  • tenant and project onboarding
  • namespace creation and lifecycle
  • namespace labels, quotas, LimitRanges and NetworkPolicies
  • policy-pack selection and policy compilation
  • Vault/ESO-backed Kubernetes Secret creation
  • scheduling, quota and audit
  • cleanup of ephemeral execution environments

In this model, OpenShell acts as the sandbox execution plane behind the Kubernetes platform. The platform should decide tenant authorization, target namespace, final policy, approved secrets, and runtime placement profile. OpenShell should then provision and operate the sandbox in the requested Kubernetes namespace with the supplied final policy and approved Kubernetes Secret references.

Today, the required integration hooks are not all available:

  • The Kubernetes driver is configured with a sandbox namespace and currently creates the Agent Sandbox project's Sandbox resources only in that configured namespace.
  • The sandbox create path does not provide a trusted platform override for the target namespace. A platform-managed deployment needs to provision each sandbox in a platform-selected namespace with the correct tenant labels, quota, NetworkPolicy, RBAC and cleanup controls.
  • OpenShell provider credentials are stored in OpenShell provider records; provider/provider-v2 attachment does not accept Kubernetes Secret references as the credential source.
  • Sandbox environment maps are rendered as literal environment values, not as valueFrom.secretKeyRef references.
  • The Kubernetes driver does not expose SandboxClaim and SandboxWarmPool as the platform allocation path for warm pools or template-backed placement.
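
To make the environment-mapping gap concrete, the sketch below contrasts the two container env shapes involved. The helper names are hypothetical and are not OpenShell code; the dictionaries follow the core/v1 EnvVar layout. Note that a native secretKeyRef has no namespace field and resolves only in the pod's own namespace, so a reference to a Secret in another namespace has to be resolved by something outside the pod spec.

```python
def literal_env(name: str, value: str) -> dict:
    """EnvVar with an inline value (the shape the issue says is rendered today)."""
    return {"name": name, "value": value}


def secret_env(name: str, secret_name: str, key: str) -> dict:
    """EnvVar that defers to a key of a Secret in the pod's own namespace."""
    return {
        "name": name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }


# Placeholder value only; with the literal shape the token material
# itself ends up inside the pod spec.
today = literal_env("GITLAB_TOKEN", "<token material>")
wanted = secret_env("GITLAB_TOKEN", "gitlab-runtime-credentials", "token")
```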

Without these hooks, a Kubernetes platform would need workarounds such as one OpenShell Gateway per tenant, storing customer credentials in OpenShell provider records instead of ESO-created Kubernetes Secrets, or using a shared sandbox namespace. Those options are harder to operate and weaker as a multi-tenant isolation model.

Proposed Design

OpenShell Gateway should support a trusted control-plane sandbox create path for a Kubernetes platform.

The platform would perform customer onboarding and request validation before calling OpenShell:

  1. Create or reconcile the tenant namespace.
  2. Apply namespace labels, ResourceQuota, LimitRange, RBAC, image pull secrets, runtime constraints and default-deny NetworkPolicy.
  3. Compile the sandbox policy from platform base policy, tenant policy pack and approved custom endpoint policy.
  4. Configure ESO so Vault material syncs into approved Kubernetes Secrets in the OpenShell gateway namespace.
  5. Select approved Secret references for the requested access profile.
  6. Select runtime placement: RuntimeClass, node selector, tolerations, resource requests/limits, GPU profile and optional warm pool.
  7. Call OpenShell Gateway with namespace, final policy, Secret refs, metadata and placement.
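
As an illustration of steps 1 and 2, a platform could generate a per-tenant namespace baseline along these lines. This is a minimal sketch, not part of OpenShell: the function name, the object names sandbox-quota and default-deny, and the platform.example.com/tenant label key are assumptions. The manifests are plain dictionaries that any Kubernetes client could apply.

```python
def namespace_baseline(namespace: str, tenant: str, cpu: str, memory: str) -> list[dict]:
    """Build a Namespace, a ResourceQuota and a default-deny NetworkPolicy."""
    ns = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": namespace,
            # Hypothetical label key; the platform picks its own scheme.
            "labels": {"platform.example.com/tenant": tenant},
        },
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "sandbox-quota", "namespace": namespace},
        "spec": {"hard": {"limits.cpu": cpu, "limits.memory": memory}},
    }
    # An empty podSelector selects every pod in the namespace; listing both
    # policy types with no rules denies all ingress and egress by default.
    deny = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }
    return [ns, quota, deny]
```

Allow rules for approved endpoints would then be layered on top of the default-deny policy by the compiled sandbox policy or additional NetworkPolicies.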

OpenShell Gateway would then:

  1. Resolve approved Kubernetes Secret references from the gateway namespace as a credential source.
  2. Provision the sandbox in the requested namespace.
  3. Use SandboxClaim when requested, and optionally target a SandboxWarmPool.
  4. Propagate tenant, project, owner, policy hash, namespace, profile and request metadata to OpenShell and Kubernetes resources.
  5. Emit lifecycle events for scheduling, readiness, policy load, credential load, failure and cleanup.
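
One detail of metadata propagation (step 4) is worth sketching: Kubernetes label values are limited to 63 characters drawn from alphanumerics, '-', '_' and '.', so a value such as sha256:abc123 cannot be a label and would have to travel as an annotation (or be rewritten). The helper below is a hypothetical illustration of that split; the platform.example.com key prefix is an assumption.

```python
import re

# Label value syntax: empty, or alphanumeric at both ends with
# '-', '_' and '.' allowed in between.
LABEL_VALUE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def split_metadata(metadata: dict[str, str],
                   prefix: str = "platform.example.com") -> tuple[dict, dict]:
    """Route each value to labels when it is label-safe, else to annotations."""
    labels: dict[str, str] = {}
    annotations: dict[str, str] = {}
    for key, value in metadata.items():
        label_safe = len(value) <= 63 and LABEL_VALUE.fullmatch(value) is not None
        target = labels if label_safe else annotations
        target[f"{prefix}/{key}"] = value
    return labels, annotations


labels, annotations = split_metadata({
    "project": "spec-critic",
    "request_id": "req-abc123",
    "policy_hash": "sha256:abc123",  # ':' is not allowed in a label value
})
```

With this input, project and request_id become labels and policy_hash becomes an annotation.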

Desired request shape:

tenant: nvbugs
owner: service:spec-critic-backend
namespace: openshell-rce-nvbugs-job-abc123
metadata:
  project: spec-critic
  request_id: req-abc123
  policy_hash: sha256:abc123
policy:
  compiledPolicyRef: platform-policy-nvbugs-gitlab-read-sha256-abc123
credentialsFrom:
  - provider: gitlab-readonly
    env:
      GITLAB_TOKEN:
        secretKeyRef:
          namespace: openshell-system
          name: gitlab-runtime-credentials
          key: token
runtime:
  class: gvisor
  nodeSelector:
    platform.example.com/node-pool: secure-rce
  tolerations:
    - key: platform.example.com/secure-rce
      operator: Exists
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "2"
    memory: 4Gi
agentSandbox:
  allocation: SandboxClaim
  warmPool: rce-python-base
ttlSeconds: 1800
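
A gateway accepting this shape would likely want to reject malformed requests before touching the cluster. The following is a rough sketch of such checks, assuming the request has already been parsed into a dictionary; validate_request and its error strings are hypothetical, and the rule that Secret references must point at the gateway namespace follows the design above rather than any existing OpenShell behavior.

```python
import re

# Namespace names must be DNS-1123 labels of at most 63 characters.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def validate_request(request: dict, gateway_namespace: str = "openshell-system") -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    namespace = request.get("namespace", "")
    if len(namespace) > 63 or not DNS_LABEL.fullmatch(namespace):
        errors.append("namespace is not a valid DNS-1123 label")
    ttl = request.get("ttlSeconds")
    if type(ttl) is not int or ttl <= 0:
        errors.append("ttlSeconds must be a positive integer")
    for source in request.get("credentialsFrom", []):
        for env_name, ref in source.get("env", {}).items():
            secret_ns = ref.get("secretKeyRef", {}).get("namespace")
            if secret_ns != gateway_namespace:
                errors.append(f"{env_name}: Secret must live in {gateway_namespace}")
    return errors
```

Authorization (is this caller allowed to use this tenant, namespace and Secret?) stays with the platform in this proposal; the checks here are only structural.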

The main feature areas are:

  • Trusted namespace override for sandbox creation.
  • SandboxClaim support in the Kubernetes driver.
  • SandboxWarmPool selection.
  • Kubernetes Secret-backed credential injection.
  • Runtime placement controls.
  • Metadata propagation.
  • Lifecycle events.

Why SandboxClaim and SandboxWarmPool

OpenShell already integrates with the Kubernetes SIG Apps Agent Sandbox project: the current Kubernetes driver creates that project's Sandbox custom resources directly. SandboxClaim and SandboxWarmPool belong to the same Agent Sandbox CRD family and are useful when the platform wants allocation semantics rather than direct pod-style creation.

SandboxClaim is useful because the platform request is an allocation request:

  • select an approved SandboxTemplate
  • allocate an execution sandbox in the platform-selected namespace
  • attach metadata, TTL, runtime placement and policy hash
  • allow the Agent Sandbox controller to bind the claim to a matching sandbox
  • keep the OpenShell API as the user-facing API while using the Agent Sandbox controller for allocation
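
For orientation, a claim built by the driver might look roughly like the dictionary below. Treat the apiVersion and the spec.sandboxTemplateRef field as assumptions to be checked against the Agent Sandbox CRDs installed in the cluster; the function name and arguments are hypothetical.

```python
from typing import Optional


def sandbox_claim(name: str, namespace: str, template: str,
                  labels: dict, annotations: Optional[dict] = None) -> dict:
    """Build a SandboxClaim manifest that asks for a sandbox from a template."""
    return {
        # Assumption: group/version of the Agent Sandbox extension APIs.
        "apiVersion": "extensions.agents.x-k8s.io/v1alpha1",
        "kind": "SandboxClaim",
        "metadata": {
            "name": name,
            "namespace": namespace,  # platform-selected tenant namespace
            "labels": dict(labels),
            "annotations": dict(annotations or {}),
        },
        # Assumption: the claim names an approved SandboxTemplate here.
        "spec": {"sandboxTemplateRef": {"name": template}},
    }
```

The OpenShell API would remain the caller-facing surface; a manifest like this is only what the driver would hand to the Agent Sandbox controller.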

SandboxWarmPool is useful because some customer profiles need lower start latency or controlled placement:

  • pre-warm images and base runtime for common profiles
  • keep capacity ready for interactive or high-throughput customers
  • scope warm capacity to a namespace, template, RuntimeClass, node selector and tolerations
  • support tainted/secure nodes or GPU nodes without creating one gateway per placement profile

Warm pools should not reuse tenant secrets, previous writable workspaces, or live supervisor sessions. The safe use is warm infrastructure and base runtime capacity, not warm customer state.
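
One way to enforce the "no tenant secrets in warm capacity" rule is to lint the pod template behind a warm pool before admitting it. The helper below is an illustrative sketch over a core/v1 PodSpec dictionary; its name is hypothetical, and it covers Secret volumes, env secretKeyRef entries and envFrom secretRef entries only (projected volumes are not inspected).

```python
def secret_references(pod_spec: dict) -> list[str]:
    """List the places where a pod spec pulls in Kubernetes Secrets."""
    found = []
    for volume in pod_spec.get("volumes", []):
        if "secret" in volume:
            found.append(f"volume {volume['name']}")
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    for container in containers:
        for env in container.get("env", []):
            if "secretKeyRef" in env.get("valueFrom", {}):
                found.append(f"env {container['name']}/{env['name']}")
        for source in container.get("envFrom", []):
            if "secretRef" in source:
                found.append(f"envFrom {container['name']}")
    return found
```

A warm-pool template would be accepted only when this returns an empty list, with tenant credentials attached after a claim binds the sandbox to a specific request.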

Alternatives Considered

Make OpenShell the full multi-tenant platform

Rejected for this use case. The Kubernetes platform already owns tenant onboarding, identity, quotas, namespaces, policy creation, ESO secret creation, audit and cleanup. OpenShell should provide the sandbox execution API that the platform drives.

Deploy one OpenShell Gateway per tenant

Rejected as the default model. It can work for capacity or compliance sharding, but it is operationally heavy and makes gateway lifecycle management harder. The preferred model is one private gateway per environment or cluster, with optional risk/capacity shards.

Use one shared sandbox namespace

Rejected for high-risk and multi-customer use. Namespace boundaries are important for quotas, NetworkPolicies, cleanup, service accounts, labels and blast-radius control.

Store tenant credentials as OpenShell provider records

Rejected as the primary production model. Kubernetes platforms commonly use Vault + ESO + Kubernetes Secrets as the credential source of truth. OpenShell provider profiles remain useful for endpoint and credential shape metadata, but the credential material should be able to come from approved Kubernetes Secret references.

Continue direct Sandbox creation only

Rejected for pooled and template-backed allocation. SandboxClaim and SandboxWarmPool are a better fit when the platform wants warm capacity, node placement and customer-approved runtime profiles.

Agent Investigation

An agent reviewed the local OpenShell repository and platform/OpenShell design artifacts.

Findings:

  • The current Kubernetes driver creates namespaced agents.x-k8s.io/v1alpha1 Sandbox resources directly.
  • The current public sandbox spec supports environment maps, templates, policies and providers, but does not expose Kubernetes Secret refs as credential sources.
  • Provider v2 is useful for profile-backed provider policy and refresh metadata, but runtime credential injection still uses provider credential records and placeholder/proxy resolution.
  • The current Kubernetes driver environment mapping renders literal value entries, not valueFrom.secretKeyRef.
  • The public sandbox spec already includes a sandbox policy. This proposal assumes the platform composes that policy and passes it to OpenShell; policy composition itself is not part of this feature request.

Specific code paths checked:

| Area | File / function | Current behavior | Possible extension |
| --- | --- | --- | --- |
| Kubernetes CRD kind | crates/openshell-driver-kubernetes/src/driver.rs, SANDBOX_KIND | Constant is Sandbox. | Add SandboxClaim support, either as a new allocation mode or a Kubernetes driver option. |
| Sandbox creation | crates/openshell-driver-kubernetes/src/driver.rs, KubernetesComputeDriver::create_sandbox | Builds a dynamic Sandbox object and sets metadata.namespace from self.config.namespace. | Use a trusted request namespace when supplied; switch between direct Sandbox and SandboxClaim creation based on allocation mode. |
| Agent Sandbox docs | docs/reference/sandbox-compute-drivers.mdx | Documents that the Kubernetes driver creates namespaced agents.x-k8s.io/v1alpha1 Sandbox resources from the Kubernetes SIG Apps Agent Sandbox project. | Extend docs and driver to support the same project's SandboxClaim and SandboxWarmPool resources. |
| K8s spec conversion | crates/openshell-driver-kubernetes/src/driver.rs, sandbox_to_k8s_spec and sandbox_template_to_k8s | Converts OpenShell SandboxSpec/SandboxTemplate into Agent Sandbox podTemplate, volume templates and placement fields. | Add claim/template/warm-pool fields and propagate runtime placement and metadata into the claim/template path. |
| Provider credential resolution | crates/openshell-server/src/grpc/provider.rs, resolve_provider_environment | Reads credentials from OpenShell Provider records in the gateway store. | Add a Kubernetes Secret-backed provider/credential source for approved Secret refs. |
| Public sandbox API | proto/openshell.proto, SandboxSpec / SandboxTemplate | Supports literal environment, template, policy, attached providers, GPU flag and template metadata. | Add trusted fields such as requested namespace, credentialsFrom, allocation mode, warm-pool reference and richer runtime placement metadata. |
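
The "Sandbox creation" extension can be summarized as a small decision: which namespace and which resource kind a create call should target. The sketch below models that decision in Python for brevity; it is not the driver's Rust code, and resolve_target, CreateTarget and the caller_is_trusted flag are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

ALLOCATION_MODES = ("Sandbox", "SandboxClaim")


@dataclass(frozen=True)
class CreateTarget:
    kind: str
    namespace: str


def resolve_target(config_namespace: str,
                   requested_namespace: Optional[str],
                   caller_is_trusted: bool,
                   allocation: str = "Sandbox") -> CreateTarget:
    """Pick the resource kind and namespace for one sandbox create call."""
    if allocation not in ALLOCATION_MODES:
        raise ValueError(f"unknown allocation mode: {allocation}")
    if requested_namespace and requested_namespace != config_namespace:
        # Only the trusted control-plane path may leave the configured namespace.
        if not caller_is_trusted:
            raise PermissionError("namespace override requires a trusted caller")
        return CreateTarget(allocation, requested_namespace)
    # Default: today's behavior, using the driver's configured namespace.
    return CreateTarget(allocation, config_namespace)
```

Keeping the configured namespace as the fallback means existing callers that never send a namespace would see no change in behavior.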

Checklist

  • I've reviewed existing contribution guidance and the RFC process.
  • This is a design proposal, not a "please build this" request.
