From 910a2d4b28750ad4db55e7aa54ff12533e84b80f Mon Sep 17 00:00:00 2001 From: Rohan Kumar Date: Tue, 2 Jun 2026 15:30:41 +0530 Subject: [PATCH 1/2] docs: add platform-managed Kubernetes sandboxes RFC Signed-off-by: Rohan Kumar --- .../README.md | 627 ++++++++++++++++++ 1 file changed, 627 insertions(+) create mode 100644 rfc/0005-platform-managed-kubernetes-sandboxes/README.md diff --git a/rfc/0005-platform-managed-kubernetes-sandboxes/README.md b/rfc/0005-platform-managed-kubernetes-sandboxes/README.md new file mode 100644 index 000000000..b3fabc27c --- /dev/null +++ b/rfc/0005-platform-managed-kubernetes-sandboxes/README.md @@ -0,0 +1,627 @@ +--- +authors: + - "@rohancmr" +state: draft +links: + - https://github.com/NVIDIA/OpenShell/issues/1678 + - https://github.com/NVIDIA/OpenShell/pull/1680 +--- + +# RFC 0005 - Platform-Managed Kubernetes Sandboxes + +## Summary + +An existing Kubernetes platform can own tenant onboarding, namespace creation, +quotas, network policy, secret synchronization, policy compilation, scheduling +and audit while delegating sandbox execution to OpenShell. A trusted platform +controller calls OpenShell Gateway, and OpenShell authorizes that controller +before provisioning a sandbox with platform-selected Kubernetes properties, +supplied sandbox policy, approved credentials, ownership metadata and lifecycle +reporting. 
+ +The main requirements are: + +- a clear OpenShell control-plane authentication and authorization model for + trusted platform controllers; +- support for platform-selected Kubernetes sandbox namespaces, separate from the + OpenShell control-plane namespace; +- Kubernetes-specific sandbox configuration hooks aligned with the + `driver_config` proposal in #1589; +- a credential-source model that can support Kubernetes Secrets as one backend + for approved provider credential material; +- optional integration with Agent Sandbox resources such as `SandboxClaim` and + `SandboxWarmPool`; and +- lifecycle/status events that platform schedulers, audit systems and cleanup + controllers can consume. + +## Motivation + +Kubernetes platforms that host multiple customers and workloads often need to +run AI-agent tool execution, generated code, MCP clients, model clients and +workflow-specific automation in secure, isolated sandbox environments. The +platform can own the enterprise control-plane responsibilities: + +- tenant and project onboarding; +- namespace creation and lifecycle; +- namespace labels, quotas, `LimitRange` and `NetworkPolicy` objects; +- policy-pack selection and policy compilation; +- Vault/External Secrets Operator-backed Kubernetes Secret creation; +- scheduling, quota and audit; and +- cleanup of short-lived execution environments. + +OpenShell is a good fit as the sandbox execution plane behind that platform. +The platform should decide tenant authorization, target namespace, final policy, +approved secrets and runtime placement profile. OpenShell should then provision +and operate the sandbox in the requested Kubernetes namespace with the supplied +final policy and approved credential-source references. + +This is different from a direct sandbox-as-a-service model where end users call +OpenShell and do not know which control plane runs their sandboxes. In the +platform-managed model, the direct OpenShell caller is a trusted platform +controller. 
End-user authorization can remain mediated by the platform, while +OpenShell still needs to authenticate and authorize the platform controller and +protect OpenShell sandbox operations. + +Today, the required integration and authorization hooks are not all available: + +- The Kubernetes driver is configured with one sandbox namespace and creates + Agent Sandbox `Sandbox` resources in that configured namespace. A single + sandbox namespace is not a sufficient multi-tenant boundary because all + customer sandboxes share the same namespace-level quota, RBAC/service account + surface, NetworkPolicy scope, image pull secret surface, labels, cleanup + lifecycle and blast-radius domain. Platform-managed tenants need separate + namespaces so each customer or workload class can have its own quota, + default-deny policy, approved ingress/egress, service accounts, labels, + audit metadata and cleanup lifecycle. +- The sandbox create path does not expose a trusted platform-selected target + namespace. +- OpenShell provider credentials are stored in OpenShell provider records; + provider/provider-v2 attachment does not accept Kubernetes `Secret` + references as the credential source. +- The Kubernetes driver does not expose `SandboxClaim` and `SandboxWarmPool` as + the platform allocation path for warm pools or template-backed placement. +- OpenShell control-plane authorization does not yet describe how a platform + controller would be authenticated and scoped to specific namespaces, Secret + references, service accounts, Kubernetes driver configuration, sandbox + ownership and sandbox operations. + +Without these hooks, a Kubernetes platform would need workarounds such as one +OpenShell Gateway per tenant, storing customer credentials in OpenShell provider +records instead of Kubernetes Secrets, or using a shared sandbox namespace. +Those options are harder to operate and weaker as a multi-tenant isolation +model. 
+ +## Non-goals + +- Making OpenShell the full multi-tenant Kubernetes platform. The platform owns + tenant onboarding, namespace creation, quota, RBAC, NetworkPolicy, external + secret synchronization, policy compilation, audit and cleanup. +- Defining the direct sandbox-as-a-service user experience. This draft focuses + on an existing Kubernetes platform delegating sandbox execution to OpenShell. +- Letting untrusted callers choose arbitrary namespaces or Kubernetes Secrets. + This draft assumes an authenticated and authorized platform-controller + identity calls OpenShell. +- Changing OpenShell policy semantics. The platform composes the final sandbox + policy and passes it to OpenShell; policy compilation is outside this RFC. +- Finalizing Kubernetes Secret credential APIs, `SandboxWarmPool` ownership, or + a higher-level OpenShell domain object above `Sandbox`. This draft captures + the requirements and open questions so they can align with the broader + OpenShell authorization, credential and driver configuration work. +- Requiring one OpenShell Gateway per tenant. Gateways may still be sharded by + cluster, capacity, compliance domain or risk profile, but that should not be + required solely to place sandboxes in different namespaces. + +## Proposal + +### Usage pattern + +This RFC targets the platform-delegated usage pattern: + +```text +tenant user or tenant service + -> Kubernetes platform control plane + -> OpenShell Gateway + -> Kubernetes driver + -> Agent Sandbox resources + -> sandbox pod with OpenShell supervisor +``` + +In this model, OpenShell is not directly exposed as the tenant-facing +sandbox-as-a-service API. The Kubernetes platform remains the tenant-facing +control plane and mediates end-user authorization. OpenShell receives requests +from one or more trusted platform-controller identities and authorizes those +identities to create and operate sandboxes within configured scopes. 
+ +The direct sandbox-as-a-service model is related but separate. In that model, +OpenShell would need to authenticate and authorize end users directly. This RFC +does not attempt to define that full user-facing model. + +### Authorization and trust model + +The trust boundary must be explicit. A requested namespace, Secret reference, +service account or driver configuration value is not trusted by itself. It is +trusted only when requested by an authenticated and authorized OpenShell caller. + +For the platform-managed Kubernetes model, the caller is a platform controller. +That controller should authenticate to OpenShell Gateway with a control-plane +identity, such as: + +- a Kubernetes ServiceAccount/OIDC token with a gateway-specific audience; +- an mTLS client identity; or +- another configured gateway identity suitable for platform automation. + +OpenShell then maps that authenticated caller to an authorization subject and +checks the requested operation against configured scopes. At minimum, the +authorization model needs to decide whether the caller can: + +- create a sandbox; +- request a specific Kubernetes namespace or namespace pattern; +- reference a specific credential source or Kubernetes Secret namespace; +- request a specific Kubernetes service account; +- provide specific Kubernetes `driver_config` fields; +- request direct `Sandbox` creation, `SandboxClaim` allocation or warm-pool + selection; +- attach the supplied sandbox policy; +- connect to, stream logs from, execute against or delete the sandbox; and +- observe lifecycle/status events for the sandbox. + +The Kubernetes platform authorization model and the OpenShell authorization +model are different but connected: + +- Kubernetes authorization protects Kubernetes resources, namespaces, service + accounts, quotas, NetworkPolicies and Secrets. 
+- OpenShell authorization protects OpenShell API operations, sandbox ownership, + credential use, driver configuration, sandbox connection rights and sandbox + lifecycle operations. + +In the platform-delegated model, the platform may be the only direct OpenShell +caller. End-user access can remain mediated by the platform. OpenShell should +still preserve tenant, owner, project, request and policy metadata so that +sandbox ownership and audit remain visible to OpenShell, Kubernetes and the +platform. + +### Control-plane model + +The Kubernetes platform performs onboarding and request validation before +calling OpenShell: + +1. Create or reconcile the tenant namespace. +2. Apply namespace labels, `ResourceQuota`, `LimitRange`, RBAC, image pull + secrets, runtime constraints and default-deny `NetworkPolicy`. +3. Compile the sandbox policy from platform base policy, tenant policy pack and + approved endpoint policy. +4. Configure External Secrets Operator or an equivalent controller so Vault + material syncs into approved Kubernetes Secrets in the OpenShell gateway + namespace. +5. Select approved credential-source references for the requested access + profile. +6. Select runtime placement: `RuntimeClass`, node selector, tolerations, + resource requests/limits, device profile and optional warm pool. +7. Call OpenShell Gateway with an authenticated platform-controller identity, + namespace, final policy, credential-source refs, metadata and placement. + +OpenShell Gateway then: + +1. Authenticates the platform-controller identity. +2. Authorizes the requested namespace, credential sources, service account, + Kubernetes `driver_config`, allocation mode, sandbox policy attachment and + sandbox operations. +3. Resolves approved credential sources as provider credential material. +4. Provisions the sandbox in the requested namespace. +5. Uses `SandboxClaim` when requested, optionally targeting a `SandboxWarmPool`. +6. 
Propagates tenant, project, owner, policy hash, namespace, profile and + request metadata to OpenShell and Kubernetes resources. +7. Emits lifecycle events for scheduling, readiness, policy load, credential + load, failure and cleanup. + +```mermaid +flowchart TB + Workload["Tenant app or workload request"] --> Platform["Kubernetes platform control plane"] + + subgraph PlatformOwned [Platform owned decisions and reconciliation] + Platform --> Tenant["Resolve tenant, owner and project"] + Tenant --> Namespace["Create or reconcile requested namespace
labels, quota, RBAC, NetworkPolicy, image pull secrets"] + Tenant --> Policy["Compile final sandbox policy
base policy + tenant policy + approved endpoint policy"] + Tenant --> SecretSync["Reconcile approved credentials
Vault or external source to ESO to Kubernetes Secret"] + Tenant --> Placement["Select runtime placement
RuntimeClass, resources, node selector, tolerations, warm-pool profile"] + Namespace --> Request["Build trusted sandbox request
namespace + policy + credential refs + placement + metadata"] + Policy --> Request + SecretSync --> Request + Placement --> Request + end + + Request --> Gateway["Private OpenShell Gateway
called by platform controller identity"] + + subgraph OpenShellOwned [OpenShell owned sandbox execution] + Gateway --> Authn["Authenticate platform controller"] + Authn --> Authz["Authorize namespace, credentials, service account, driver_config and sandbox operations"] + Authz --> CredentialResolver["Resolve approved credential source
as provider credential material"] + Authz --> PolicyLoad["Attach supplied sandbox policy"] + Authz --> Driver["Kubernetes compute driver"] + CredentialResolver --> Driver + PolicyLoad --> Driver + Driver --> Allocation{"Allocation mode"} + Allocation -->|Direct path| Sandbox["Create agents.x-k8s.io/Sandbox
in requested namespace"] + Allocation -->|Claim path| Claim["Create agents.x-k8s.io/SandboxClaim
in requested namespace"] + Claim --> WarmPool["Optional SandboxWarmPool
and approved SandboxTemplate"] + end + + Sandbox --> AgentController["Agent Sandbox controller"] + Claim --> AgentController + WarmPool --> AgentController + AgentController --> Pod["Sandbox pod
OpenShell supervisor"] + Pod -->|Outbound supervisor connection| Gateway + Gateway --> Events["Lifecycle status and events
accepted, provisioning, ready, failed, deleted"] + Events --> Platform +``` + +### Request shape + +The exact API is intentionally left open while this RFC is in `draft`. The +review feedback points toward a split: + +- Kubernetes-specific implementation fields should align with the + `driver_config` proposal in #1589. +- Credential resolution should align with a broader credential-source or + Credential proto/plugin model, with Kubernetes Secrets as one backend. +- Authorization, sandbox ownership and lifecycle events should remain + OpenShell control-plane concerns, not Kubernetes driver-only settings. + +The example below is illustrative. It shows the shape of information the +platform needs to pass, not a final protobuf or CLI contract. + +Example shape: + +```yaml +tenant: team-a +owner: service:agent-backend +metadata: + project: code-review + request_id: req-abc123 + policy_hash: sha256:abc123 +policy: + compiledPolicyRef: platform-policy-team-a-git-read-sha256-abc123 +credentialsFrom: + - provider: git-readonly + sourceRef: + kind: KubernetesSecret + namespace: openshell-system + name: team-a-runtime-credentials + key: git_token +driver_config: + kubernetes: + namespace: openshell-rce-team-a-job-abc123 + serviceAccountName: sandbox-runner + runtimeClassName: gvisor + nodeSelector: + platform.example.com/node-pool: secure-rce + tolerations: + - key: platform.example.com/secure-rce + operator: Exists + resources: + requests: + cpu: "1" + memory: 2Gi + limits: + cpu: "2" + memory: 4Gi + agentSandbox: + allocation: SandboxClaim + template: python-rce-base + warmPool: rce-python-base +ttlSeconds: 1800 +``` + +### Target namespace + +The Kubernetes driver should support a platform-selected target namespace in +addition to its configured default namespace. This namespace should be treated +as Kubernetes driver configuration, likely through the `driver_config` mechanism +proposed in #1589. 
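A minimal sketch of how a driver-side helper might pick the target namespace, assuming the fallback and authorization behavior this section describes. The names `CallerScope`, `resolve_target_namespace` and `NamespaceNotAuthorized` are hypothetical and are not part of any existing OpenShell API; glob-style namespace patterns are also an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Tuple


class NamespaceNotAuthorized(Exception):
    """Raised when a caller requests a namespace outside its allowed scope."""


@dataclass(frozen=True)
class CallerScope:
    # Hypothetical record: the authenticated caller and the namespace
    # glob patterns that caller is allowed to target.
    subject: str
    allowed_namespace_patterns: Tuple[str, ...] = ()


def resolve_target_namespace(
    requested: Optional[str],
    configured_default: str,
    scope: CallerScope,
) -> str:
    """Return the namespace the driver should provision into."""
    if not requested:
        # Backward-compatible path: no requested namespace means the
        # driver keeps using its configured namespace.
        return configured_default
    # A requested namespace is trusted only after an explicit check
    # against the caller's allowed scope.
    allowed = any(
        fnmatchcase(requested, pattern)
        for pattern in scope.allowed_namespace_patterns
    )
    if not allowed:
        raise NamespaceNotAuthorized(
            f"{scope.subject} may not use namespace {requested!r}"
        )
    return requested
```

In this sketch an omitted namespace never triggers a scope check, so existing single-namespace deployments behave as before, while any explicitly requested namespace must match a pattern granted to the caller.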
+ +When the requested namespace is omitted, the driver should keep existing +behavior and provision into the configured namespace. When the requested +namespace is present, OpenShell must first authorize the caller to use that +namespace. The driver can then use the authorized namespace for the Agent +Sandbox resource it creates. + +OpenShell does not need to create the namespace in this RFC. The platform is +responsible for creating and reconciling the namespace before the sandbox +request. + +### Kubernetes Secret-backed provider credentials + +Kubernetes Secrets are the expected Kubernetes-native source for approved +credential material in platform-managed deployments, especially when populated +by Vault, External Secrets Operator or another secret controller. + +The open design question is where Secret-backed credentials attach in +OpenShell's data model. Kubernetes Secrets should be treated as one credential +source backend, not as a Kubernetes driver-only field. This should align with a +broader credential-source or Credential proto/plugin design. + +This is not intended to pass raw secret values to the child process as plain +environment variables. The desired model is: + +1. The platform creates or reconciles Kubernetes Secrets through ESO or another + approved secret controller. +2. The platform passes approved credential-source references to OpenShell. +3. The gateway authorizes and resolves those references as provider credential + material. +4. Existing OpenShell provider and supervisor credential handling injects or + rewrites credentials at the controlled boundary. + +Provider records remain useful for provider shape, endpoint metadata and policy +bundles. This RFC does not finalize whether Kubernetes Secret references attach +to provider records, provider v2 attachments, sandbox requests or a separate +credential-source abstraction. + +### Agent Sandbox allocation + +OpenShell already integrates with the Kubernetes SIG Apps Agent Sandbox project. 
+The current Kubernetes driver creates the same project's `Sandbox` CRD directly. +`SandboxClaim` and `SandboxWarmPool` are part of that Agent Sandbox CRD family +and are useful when the platform wants allocation semantics rather than direct +pod-style creation. The Kubernetes-specific selection of direct `Sandbox`, +`SandboxClaim`, template and warm-pool settings should align with the +Kubernetes `driver_config` surface rather than becoming generic OpenShell fields +unless later design work promotes some of these concepts into the portable +OpenShell data model. + +`SandboxClaim` is useful because the platform request is an allocation request: + +- select an approved `SandboxTemplate`; +- allocate an execution sandbox in the platform-selected namespace; +- attach metadata, TTL, runtime placement and policy hash; +- allow the Agent Sandbox controller to bind the claim to a matching sandbox. + +`SandboxWarmPool` is useful because some profiles need lower start latency or +controlled placement: + +- pre-warm images and base runtime for common profiles; +- keep capacity ready for interactive or high-throughput customers; +- scope warm capacity to a namespace, template, `RuntimeClass`, node selector + and tolerations; and +- support tainted/secure nodes or device-capable nodes without creating one + gateway per placement profile. + +This RFC does not decide whether OpenShell creates and reconciles +`SandboxWarmPool` resources itself or references warm pools that the Kubernetes +platform pre-creates. It also does not treat warm pools as the only lifecycle +optimization. Checkpoint/restore and scale-to-zero semantics are related +lifecycle capabilities that should be considered with the broader +proxy-to-sandbox and sandbox lifecycle design. + +### Metadata and lifecycle events + +OpenShell should preserve platform-provided metadata on OpenShell sandbox state +and Kubernetes resources. 
At minimum, the metadata should support: + +- tenant or customer identifier; +- project or application identifier; +- owner identity; +- request id; +- namespace; +- policy hash; +- access profile; +- runtime profile; and +- TTL or cleanup deadline. + +The gateway should also expose lifecycle events or status transitions that make +it possible for a platform scheduler and cleanup controller to understand: + +- request accepted; +- allocation started; +- Kubernetes resource created; +- supervisor connected; +- policy loaded; +- credentials resolved; +- sandbox ready; +- sandbox failed; and +- sandbox deleted. + +## Implementation plan + +Because this RFC is in `draft`, the implementation plan is intentionally +sequenced around design dependencies rather than final API changes. + +1. Define the OpenShell control-plane authorization model needed for this + pattern: + - platform-controller caller identity; + - allowed namespace scopes; + - allowed credential-source scopes; + - allowed service accounts; + - allowed Kubernetes `driver_config` fields; + - sandbox ownership and access rules; and + - sandbox operation permissions for connect, logs, exec and delete. +2. Align Kubernetes-specific request fields with #1589 so namespace, + `RuntimeClass`, service account, node placement, resources, direct + `Sandbox` creation, `SandboxClaim`, template and warm-pool settings use the + driver-owned configuration surface. +3. Define or reuse a credential-source / Credential proto plugin model so + Kubernetes Secrets can be one backend for provider credential material. +4. Decide whether this RFC should include lifecycle/status events or whether + event streaming should be tracked separately. +5. Decide whether OpenShell owns `SandboxWarmPool` creation/reconciliation or + only references platform-created warm pools. +6. Update the Kubernetes driver once the authorization, driver configuration + and credential-source contracts are agreed. +7. 
Add tests for authorization enforcement, backward-compatible default + namespace behavior, requested namespace behavior, credential-source + authorization, direct `Sandbox` creation and `SandboxClaim` creation. +8. Update documentation for platform-managed Kubernetes deployments. + +Existing users that rely on a single configured namespace should not need to +change their configuration. The configured namespace should remain the default +when no authorized requested namespace is supplied. + +## Risks + +- The request surface can become too broad if platform-only Kubernetes controls + are added before the OpenShell authorization model is ready. This can be + mitigated by keeping the RFC in draft until authz requirements are explicit + and by routing Kubernetes-specific fields through `driver_config`. +- Namespace selection creates a larger security boundary. OpenShell must not + trust namespace values only because they are present in a request. It must + authenticate the caller and authorize the requested namespace scope. +- Secret reference handling can accidentally expand credential access if it is + too broad. Secret-backed credentials need a credential-source authorization + model and should preserve the existing supervisor-controlled credential + injection model. +- Supporting both direct `Sandbox` and `SandboxClaim` creation adds driver + complexity. The default path should remain direct `Sandbox` creation unless + allocation mode requests a claim. + +## Alternatives + +### Make OpenShell the full multi-tenant platform + +Rejected for this use case. Kubernetes platforms already own tenant onboarding, +identity, quota, namespace creation, external secret synchronization, audit and +cleanup. OpenShell should provide the sandbox execution API that the platform +drives. + +### Deploy one OpenShell Gateway per tenant + +Rejected as the default model. 
It can work for capacity or compliance sharding, +but it is operationally heavy and makes gateway lifecycle management harder. +The preferred model is one private gateway per environment or cluster, with +optional risk/capacity shards. + +### Use one shared sandbox namespace + +Rejected for high-risk and multi-customer use. Namespace boundaries are +important for quotas, NetworkPolicies, cleanup, service accounts, labels and +blast-radius control. + +### Store tenant credentials as OpenShell provider records + +Rejected as the primary production model. Kubernetes platforms commonly use +Vault, External Secrets Operator and Kubernetes Secrets as the credential source +of truth. OpenShell provider profiles remain useful for endpoint and credential +shape metadata, but credential material should be able to come from approved +Kubernetes Secret references. + +### Continue direct `Sandbox` creation only + +Rejected for pooled and template-backed allocation. `SandboxClaim` and +`SandboxWarmPool` are a better fit when the platform wants warm capacity, node +placement and customer-approved runtime profiles. + +## Prior art + +- Kubernetes namespaces, `ResourceQuota`, `LimitRange`, RBAC and + `NetworkPolicy` are the standard primitives for tenant and workload + boundaries in Kubernetes platforms. +- External Secrets Operator is commonly used to sync Vault or external secret + material into Kubernetes Secrets that workloads and controllers can consume. +- Kubernetes SIG Apps Agent Sandbox provides the `Sandbox`, `SandboxClaim` and + `SandboxWarmPool` resource model used by the OpenShell Kubernetes driver. +- OpenShell provider v2 already separates provider profile metadata from + provider attachment. This draft explores how Kubernetes Secrets could become + one backend for provider credential material in platform-managed deployments. +- OpenShell PR #1589 proposes a `driver_config` passthrough for driver-owned + sandbox creation settings. 
Kubernetes-specific namespace, placement and Agent + Sandbox allocation settings in this RFC should align with that model. + +## Open questions + +### OpenShell authorization model + +What OpenShell authorization model is required before platform-managed +Kubernetes sandboxing can be safely enabled? + +This matters because namespace selection, Secret references, service account +selection, driver configuration and sandbox operations are privileged actions. +The RFC needs to define how OpenShell authenticates a platform controller and +how it authorizes that identity to create, connect to, observe, execute against +or delete specific sandboxes. + +### OpenShell domain object above `Sandbox` + +Should OpenShell introduce a higher-level domain object above `Sandbox` to +represent tenant/session/workload intent before mapping to a concrete sandbox +allocation? + +This matters because platform-managed deployments need tenant, owner, project, +policy, namespace and lifecycle metadata. Some of that metadata may belong on a +future OpenShell domain object instead of directly on `SandboxSpec`. + +### Requested namespace and allocation API surface + +Should the requested Kubernetes namespace and Agent Sandbox allocation settings +be portable `SandboxSpec` fields, Kubernetes driver-specific configuration, or +a split between the two? + +This matters because tenant and owner metadata can apply across compute +drivers, while Kubernetes namespaces, `SandboxClaim` and `SandboxWarmPool` are +specific to the Kubernetes driver and the Agent Sandbox resource model. The +current direction is to align Kubernetes-specific fields with #1589 +`driver_config`, but the exact split remains open. + +### Kubernetes Secret reference attachment point + +Where should Kubernetes Secret references be attached: provider records, +provider v2 attachments, sandbox requests, or a new credential-source +abstraction? + +This matters because each attachment point has a different lifecycle. 
Provider +records are stable configuration, provider v2 attachments are provider/profile +bindings, sandbox requests are per-allocation inputs, and a separate +credential-source abstraction would make secret sourcing explicit. Kubernetes +Secrets are the expected Kubernetes-native backend in this deployment model; the +open question is how Secret-backed credentials should be represented in the +OpenShell credential data model. + +### Gateway Secret namespace rules + +What rules should the gateway enforce when resolving Kubernetes Secret +references for provider credentials? + +This matters because the gateway needs a clear boundary for which Secret +namespaces, Secret names and keys can be read. The RFC needs to define whether +Secret references are limited to the gateway namespace, an allowlist of +namespaces, or another trust model. + +### `SandboxClaim` default behavior + +Should `SandboxClaim` become the default Kubernetes allocation path once +support exists, or should direct `Sandbox` creation remain the default with +`SandboxClaim` enabled only when requested? + +This matters because direct `Sandbox` creation is the current Kubernetes driver +behavior. Changing the default allocation path could affect existing users, +while keeping it opt-in may require platform callers to choose the allocation +mode explicitly. + +### `SandboxWarmPool` ownership + +Should OpenShell create and reconcile `SandboxWarmPool` resources, or should it +reference warm pools that the Kubernetes platform creates and manages? + +This matters because warm-pool ownership affects template lifecycle, capacity +management, placement policy and user experience. It also needs to be considered +alongside related lifecycle features such as checkpoint/restore and +scale-to-zero. + +### Lifecycle event API boundary + +Which lifecycle events should be part of the stable OpenShell API, and which +events should remain driver diagnostics? 
+ +This matters because platform schedulers, audit systems and cleanup controllers +need dependable lifecycle signals. At the same time, exposing too many +low-level driver events as public API can create compatibility commitments that +are hard to change later. + +### Tenant and policy metadata representation + +How should OpenShell represent policy hash, tenant, owner, namespace, request id +and runtime profile metadata across sandbox status, logs, audit records and +Kubernetes resources? + +This matters because the platform needs to trace which tenant created a +sandbox, which policy was applied, which namespace was used and which request +created the workload. The RFC needs to decide which metadata belongs in +OpenShell state, Kubernetes labels, Kubernetes annotations, logs and audit +records. From 554dc211d55e04bbd734c2d1eab0fb86bf9faa68 Mon Sep 17 00:00:00 2001 From: Rohan Kumar Date: Wed, 3 Jun 2026 14:44:18 +0530 Subject: [PATCH 2/2] Clarify platform-managed sandbox authorization requirements Signed-off-by: Rohan Kumar --- .../README.md | 144 ++++++++++++++++-- 1 file changed, 131 insertions(+), 13 deletions(-) diff --git a/rfc/0005-platform-managed-kubernetes-sandboxes/README.md b/rfc/0005-platform-managed-kubernetes-sandboxes/README.md index b3fabc27c..f01b6fe97 100644 --- a/rfc/0005-platform-managed-kubernetes-sandboxes/README.md +++ b/rfc/0005-platform-managed-kubernetes-sandboxes/README.md @@ -11,18 +11,24 @@ links: ## Summary +This RFC describes a platform-managed Kubernetes usage pattern for OpenShell. An existing Kubernetes platform can own tenant onboarding, namespace creation, quotas, network policy, secret synchronization, policy compilation, scheduling -and audit while delegating sandbox execution to OpenShell. 
A trusted platform -controller calls OpenShell Gateway, and OpenShell authorizes that controller -before provisioning a sandbox with platform-selected Kubernetes properties, -supplied sandbox policy, approved credentials, ownership metadata and lifecycle -reporting. +and audit while delegating sandbox execution to OpenShell. + +The central requirement is not simply that OpenShell accepts more Kubernetes +configuration. The central requirement is that OpenShell first has a +control-plane authentication and authorization model that can safely support a +trusted platform controller. Only after that controller is authenticated and +authorized should OpenShell provision a sandbox with platform-selected +Kubernetes properties, supplied sandbox policy, approved credential sources, +ownership metadata and lifecycle reporting. The main requirements are: - a clear OpenShell control-plane authentication and authorization model for - trusted platform controllers; + trusted platform controllers, including sandbox ownership and operation + permissions; - support for platform-selected Kubernetes sandbox namespaces, separate from the OpenShell control-plane namespace; - Kubernetes-specific sandbox configuration hooks aligned with the @@ -51,9 +57,11 @@ platform can own the enterprise control-plane responsibilities: OpenShell is a good fit as the sandbox execution plane behind that platform. The platform should decide tenant authorization, target namespace, final policy, -approved secrets and runtime placement profile. OpenShell should then provision -and operate the sandbox in the requested Kubernetes namespace with the supplied -final policy and approved credential-source references. +approved secrets and runtime placement profile. 
OpenShell should then
+authenticate and authorize the platform controller, verify that the requested
+namespace, credential sources, service account, driver configuration and
+sandbox operations are allowed for that controller, and provision the sandbox
+with the supplied final policy and approved credential-source references.

 This is different from a direct sandbox-as-a-service model where end users call
 OpenShell and do not know which control plane runs their sandboxes. In the
@@ -151,8 +159,12 @@ identity, such as:
 - another configured gateway identity suitable for platform automation.

 OpenShell then maps that authenticated caller to an authorization subject and
-checks the requested operation against configured scopes. At minimum, the
-authorization model needs to decide whether the caller can:
+checks the requested operation against configured scopes. This authorization
+step is what makes platform-selected Kubernetes fields safe. The gateway must
+not trust a namespace, Secret reference, service account, warm-pool reference
+or driver configuration value only because it is present in a request.
+
+At minimum, the authorization model needs to decide whether the caller can:

 - create a sandbox;
 - request a specific Kubernetes namespace or namespace pattern;
@@ -174,12 +186,76 @@ model are different but connected:
   credential use, driver configuration, sandbox connection rights and sandbox
   lifecycle operations.

+If OpenShell Gateway runs on Kubernetes, it may be possible to reuse some
+Kubernetes authorization primitives, such as ServiceAccount identity, token
+audiences, RBAC, SubjectAccessReview or namespace-scoped permissions. That
+should be explored, but the OpenShell authorization decision still needs to be
+explicit because OpenShell also runs outside Kubernetes and because OpenShell
+API permissions are not the same as Kubernetes API permissions.
For example,
+Kubernetes RBAC may allow the gateway's own ServiceAccount to create a
+`Sandbox` resource, while OpenShell still needs to decide whether the calling
+platform-controller identity is allowed to request that namespace, attach that
+credential source, connect to that sandbox, stream its logs or delete it.
+
 In the platform-delegated model, the platform may be the only direct OpenShell
 caller. End-user access can remain mediated by the platform. OpenShell should
 still preserve tenant, owner, project, request and policy metadata so that
 sandbox ownership and audit remain visible to OpenShell, Kubernetes and the
 platform.

+This means the identity models are mapped rather than identical:
+
+- the platform authenticates and authorizes tenant users or tenant services;
+- the platform controller authenticates to OpenShell as a control-plane caller;
+- OpenShell authorizes that control-plane caller against OpenShell resource
+  scopes and sandbox operations; and
+- Kubernetes authorizes the gateway or driver identity to create the resulting
+  Kubernetes resources.
+
+The model should support both future usage patterns:
+
+- direct sandbox-as-a-service, where OpenShell authorizes end users directly;
+  and
+- platform-delegated sandbox execution, where OpenShell authorizes trusted
+  platform controllers and the platform mediates tenant users.
+
+This RFC focuses on the second usage pattern, but it should not require a
+separate or incompatible authorization model.
+
+### Authorization requirements
+
+Before platform-selected Kubernetes configuration is accepted, OpenShell needs
+an authorization surface that can express the allowed scope of a
+platform-controller identity.
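One way to picture such an authorization surface is a deny-by-default allow-list per platform-controller identity, evaluated before any Kubernetes driver call. The sketch below is only an illustration under that assumption: `ControllerScope`, `SandboxRequest`, `authorize_create` and every field name are hypothetical and are not OpenShell APIs or proposed wire formats.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ControllerScope:
    """Hypothetical allow-list for one platform-controller identity."""
    namespace_patterns: tuple[str, ...] = ()
    secret_refs: frozenset = frozenset()       # approved (namespace, name) pairs
    service_accounts: frozenset = frozenset()  # approved service account names
    operations: frozenset = frozenset()        # e.g. {"create", "connect"}


@dataclass(frozen=True)
class SandboxRequest:
    """Hypothetical subset of the platform-selected request fields."""
    namespace: str
    secret_refs: tuple = ()
    service_account: str = "default"


def authorize_create(scope, request, control_plane_namespace):
    """Return denial reasons; an empty list means the request is allowed."""
    denials = []
    if "create" not in scope.operations:
        denials.append("operation 'create' is not granted")
    if request.namespace == control_plane_namespace:
        # Checked first so a broad pattern can never admit this namespace.
        denials.append("control-plane namespace is excluded from placement")
    elif not any(fnmatchcase(request.namespace, pattern)
                 for pattern in scope.namespace_patterns):
        denials.append(f"namespace {request.namespace!r} is out of scope")
    for namespace, name in request.secret_refs:
        if (namespace, name) not in scope.secret_refs:
            denials.append(f"secret {namespace}/{name} is not approved")
    if request.service_account not in scope.service_accounts:
        denials.append(
            f"service account {request.service_account!r} is not approved")
    return denials


# Usage: an empty scope denies everything; each grant is explicit.
scope = ControllerScope(
    namespace_patterns=("tenant-*",),
    secret_refs=frozenset({("tenant-a", "model-key")}),
    service_accounts=frozenset({"sandbox-runner"}),
    operations=frozenset({"create"}),
)
request = SandboxRequest("tenant-a", (("tenant-a", "model-key"),), "sandbox-runner")
print(authorize_create(scope, request, "openshell-system"))
```

Returning every denial reason rather than stopping at the first one keeps the decision easy to log and audit. A real implementation would also need to cover driver configuration, policy attachment and post-creation operations, which this sketch omits.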
+
+The required authorization checks include:
+
+- **create scope:** whether the caller can create a sandbox at all, and for
+  which tenant, project, profile or environment metadata;
+- **namespace scope:** which namespaces or namespace patterns the caller can
+  target, and whether the OpenShell control-plane namespace is excluded from
+  sandbox placement;
+- **credential scope:** which credential sources, Kubernetes Secret namespaces,
+  Secret names and Secret keys the caller can reference;
+- **service account scope:** which Kubernetes service accounts can be requested
+  for sandbox pods;
+- **driver configuration scope:** which Kubernetes `driver_config` fields can
+  be supplied by this caller, including `RuntimeClass`, resources, node
+  selectors, tolerations, direct `Sandbox` creation, `SandboxClaim` allocation,
+  template references and warm-pool references;
+- **policy attachment scope:** whether the caller can attach a supplied final
+  policy and how OpenShell records the policy identity or hash;
+- **sandbox ownership scope:** which tenant, owner, project and request metadata
+  must be attached to the sandbox; and
+- **operation scope:** who can connect to the sandbox, stream logs, execute
+  commands, read files, observe events, release the sandbox or delete it.
+
+The concrete authorization implementation is left open. It may be an
+OpenShell-native authorization policy, a Kubernetes-integrated authorization
+adapter, or a combination of both. The important requirement is that these
+checks happen in OpenShell before the Kubernetes driver acts on the requested
+namespace, credential references, service account or driver configuration.
+
 ### Control-plane model

 The Kubernetes platform performs onboarding and request validation before
@@ -204,8 +280,8 @@ OpenShell Gateway then:

 1. Authenticates the platform-controller identity.
 2.
Authorizes the requested namespace, credential sources, service account,
-   Kubernetes `driver_config`, allocation mode, sandbox policy attachment and
-   sandbox operations.
+   Kubernetes `driver_config`, allocation mode, sandbox policy attachment,
+   ownership metadata and sandbox operations.
 3. Resolves approved credential sources as provider credential material.
 4. Provisions the sandbox in the requested namespace.
 5. Uses `SandboxClaim` when requested, optionally targeting a `SandboxWarmPool`.
@@ -270,6 +346,14 @@ review feedback points toward a split:
 The example below is illustrative. It shows the shape of information the
 platform needs to pass, not a final protobuf or CLI contract.

+The authorization decision for this request would be separate from the request
+body. For example, before acting on the request, OpenShell would need to verify
+that the authenticated platform-controller identity is allowed to create a
+sandbox for `tenant: team-a`, use the requested namespace, reference the
+credential source, request the Kubernetes service account and placement profile,
+attach the supplied policy, and later connect to or delete the resulting
+sandbox.
+
 Example shape: ```yaml
@@ -329,6 +413,12 @@ OpenShell does not need to create the namespace in this RFC. The platform is
 responsible for creating and reconciling the namespace before the sandbox
 request.

+This namespace override is a trusted control-plane field, not an end-user
+field. In the platform-delegated model, tenant users do not call OpenShell and
+choose namespaces directly. The platform resolves the tenant and namespace,
+then OpenShell authorizes the platform-controller identity to use that
+namespace before provisioning.
+
 ### Kubernetes Secret-backed provider credentials

 Kubernetes Secrets are the expected Kubernetes-native source for approved
@@ -428,10 +518,15 @@ sequenced around design dependencies rather than final API changes.
 1.
Define the OpenShell control-plane authorization model needed for this
    pattern:
    - platform-controller caller identity;
+   - how Kubernetes ServiceAccount/OIDC, mTLS or other identities map to
+     OpenShell authorization subjects;
+   - whether Kubernetes RBAC or SubjectAccessReview can be reused when the
+     gateway runs on Kubernetes;
    - allowed namespace scopes;
    - allowed credential-source scopes;
    - allowed service accounts;
    - allowed Kubernetes `driver_config` fields;
+   - allowed policy attachment behavior;
    - sandbox ownership and access rules; and
    - sandbox operation permissions for connect, logs, exec and delete.
 2. Align Kubernetes-specific request fields with #1589 so namespace,
@@ -461,6 +556,11 @@ when no authorized requested namespace is supplied.
   are added before the OpenShell authorization model is ready. This can be
   mitigated by keeping the RFC in draft until authz requirements are explicit
   and by routing Kubernetes-specific fields through `driver_config`.
+- Reusing Kubernetes RBAC alone may not be sufficient because Kubernetes API
+  authorization and OpenShell API authorization protect different operations.
+  This can be mitigated by treating Kubernetes RBAC as a possible input or
+  adapter for OpenShell authorization rather than as a complete replacement for
+  OpenShell sandbox ownership and operation checks.
 - Namespace selection creates a larger security boundary. OpenShell must not
   trust namespace values only because they are present in a request. It must
   authenticate the caller and authorize the requested namespace scope.
@@ -537,6 +637,24 @@ The RFC needs to define how OpenShell authenticates a platform controller and
 how it authorizes that identity to create, connect to, observe, execute against
 or delete specific sandboxes.

+The answer should cover both direct sandbox-as-a-service and platform-delegated
+usage.
In the platform-delegated case, the tenant user is authorized by the
+platform, while OpenShell authorizes the platform-controller identity and the
+operations that identity can perform.
+
+### Kubernetes RBAC and OpenShell authorization
+
+When OpenShell Gateway runs on Kubernetes, should OpenShell reuse Kubernetes
+RBAC, SubjectAccessReview, ServiceAccount identities or token audiences for
+some authorization decisions, or should it use a separate OpenShell-native
+authorization policy?
+
+This matters because Kubernetes already has mature resource authorization, but
+OpenShell also needs to authorize OpenShell-specific operations such as sandbox
+ownership, connect, logs, exec, credential use, policy attachment and lifecycle
+events. The RFC should define which decisions can be delegated to Kubernetes
+and which must remain OpenShell control-plane decisions.
+
 ### OpenShell domain object above `Sandbox`

 Should OpenShell introduce a higher-level domain object above `Sandbox` to