The #1 Kubernetes skill for Claude Code and Codex, measured by GitHub stars.
LLMs hallucinate a lot when it comes to Kubernetes. They omit security contexts, generate deprecated APIs, use wildcard RBAC, forget resource limits, and produce probes that cause cascading failures. This skill fixes it. It includes best practices for Kubernetes -- good, bad, and neutral examples so the AI avoids common mistakes. Using KubeShark, the AI keeps proven practices in mind, eliminates hallucinations, and defaults to secure, reliable, production-ready manifests.
KubeShark is built as the production-grade Kubernetes skill for Claude Code and Codex: broader than resource-template skills, safer than generic Kubernetes prompts, and tuned for hallucination prevention instead of raw tutorial volume.
Most Kubernetes skills dump huge walls of text onto the agent and burn expensive tokens -- with no upside. LLMs don't need the entire Kubernetes docs again. KubeShark was aggressively de-duplicated and optimized for maximum quality per token.
KubeShark is primarily based on the official Kubernetes documentation, the NSA/CISA Kubernetes Hardening Guide, OWASP Kubernetes Top 10, Pod Security Standards, and the CIS Kubernetes Benchmark. When guidance conflicts, it prioritizes official Kubernetes documentation.
Quick Start · Why KubeShark · Token Strategy · What's Included · How It Works · Philosophy
macOS / Linux:
git clone https://github.com/LukasNiessen/kubernetes-skill.git ~/.claude/skills/kubernetes-skillWindows (Powershell):
git clone https://github.com/LukasNiessen/kubernetes-skill.git "$env:USERPROFILE\.claude\skills\kubernetes-skill"Windows (Command Prompt):
git clone https://github.com/LukasNiessen/kubernetes-skill.git "%USERPROFILE%\.claude\skills\kubernetes-skill"That's it. Claude Code auto-discovers skills in ~/.claude/skills/ -- no restart needed.
Claude Code has a built-in plugin system with marketplace support. Instead of cloning manually, you can add KubeShark's marketplace and install directly from the CLI:
/plugin marketplace add LukasNiessen/kubernetes-skill
/plugin install kubernetes-skill
Or use the interactive plugin manager -- run /plugin, switch to the Discover tab, and install from there. The marketplace reads the .claude-plugin/marketplace.json in this repo to register KubeShark as an installable plugin.
Codex has no global skill system -- setup is per-project. Clone KubeShark into your repo and reference it from your AGENTS.md:
# Clone into your project root
git clone https://github.com/LukasNiessen/kubernetes-skill.git .kubernetes-skillThen add to your AGENTS.md (or create one in the repo root):
## Kubernetes
When working with Kubernetes manifests, Helm charts, or Kustomize overlays, follow the workflow in `.kubernetes-skill/SKILL.md`.
Load references from `.kubernetes-skill/references/` as needed.Done. Now ask Claude Code / Codex any Kubernetes question. KubeShark responses follow the 7-step failure-mode workflow and include an output contract with assumptions, tradeoffs, and rollback notes.
Invoke explicitly:
/kubernetes-skill Create a production-ready Deployment with an Ingress and autoscaling/kubernetes-skill Review my Deployment for security issues and add proper RBAC, NetworkPolicies, and resource limitsOr just ask naturally -- KubeShark activates automatically for any Kubernetes task:
Review my deployment.yaml for security issues
Create a Helm chart for a PostgreSQL StatefulSet with backup CronJobs
| Dimension | KubeShark | No Skill |
|---|---|---|
| SKILL.md activation cost | Low, procedural workflow only | 0 |
| Reference granularity | 26 focused files | -- |
| Token burn per query | Low (load only matched refs) | 0 |
| Architecture | Failure-mode workflow | -- |
| Diagnoses before generating | Yes (Step 2) | No |
| Output contract | Yes -- assumptions, tradeoffs, rollback | No |
| Conditional references | EKS, GKE, AKS, OpenShift, GitOps, observability stacks | No |
| Security-first defaults | PSS restricted profile | No |
| Good/bad examples | Yes (2 dedicated files) | No |
| Do/Don't checklist | Yes (dedicated file) | No |
| Compliance coverage | NSA/CISA, OWASP K8s Top 10, CIS, Pod Security Standards | No |
| Hallucination prevention | Core design goal | No |
| Cross-resource validation | Label/selector/port consistency checks | No |
| Helm/Kustomize guidance | Dedicated reference files | No |
| Policy engine integration | Kyverno and OPA/Gatekeeper patterns | No |
| License | MIT | -- |
The key insight is architectural. A static reference manual gives Claude information but never tells it how to think about a problem. There's no diagnosis step, no risk assessment, and no structured output -- Claude reads the reference and generates whatever it thinks fits.
KubeShark takes the opposite approach. The core SKILL.md is a compact operational workflow. It forces Claude through a diagnostic sequence: capture context -> identify failure modes -> load only the relevant references -> propose fixes with explicit risk controls -> validate -> deliver a structured output contract.
This matters for Kubernetes specifically because:
-
Silent failures are common. A Service with the wrong selector deploys successfully but routes to nothing. A NetworkPolicy with a mistyped label silently does nothing. Unlike Terraform, Kubernetes accepts most manifests without error -- failures surface at runtime.
-
Multi-dimensional risk. Kubernetes operates across security, networking, scheduling, storage, and application lifecycle simultaneously. An LLM must reason about all these dimensions for every resource.
-
Training data pollution. Kubernetes has had aggressive API deprecation. The LLM training corpus contains vast amounts of pre-1.22 YAML using removed APIs. Without diagnosis, the LLM confidently generates deprecated configurations.
Here's how KubeShark compares to other public Kubernetes-focused agent skills found during the May 2026 review. This compares Kubernetes skill behavior, not full runtime observability products or broad skill collections.
| Dimension | KubeShark | Kubernetes Resource Management | Kube Audit Kit | Kubernetes Operations |
|---|---|---|---|---|
| Primary goal | Generate, review, refactor, and harden K8s safely | Explain and template core objects | Read-only cluster security audits | Kubectl operations and debugging |
| Failure-mode workflow | Yes: six explicit Kubernetes failure modes | No | Audit workflow only | Partial operational workflow |
| Manifest generation/review | Yes | Yes | No, audit-focused | Yes |
| Helm/Kustomize guidance | Dedicated references | No dedicated coverage | No | Not primary |
| Platform-specific guidance | EKS, GKE, AKS, OpenShift via CRR | No | No | Cloud-aware, less granular |
| GitOps and observability depth | Argo CD, Flux, Prometheus, OpenTelemetry, Loki | No | No | No |
| Security posture | PSS restricted defaults, RBAC, policy engines | Basic resource guidance | Strong audit focus | General stability/security |
| Cross-resource validation | Label, selector, port, policy, and rollout checks | Basic examples | Post-export analysis | Operational troubleshooting |
| Output contract | Assumptions, tradeoffs, validation, rollback | No | Audit report | No structured contract |
| Token strategy | Lean workflow plus conditional references | Single static resource guide | Scripted audit skill | Operational guidance |
KubeShark wins on production manifest work because it diagnoses before generating, keeps platform depth behind CRR, covers Helm/Kustomize and policy engines, and forces validation plus rollback into the response contract.
- Keep
SKILL.mdprocedural and compact - Keep references focused on failure-prone decisions
- Exclude broad tutorial material with low safety impact
- Add depth only when measured quality would otherwise drop
- Use Conditional Reference Retrieval (CRR) for platform and controller specifics
Conditional Reference Retrieval is context-gated reference loading. It is a skill-specific form of progressive disclosure: the workflow first captures signals from the request or repository, then loads only the reference files whose conditions match.
This keeps common Kubernetes work lean. A plain Deployment review does not pay the token cost for EKS, GKE, AKS, OpenShift, GitOps, or observability-stack guidance. If the task mentions one of those platforms or controllers, KubeShark pulls in the matching reference and keeps the rest out of context.
CRR files live under references/conditional/. The directory name is intentionally plain-language; CRR is the mechanism, conditional is the repository convention.
| Signal detected | CRR reference | Purpose |
|---|---|---|
| EKS, AWS, IRSA, EKS Pod Identity, Karpenter | references/conditional/eks-patterns.md |
AWS identity, load balancing, storage, CNI, node provisioning |
| GKE, Autopilot, Workload Identity, Dataplane V2 | references/conditional/gke-patterns.md |
GKE identity, Autopilot constraints, networking, storage |
| AKS, Entra Workload ID, Azure CNI, AGIC | references/conditional/aks-patterns.md |
Azure identity, CNI, ingress, and CSI storage |
| OpenShift, OKD, ROSA, ARO, Routes, SCCs | references/conditional/openshift-patterns.md |
SCCs, Routes, arbitrary UID images, OpenShift validation |
| Argo CD, Flux, ApplicationSet, GitOps | references/conditional/gitops-controllers.md |
Reconciliation, pruning, ordering, drift, controller safety |
| Prometheus Operator, ServiceMonitor, OpenTelemetry, Loki, Grafana | references/conditional/observability-stacks.md |
Monitoring CRDs, telemetry pipelines, log and alert hygiene |
- A focused
SKILL.mdexecution flow - Failure-mode-first guidance to prevent common Kubernetes hallucinations
- Core failure-mode references: insecure workload defaults, resource starvation, network exposure, privilege sprawl, fragile rollouts, API drift
- Workload pattern guides for Deployments, StatefulSets, Jobs, DaemonSets
- Cross-cutting concern references: security hardening, observability, multi-tenancy, storage
- Helm and Kustomize best practices
- Conditional Reference Retrieval for EKS, GKE, AKS, OpenShift, GitOps controllers, and observability stacks
- Validation and policy engine integration (Kyverno, OPA/Gatekeeper, kubeconform)
- Good/bad example banks with LLM mistake checklists
- Do/Don't pattern checklist
Here is an overview of the repository layout.
| File | Description |
|---|---|
SKILL.md |
Operational workflow for KubeShark |
PHILOSOPHY.md |
Design strategy, architecture decisions, token experiment |
references/insecure-workload-defaults.md |
Security contexts, PSS profiles, capabilities |
references/resource-starvation.md |
Requests/limits, QoS, PDB, scheduling |
references/network-exposure.md |
NetworkPolicy, Services, Ingress, DNS |
references/privilege-sprawl.md |
RBAC, ServiceAccounts, secret management |
references/fragile-rollouts.md |
Probes, update strategies, image tags, graceful shutdown |
references/api-drift.md |
apiVersion correctness, deprecations, schema validation |
references/deployment-patterns.md |
Stateless workload patterns |
references/stateful-patterns.md |
StatefulSet and persistent workload patterns |
references/job-patterns.md |
Job and CronJob patterns |
references/daemonset-operator-patterns.md |
DaemonSet and operator patterns |
references/security-hardening.md |
NSA/CISA, OWASP, CIS controls and hardening |
references/observability.md |
Prometheus, logging, tracing, alerting patterns |
references/multi-tenancy.md |
Namespace isolation, quotas, tenant boundaries |
references/storage-and-state.md |
PV/PVC, StorageClass, CSI, backup/restore |
references/helm-patterns.md |
Helm chart structure, templates, testing |
references/kustomize-patterns.md |
Kustomize overlays, patches, generators |
references/validation-and-policy.md |
kubeconform, Kyverno, OPA/Gatekeeper, CI integration |
references/conditional/eks-patterns.md |
EKS identity, load balancing, storage, CNI, Karpenter (CRR) |
references/conditional/gke-patterns.md |
GKE identity, Autopilot, Dataplane V2, storage (CRR) |
references/conditional/aks-patterns.md |
AKS workload identity, CNI, ingress, storage (CRR) |
references/conditional/openshift-patterns.md |
OpenShift SCCs, Routes, arbitrary UID constraints (CRR) |
references/conditional/gitops-controllers.md |
Argo CD, Flux, reconciliation, pruning, sync ordering (CRR) |
references/conditional/observability-stacks.md |
Prometheus Operator, OpenTelemetry, Loki, Grafana (CRR) |
references/examples-good.md |
Production-ready annotated examples |
references/examples-bad.md |
Anti-pattern examples with explanations |
references/do-dont-patterns.md |
Do/Don't quick-reference checklist |
.github/workflows/validate.yml |
CI validation for skill structure and links |
.github/PULL_REQUEST_TEMPLATE.md |
PR quality and failure-mode checklist |
.claude-plugin/marketplace.json |
Plugin metadata |
The skill runs as a failure-mode workflow whenever Claude Code handles Kubernetes tasks:
- Capture execution context - Cluster version, distribution, namespace, environment, workload type, deployment method, policies
- Diagnose likely failure mode(s) - Insecure workload defaults, resource starvation, network exposure, privilege sprawl, fragile rollouts, API drift
- Load only relevant references - Pull targeted guidance for the failure mode(s) in scope, then apply CRR for detected platforms/controllers
- Propose fix path with controls - Include risk notes, runtime behavior, tests, and rollback expectations
- Generate implementation artifacts - YAML manifests, Helm charts, Kustomize overlays, RBAC, policies
- Validate before finalize - Dry-run, schema validation, cross-resource consistency, policy scan
- Deliver complete output - Assumptions, remediation choices, tradeoffs, validation plan, recovery notes
- Kubernetes manifest design, review, and refactoring
- Helm chart and Kustomize overlay creation
- RBAC, NetworkPolicy, and security hardening
- Workload reliability: probes, resources, rollout strategies
- Policy engine integration (Kyverno, OPA/Gatekeeper)
- Platform-specific guidance for EKS, GKE, AKS, and OpenShift when detected
- GitOps and observability-stack guidance when those controllers are in scope
- CI validation pipelines for Kubernetes
Q: Does this work with all Kubernetes distributions?
Yes. KubeShark supports vanilla Kubernetes, EKS, GKE, AKS, OpenShift, k3s, and other distributions. The workflow captures cluster version and distribution first and adapts guidance accordingly. CRR loads deeper platform references only when those distributions are detected.
Q: Will this slow down my AI interactions?
No. The skill is designed for low token overhead. Only relevant references are loaded for a given failure mode.
Q: Can I use this outside Claude Code?
Yes. The references are plain Markdown and can be used from any workflow or AI assistant, including Codex. The trigger behavior in SKILL.md is optimized for skill-enabled environments.
Q: What Kubernetes versions are supported?
KubeShark targets Kubernetes 1.25+ as the baseline (all pre-1.25 deprecated APIs are flagged). Guidance is version-aware and the workflow captures cluster version in Step 1.
Q: Does this cover Helm and Kustomize?
Yes. Dedicated reference files cover Helm chart patterns and Kustomize overlay patterns, including common LLM mistakes in both tools.
We highly appreciate contributions. See CONTRIBUTING.md and use the PR template for failure-mode and validation details.
-
LukasNiessen - Creator and main maintainer
-
janMagnusHeimann - Main maintainer
-
TristanKruse - Main maintainer
Found a bug? Want to discuss features?
- Submit an issue on GitHub
- Join GitHub Discussions
If KubeShark helps your project, please consider:
- Starring the repository
- Suggesting new features
- Contributing code or documentation
This project is under the MIT license.
Version: v1.0.0
GitHub Pages: https://lukasniessen.github.io/kubernetes-skill/
