WIP: replace spec.skills injection with passive skill discovery #388
cooktheryan wants to merge 1 commit into
Remove spec.skills from AgentRuntime CRD: the operator no longer mutates target Deployments to inject OCI skill ImageVolumes. This eliminates GitOps drift with ArgoCD/Flux.

Skills are now declared directly in the Deployment manifest (OCI ImageVolumes or ConfigMap volumes) and discovered by the operator through the kagenti.io/skills annotation, gated behind the skillDiscovery feature flag.

The agent discovers mounted skills via SKILL_FOLDERS and reports them in its A2A card. The operator reads the card and the annotation, surfacing both in status.

Removed:
- spec.skills, SkillImageRef, SkillPullPolicy types
- reconcileSkillVolumes controller mutation code
- skillImageVolumes feature gate
- Webhook skill validation (name/path checks)
- E2E tests, fixtures, and utils for skill injection

Added:
- status.linkedSkills populated from kagenti.io/skills annotation
- SkillsDiscovered condition
- skillDiscovery feature gate (default: false)
- Skill discovery sample manifest with OCI + ConfigMap examples

Assisted-By: Claude Code
Signed-off-by: Ryan Cook <rcook@redhat.com>
a9d3fe7 to 177ed3b
@pavelanni @kevincogan @pdettori @eranra I wanted to re-roll the way skills are defined. I started blogging about the skill work and quickly realized that modifying the Deployment's volume mounts would make for a really painful scenario with GitOps/ArgoCD. I wanted to try to pull in the skills dynamically while also referencing what was brought in with kagenti/kagenti#1440. I am open to any feedback or conversation. @kevincogan I also want to ensure this does not cause issues with any security-related ideas or concepts you had. One of the big things I am thinking about with skills is that a user may start with zero skills attached, add a skill or two, lifecycle them for a while, and then discover that a skill is no longer required. All of this could and should occur without the need to rewrite or rebuild the agent unless it is necessary.
Adding skills dynamically is possible (you can add a tool |
Looking forward to what you find, @pavelanni. I think one of the pieces I really overlooked was how people manage their running applications. We definitely need to present and provide our solution with as much security around it as possible.
After reviewing several options I think the right solution for dynamic skill discovery and deployment will be a |
Summary
- Remove spec.skills from the AgentRuntime CRD: the operator no longer mutates target Deployments to inject OCI skill ImageVolumes, eliminating GitOps drift with ArgoCD/Flux
- Skills are declared directly in the Deployment manifest (OCI ImageVolumes or ConfigMap volumes) and discovered by the operator through the kagenti.io/skills annotation
- The skillDiscovery feature gate (default: false) controls whether the operator reads the annotation and populates status.linkedSkills
- The agent discovers mounted skills via SKILL_FOLDERS and reports them in its A2A card; the operator reads the card and the annotation, surfacing both in status

Context
PR #332 added spec.skills to AgentRuntime, which instructed the operator to inject OCI ImageVolumes into the target Deployment via r.Update(). This caused GitOps drift (ArgoCD/Flux fight with the operator over the Deployment spec) and was architecturally overreaching: the operator was modifying a workload it doesn't own.

This PR changes the operator's role from injector to observer:

- The agent discovers mounted skills via SKILL_FOLDERS and advertises them in its A2A card
- The operator reads the kagenti.io/skills annotation and the card, and reports both in status

This aligns with the ConfigMap-based skill path in kagenti/kagenti#1440, where the kagenti backend sets the same annotation. Both skill delivery mechanisms (ConfigMap and OCI ImageVolume) are now visible through a single annotation and the agent's card.
Test plan
- go build ./... : compiles cleanly
- go vet ./... : no issues
- golangci-lint run ./... : no new issues
- skillDiscovery: true populates status.linkedSkills, skillDiscovery: false clears it

Assisted-By: Claude Code