Proposed skill name
serving-llms-on-instinct
Does something like this already exist?
Yes — as documentation, a runbook, or internal guide
Where should this skill live?
Path B: authored in a product repo (HIP, ROCm, Ryzen AI, Lemonade, ...) and registered here
Catalog focus area
Cross-stack porting
Skill description
Description: Deploy and optimize LLM inference on AMD Instinct GPUs. Covers the full path from "I want to serve a model" to a running, benchmarked endpoint, including a DevCloud on-ramp for developers who don't have AMD hardware yet.
Flow: Trigger run -> Detect GPU (if none is found, trigger AMD Developer Cloud setup) -> Select engine (vLLM or SGLang) and its attention backend (AITER, FA, etc.) -> Quark -> Env vars -> Runtime
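
The flow above could be sketched roughly as follows. This is a minimal illustration, not the skill's implementation: the function names `gpu_available` and `plan_flow`, the stage labels, and the detection heuristic (checking for the ROCm `/dev/kfd` device node or a `rocm-smi` binary on `PATH`) are all assumptions made for this example. The heuristic only suggests that a ROCm stack is present; it does not confirm that the device is an Instinct GPU.

```python
import os
import shutil

# Stage labels copied from the Flow line; the wording is illustrative only.
SERVING_STAGES = [
    "select engine (vLLM or SGLang)",
    "select attention backend (AITER, FA, etc.)",
    "Quark",
    "env vars",
    "runtime",
]

CLOUD_STAGE = "AMD Developer Cloud setup"


def gpu_available(kfd_path="/dev/kfd", which=shutil.which):
    """Heuristic (assumption): a ROCm-capable GPU is likely present if the
    kernel device node exists or the rocm-smi tool is on PATH."""
    return os.path.exists(kfd_path) or which("rocm-smi") is not None


def plan_flow(has_gpu):
    """Return the ordered stages, adding the cloud on-ramp when no GPU is found."""
    steps = ["trigger run", "detect GPU"]
    if not has_gpu:
        steps.append(CLOUD_STAGE)
    return steps + SERVING_STAGES


if __name__ == "__main__":
    for number, stage in enumerate(plan_flow(gpu_available()), start=1):
        print(f"{number}. {stage}")
```

Keeping detection separate from planning means the branch to the cloud on-ramp can be exercised on a machine without AMD hardware by passing `has_gpu=False` directly.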
