[Skill proposal] serving-llms-on-instinct #30

Description

@iswaryaalex

Proposed skill name

serving-llms-on-instinct

Does something like this already exist?

Yes — as documentation, a runbook, or internal guide

Where should this skill live?

Path B: authored in a product repo (HIP, ROCm, Ryzen AI, Lemonade, ...) and registered here

Catalog focus area

Cross-stack porting

Skill description

Description: Deploy and optimize LLM inference on AMD Instinct GPUs. Covers the full path from "I want to serve a model" to a running, benchmarked endpoint, including a DevCloud on-ramp for developers who don't have AMD hardware yet.

Flow: Trigger run -> Detect GPU (if none is found, trigger AMD Developer Cloud setup) -> Select the engine (vLLM vs. SGLang) and its attention backend (AITER, FA, etc.) -> Quark -> Env vars -> Runtime
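As a rough illustration of the flow above, here is a minimal Python sketch of how the skill might turn a GPU check into an ordered list of steps. Everything in it is an assumption for discussion, not part of the proposal: the function names `detect_amd_gpu` and `plan_steps`, the lowercase labels `"aiter"` and `"fa"`, the treatment of Quark as an optional step, and the detection heuristic (checking for the `/dev/kfd` device node or a `rocm-smi` binary on `PATH`). It only plans steps; it does not launch vLLM or SGLang or set any real environment variables.

```python
import os
import shutil

# Labels taken from the proposal's flow; these are illustrative names,
# not real command-line flag values for vLLM or SGLang.
ENGINES = ("vllm", "sglang")
ATTENTION_BACKENDS = ("aiter", "fa")


def detect_amd_gpu() -> bool:
    """Heuristic check (assumption): ROCm exposes /dev/kfd, and rocm-smi ships with ROCm."""
    return os.path.exists("/dev/kfd") or shutil.which("rocm-smi") is not None


def plan_steps(gpu_found: bool, engine: str = "vllm",
               attention: str = "aiter", quantize: bool = True) -> list[str]:
    """Return the ordered steps of the proposed flow as plain strings."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    if attention not in ATTENTION_BACKENDS:
        raise ValueError(f"unknown attention backend: {attention!r}")

    steps = []
    if not gpu_found:
        # No local Instinct GPU: fall back to the cloud on-ramp first.
        steps.append("set up AMD Developer Cloud")
    steps.append(f"select engine: {engine}")
    steps.append(f"select attention backend: {attention}")
    if quantize:
        # Treating Quark as optional is an assumption of this sketch.
        steps.append("quantize with Quark")
    steps.append("set environment variables")
    steps.append("launch runtime")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_steps(detect_amd_gpu()), start=1):
        print(f"{number}. {step}")
```

Keeping the plan as data rather than executing it directly would let the skill show the user what it intends to do before touching the machine; whether the real skill should work that way is an open design question.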

Labels

imported (This skill is imported from a 3rd party repo)
skill_proposal (Propose a new skill)
