Start here. This is the complete operating guide and agent contract for OpenMontage.
For architecture, key files, and conventions see PROJECT_CONTEXT.md.
When the user's first message is vague, exploratory, or asks what you can do ("make me a video", "what can you do?", "help me create something", "I want to make content"), read the onboarding skill before doing anything else:
Read: skills/meta/onboarding.md
This skill teaches you to run discovery, classify the user's setup, present capabilities in plain language, and offer starter prompts tailored to their available tools. The goal: get the user from "curious" to "making a video" in under 60 seconds.
Skip onboarding when the user arrives with a specific, actionable request (e.g., "Make a 60-second explainer about black holes"). Go directly to Rule Zero.
When the user provides a video URL or local video file as inspiration — for example:
- "Can you make a video like this?"
- "I love this YouTube Short. Make me something similar."
- "Use this Reel as a reference."
— do not treat this as a generic web-search or prompt-writing request.
This is a first-class workflow in OpenMontage.
- Read:
skills/meta/video-reference-analyst.md - Run the reference analysis workflow using the local analysis tools (
video_analyzer, transcript extraction, scene detection, frame sampling) - Produce a grounded summary of what the reference is doing:
- content
- pacing
- structure
- style
- what makes it work
- Then run normal capability audit and pipeline selection
- Present 2-3 differentiated concepts for the user's version — not a carbon copy
- Reference-driven request: "make me something like this" -> use
video-reference-analyst.md - Source-footage request: "edit this footage" / "cut this into clips" -> use
source_media_reviewand the appropriate footage-led pipeline
If a model misses this distinction, it will often fall back to plain search + guesswork. That is incorrect for OpenMontage.
Every video production request MUST go through the pipeline system. No exceptions.
When the user asks to make, create, produce, or generate any video content — a trailer, explainer, clip, animation, or any other video — the agent must:
- Identify the pipeline. Match the request to one of the pipelines in
pipeline_defs/. If unclear, ask the user. - Read the pipeline manifest.
pipeline_defs/<pipeline>.yaml— know the stages, tools, and quality gates. - Run preflight. Discover available tools via the registry. Present the capability menu.
- Execute stage by stage. For EACH stage, read the stage director skill (
skills/pipelines/<pipeline>/<stage>-director.md) BEFORE doing any work in that stage. - Read Layer 3 skills before calling tools. Before using any tool with an
agent_skillsfield, read the referenced skill in.agents/skills/. These contain provider-specific prompting guidance, parameter optimization, and quality techniques that dramatically improve output.
Do NOT:
- Write ad-hoc Python scripts to call tools directly
- Skip the pipeline and go straight to API calls
- Generate assets without reading the stage director skill first
- Use a tool without checking its Layer 3 skill for prompting guidance
- Bypass preflight, checkpoints, or review
The intelligence is in the skills, not in improvised code. An agent that reads the director skills and Layer 3 knowledge will produce significantly better output than one that calls tools directly with generic prompts.
OpenMontage is an instruction-driven video production system. The AI agent IS the intelligence — it reads instructions (pipeline manifests + stage director skills + meta skills) and drives the pipeline using tools.
Agent reads pipeline manifest (YAML) -> reads stage director skill (MD)
-> uses tools (Python BaseTool subclasses) -> self-reviews (meta skill)
-> checkpoints (Python utility) -> presents to human for approval
Python = tools + persistence. No orchestration logic, creative decisions, review logic, or checkpoint policy in Python code. The agent makes those decisions guided by instructions.
Core loop:
- Select a pipeline.
- Run preflight.
- Discover real tools from the registry.
- Present the user with concepts, tool plan, production plan, and cost.
- Execute stage by stage with checkpoints.
For any meaningful production decision, the agent must communicate the decision before acting. The user should never have to infer which provider, model, or render path was chosen after the fact.
Before any paid or consequential generation call, state:
- the exact tool name,
- the provider,
- the model or provider variant,
- the reason it was chosen,
- whether it is a sample or a batch run.
The agent must ask the user before changing any major production choice, including:
- switching provider,
- switching model family or provider variant,
- switching from video-led to still-led treatment,
- switching composition engine when that changes the output character,
- dropping narration, music, or other approved creative elements,
- changing from sample mode to batch mode.
Minor prompt refinements inside an already approved provider/model path do not require separate approval unless they materially change the creative direction.
When both Remotion and HyperFrames are available on the machine (check video_compose.get_info()["render_engines"]), the agent MUST present both options to the user before locking render_runtime at the proposal stage. The agent MAY recommend one with rationale — but silently picking a "default" is forbidden even when the pipeline manifest or a director skill suggests one.
The presentation MUST include, for each runtime:
- A one-sentence plain-language description of what it is best at for this specific brief.
- A one-sentence honest tradeoff (why it might not be the right pick here).
- The agent's recommendation and the reason, tied to the brief's delivery_promise and visual approach.
Then wait for explicit user approval before advancing. Record the full shortlist — BOTH runtimes plus any "ffmpeg" option that applies — as options_considered in the render_runtime_selection decision logged in decision_log. A decision log entry with only one runtime considered when both were available is a CRITICAL reviewer finding.
Exception: if only one runtime is available on the machine, the agent proceeds with it but MUST say so explicitly ("HyperFrames isn't installed on this machine; I'm proceeding with Remotion. Install HyperFrames if you want the alternative."). The render_runtime_selection decision still records the unavailable option as rejected_because: "runtime not available on this machine".
This rule applies to every pipeline that invokes video_compose — not just Wave 1. A pipeline's director skill may recommend a runtime, but that recommendation is input to the conversation with the user, not a decision.
When a blocker occurs, the agent must surface it immediately using this structure:
- What was attempted
- What failed
- Whether the issue is auth, provider access, tool bug, or prompt/design quality
- What options exist next
- Which option the agent recommends, with reasoning
Do not continue with a substitute path until the user approves.
When asking the user to choose, do not just list options. The agent should:
- provide the shortlist,
- explain the tradeoffs briefly,
- recommend one option,
- wait for approval before proceeding.
If the approved path is blocked, the agent may investigate and prepare alternatives, but may not execute those alternatives without user approval.
This applies especially to:
- provider swaps,
- model swaps,
- fallback tools,
- prompt-only substitutes for reference-driven generation,
- still-image animatics in place of true motion.
The agent itself orchestrates the production state machine:
research -> proposal -> script -> scene_plan -> assets -> edit -> compose
The agent:
- Reads the pipeline manifest (
pipeline_defs/*.yaml) to know the process - Calls
checkpoint.get_next_stage()to find where to resume - Reads the stage's director skill (
skills/pipelines/<pipeline>/<stage>-director.md) to know HOW - Uses tools (
tools/) for concrete capabilities - Self-reviews using the reviewer meta skill (
skills/meta/reviewer.md) - Checkpoints via the checkpoint protocol (
skills/meta/checkpoint-protocol.md) - Presents to human for approval when
human_approval_default: true
Infrastructure files:
lib/checkpoint.py— read/write checkpoints, stage validationtools/cost_tracker.py— budget governancelib/pipeline_loader.py— manifest loading and helpers
Every production run creates a project workspace under projects/. This directory is gitignored — all generated assets are regenerable.
projects/<project-name>/
├── artifacts/ # JSON artifacts from each stage (research_brief, script, scene_plan, etc.)
├── assets/
│ ├── images/ # Generated images (PNG)
│ ├── video/ # Generated video clips (MP4)
│ ├── audio/ # Narration segments + final mix (MP3/WAV)
│ ├── music/ # Background music track (MP3)
│ └── subtitles.srt # Generated subtitles
└── renders/
└── final.mp4 # Final rendered video (the deliverable)
Naming convention: Use kebab-case derived from the video title (e.g., hidden-math-of-nature, how-music-rewires-brain).
Create the project directory at pipeline initialization, before any stage runs. All tools and agents should write outputs to these paths — never to the repo root or ad-hoc locations.
Users can place royalty-free music tracks in music_library/ (gitignored). The asset director will check this folder before falling back to API-based music generation.
music_library/
├── ambient_track.mp3
├── cinematic_epic.mp3
└── ...
If the folder has tracks, the proposal and asset stages should present them as options alongside generated music. See the proposal-director and asset-director skills for details.
| Pipeline | Best For | Stability |
|---|---|---|
animated-explainer |
Topic to fully generated explainer | production |
talking-head |
Footage-led speaker videos | beta |
screen-demo |
Screen recordings and walkthroughs | production |
clip-factory |
Many clips from one long source | beta |
podcast-repurpose |
Podcast highlights and derivatives | beta |
cinematic |
Trailer, teaser, and mood-led edits | production |
animation |
Motion-graphics and animation-first videos | production |
character-animation |
Local rigged cartoon characters and reusable character acting | beta |
hybrid |
Source footage plus support visuals | production |
avatar-spokesperson |
Presenter-led avatar or lip-sync videos | production |
localization-dub |
Subtitle, dub, and translated variants | beta |
framework-smoke |
Test: minimal 2-stage smoke test | test |
Beta pipelines have not been fully audited. They work, but expect rough edges. Mention this when the user selects one.
Do this before any creative work. Use provider_menu_summary() first — it's the human-ready rollup. The raw support_envelope() dump is a firehose (megabytes of JSON on a well-configured machine); pasting it into chat will bury the user.
python -c "
from tools.tool_registry import registry
import json
registry.discover()
print(json.dumps(registry.provider_menu_summary(), indent=2))
"The summary returns four fields the agent should translate into plain language:
composition_runtimes— booleans forffmpeg,remotion,hyperframes. This is the source of truth for the "Present Both Composition Runtimes (HARD RULE)" check.capabilities[]— one entry per capability family withconfigured / totalcounts and provider lists. Ready-made for the "N of M configured" menu.setup_offers[]— unavailable tools whose install is a 1-minute env-var fix. Lead with these when offering upgrades.runtime_warnings[]— specific signals like "hyperframes: npm package not resolvable". Surface these to the user verbatim — they're the kind of silent-failure bugs that break the governance contract.
Then, for deeper inspection (only when the summary isn't enough):
# Full menu — grouped available/unavailable per capability.
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_menu(), indent=2))"
# Raw envelope — every tool's full contract. Slow/firehose; use for debugging only.
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.support_envelope(), indent=2))"Then:
- Read the selected manifest in
pipeline_defs/. - Check every
required_toolsentry against the registry. - Check
fallback_toolsfor unavailable tools. - Report one of:
passed,degraded, orblocked. - Do not start production until the user understands the real capability envelope.
Already fetched via provider_menu_summary() above. Read that output and present it to the user as a capability menu, not as a flat tool list. Use provider_menu() directly only when you need the per-tool detail the summary collapses.
How to present:
YOUR CAPABILITIES
Video Generation: 0/13 configured
Image Generation: 1/7 configured
Text-to-Speech: 1/3 configured
Music Generation: 1/1 configured
Composition: 3/3 configured (FFmpeg, video_stitch, video_trimmer)
You can produce videos now with images + TTS + FFmpeg.
Quick upgrades available — see below.
For EACH capability with unavailable providers, read the install_instructions field from the menu output and present setup options grouped by effort:
QUICK SETUP OPTIONS (1-minute each — set an env var in .env)
Video Generation (0/13 -> unlock the biggest upgrade):
Each unavailable provider lists its own install_instructions.
Read them from the provider_menu output and present grouped by env var.
Example: if 3 tools need FAL_KEY, group them: "FAL_KEY unlocks 3 providers"
Image Generation (1/7 -> more style options):
Same pattern — read install_instructions from each unavailable tool.
Text-to-Speech (1/3):
Same pattern.
LOCAL OPTIONS (free, needs hardware):
Tools with runtime=LOCAL or runtime=LOCAL_GPU — read from the menu.
Already Available:
List what's working. The user should feel good about what they have.
Rules:
- Do NOT hardcode provider names, API key names, or setup URLs in your prompts.
Read them from the registry's
install_instructionsfield on each tool. - Always show the ratio: "X of Y configured" — this makes breadth visible.
- Group by capability, not by individual tool.
- Show what they CAN do now, then what they COULD unlock.
- If the user declines setup, proceed with the best available path — no nagging.
- If a tool shares an env var with others, group them (read from
dependenciesfield).
When tools are UNAVAILABLE but can be fixed with simple configuration, offer the user setup help instead of silently working around the limitation. Many tools are one env var away from working.
| Fix Complexity | Action |
|---|---|
| 1-minute fix (env var) | Offer to help configure now — read install_instructions from the tool |
| 5-minute fix (install) | Explain what to install and why — read install_instructions from the tool |
| Complex fix (GPU, model download) | Note the limitation, explain what it would unlock, move on |
Rules:
- Always tell the user what they're missing AND what they'd gain
- Show the cost difference (free local vs. paid API)
- If the user declines setup, proceed with the best available path — no nagging
- Group related fixes (tools sharing the same env var dependency)
video_compose has three render engines / runtimes. They are parallel, not ranked — the choice is made at proposal and locked in edit_decisions.render_runtime. Check which are available:
python -c "
from tools.tool_registry import registry
registry.discover()
info = registry._tools['video_compose'].get_info()
print('Render engines:', info.get('render_engines'))
print('Remotion note:', info.get('remotion_note'))
print('HyperFrames note:', info.get('hyperframes_note'))
"| Engine | Used For | Requires |
|---|---|---|
| FFmpeg | Video-only cuts, concat, trim, subtitle burn | ffmpeg binary (always available) |
| Remotion | React-based composition: still images → animated video, text cards, stat cards, charts, callouts, comparisons, transitions with spring physics, word-level caption burn, TalkingHead avatar | Node.js (npx) + remotion-composer/ + node_modules |
| HyperFrames | HTML/CSS/GSAP composition: kinetic typography, product promos, launch reels, website-to-video, registry-block-driven scenes, SVG character rigs | Node.js ≥ 22 + FFmpeg + npx (consumed via npx hyperframes) |
render_runtime is locked at proposal (proposal_packet.production_plan.render_runtime) and carried through edit_decisions unchanged. video_compose routes based on this field; silent runtime swaps are forbidden. If the chosen runtime becomes unavailable at compose time, surface a structured blocker per "Escalate Blockers Explicitly" above. See skills/core/hyperframes.md for the Remotion-vs-HyperFrames decision matrix.
For any request where the deliverable inherently depends on motion rather than static coverage, treat motion as a hard requirement. Examples:
- sci-fi trailers,
- cinematic teasers built from generated clips,
- hype edits,
- avatar or agent videos,
- any brief whose promise depends on moving shots rather than still frames.
For these requests:
- The
render_runtimechosen at proposal (Remotion, HyperFrames, or FFmpeg) must be confirmed available up front if the planned visual treatment depends on it. - Still-image fallback is forbidden. Do not quietly convert the job into a Ken Burns teaser, animatic, or slide-based video.
- FFmpeg-only fallback is forbidden when it changes the approved deliverable from motion-led video to still-led video.
- Silent runtime swap is forbidden. If
render_runtime="hyperframes"was locked and HyperFrames is unavailable, do NOT route to Remotion instead. Surface the blocker, propose options, get user approval, log arender_runtime_selectiondecision — then proceed. - Bubble critical issues immediately. If the chosen runtime is unavailable, fails to render, or provider clip generation fails in a way that blocks the approved treatment, stop and tell the user before proceeding.
- Do not spend more tokens or time on downgraded output unless the user explicitly approves the downgrade as an animatic or proof-of-concept.
When Remotion is available, the agent should design production plans around it:
- Explainer videos with
flat-motion-graphicsplaybook -> Remotion animated scenes, not Ken Burns - Data-driven videos -> Remotion stat cards and charts, not static image screenshots
- Any pipeline using still images -> Remotion spring animations, not FFmpeg pan-and-zoom
- Screen demos of a CLI/terminal/install flow ->
TerminalScene(synthetic screen recording), not OS-level capture. See.agents/skills/synthetic-screen-recording/SKILL.md. Faster, deterministic, privacy-safe. Use real capture (screen_recorder,cap_recorder,playwright-recording) only when the demo is a real app UI or requires unpredictable live behavior.
See remotion-composer/SCENE_TYPES.md for the authoritative list and their cut schemas. Current scene types usable via cut.type:
text_card, stat_card, callout, comparison, hero_title, terminal_scene, anime_scene, bar_chart, line_chart, pie_chart, kpi_grid, progress_bar. Overlay types include section_title, stat_reveal, hero_title, provider_chip.
When Remotion is NOT available and render_runtime="remotion" was NOT locked, video_compose may use FFmpeg Ken Burns motion on still images. This still works but produces less engaging visuals. Mention this tradeoff in the proposal. When render_runtime="remotion" IS locked and Remotion is unavailable, that's a blocker — escalate, don't silently swap.
When render_runtime="hyperframes" is locked and HyperFrames is unavailable (Node < 22, missing ffmpeg/npx, or hyperframes doctor reports issues), that's also a blocker. Do not substitute Remotion or FFmpeg without user approval + a logged render_runtime_selection decision.
Routing is automatic — video_compose reads edit_decisions.render_runtime and dispatches to the matching engine (_render_via_hyperframes, _remotion_render, or _render_via_ffmpeg). But the agent must know both Remotion and HyperFrames exist at proposal time so it can design the visual approach intentionally. Don't default to Remotion for motion-graphics-heavy concepts that HTML/GSAP would express more naturally, and don't default to HyperFrames for briefs that reuse the existing React scene stack.
OpenMontage uses two layers for capability choice:
- selector tools: capability-level routing such as
tts_selectorandvideo_selector - provider tools: concrete tools discovered via the registry that call a specific backend
Always inspect the registry first:
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.capability_catalog(), indent=2))"
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_catalog(), indent=2))"For finalist tools inspect:
capabilityproviderusage_locationsupportsfallback_toolsrelated_skills
Do not rely on memory or old docs when the registry can answer it.
Do not maintain hardcoded tool lists. Always query the registry at runtime:
# See all tools grouped by capability (TTS, video_generation, image_generation, etc.)
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.capability_catalog(), indent=2))"
# See all tools grouped by provider (elevenlabs, openai, ffmpeg, etc.)
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_catalog(), indent=2))"Key capability families to look for in the output:
- tts — Text-to-speech providers. Route via
tts_selector. - video_generation — Video generation providers (cloud, local GPU, stock). Route via
video_selector. - image_generation — Image generation providers (cloud, local GPU, stock). Route via
image_selector. - music_generation — Music and sound effect generation.
- video_post — Composition, stitching, trimming (FFmpeg-based, always local).
- audio_processing — Mixing, enhancement (FFmpeg-based, always local).
- analysis — Transcription, scene detection, frame sampling.
- avatar — Talking head and lip sync generation.
- character_animation — Local character specs, SVG rigs, pose libraries, action timelines, previews, and QA.
- enhancement — Upscale, background removal, face enhance, color grading.
Each tool in the registry declares best_for, install_instructions, runtime (LOCAL, API, LOCAL_GPU, HYBRID), and status. Read these fields — do not assume tool strengths from memory.
All tool classes use PascalCase without a "Tool" suffix. When importing tools in Python:
| Module | Class Name | NOT |
|---|---|---|
tools.audio.music_gen |
MusicGen |
|
tools.video.video_compose |
VideoCompose |
|
tools.audio.audio_mixer |
AudioMixer |
|
tools.tts.elevenlabs_tts |
ElevenLabsTTS |
|
tools.analysis.transcriber |
Transcriber |
|
tools.subtitle.subtitle_gen |
SubtitleGen |
When in doubt, check: grep "^class " tools/<path>.py
All tools call via .execute(params_dict) (returns ToolResult with .success, .data, .error), NOT .run().
Three selector tools abstract multi-provider capabilities. Selectors auto-discover providers from the registry. Adding a new provider tool automatically makes it available through the selector — no selector code changes needed.
| Selector | Routes to | How it discovers |
|---|---|---|
tts_selector |
All tools with capability="tts" (ElevenLabs, Google TTS, OpenAI, Piper) |
registry.get_by_capability("tts") |
image_selector |
All tools with capability="image_generation" (FLUX, Google Imagen, DALL-E, Recraft, etc.) |
registry.get_by_capability("image_generation") |
video_selector |
All tools with capability="video_generation" |
registry.get_by_capability("video_generation") |
Selectors route based on: user preference > availability > discovery order. They adapt input schemas between providers transparently.
Before committing to execution, present:
4-5concept directions when the brief is still open.- Recommended pipeline.
- Recommended tool path.
- Alternative tool paths that are actually available.
- Cost estimate and quality tradeoffs.
- Music plan — mandatory for every pipeline that has audio. See below.
- Production plan by stage.
- Approval gate before asset generation.
If a user prefers a specific vendor and that tool is available, surface it directly. Do not hide provider choice.
Music is a critical part of any video. Surface the music situation to the user at proposal/idea time — do not silently defer it to the asset stage where a failure becomes expensive.
Check music availability in this order and present the options:
- User music library (
music_library/): Check if this folder exists and contains tracks. If so, list available tracks with durations and let the user pick one. - Music generation APIs: Check which music tools are available via the registry (
registry.get_by_capability("music_generation")). Report their status honestly — include quota status if known. - Royalty-free sources: Note if the user can provide their own track (e.g., from YouTube Audio Library, Jamendo, or other free sources). Offer the
music_library/drop path.
Always present the user with explicit choices:
- Use a track from their library (which one?)
- Provide a different track (drop it in
music_library/) - Generate one via API (if available — name the provider and cost)
- Proceed without music
If no music source is available: Tell the user explicitly. Do NOT let this surface as a surprise at the asset stage.
Record the music decision in the proposal/brief artifact so the asset director knows what to do.
Each pipeline manifest's tools_available field declares what tools a stage can use. Use selectors for multi-provider capabilities — the selector handles routing to whatever is available. Read the pipeline manifest for the authoritative list per stage.
Each stage produces one canonical artifact that becomes the contract for the next stage. The stage director skill teaches the agent HOW to produce it.
| Stage | Director Skill | Canonical output | Core quality bar |
|---|---|---|---|
idea |
*-director.md |
brief |
Clear hook, target platform, duration, tone, and user intent |
script |
*-director.md |
script |
Structured sections, valid timing, coherent narration |
scene_plan |
*-director.md |
scene_plan |
Ordered scenes, timings, asset requirements |
assets |
*-director.md |
asset_manifest |
Provenance, paths, model/tool metadata, scene linkage |
edit |
*-director.md |
edit_decisions |
Concrete cuts, overlays, subtitle/music decisions |
compose |
*-director.md |
render_report |
Output paths, encoding profile, verification notes |
Stage contract rules:
- A completed or awaiting-human checkpoint must include the stage's canonical artifact.
- Canonical artifacts must validate against the JSON schema in
schemas/artifacts/. - Non-canonical outputs such as media files belong in stage-specific directories.
- Tools should record seeds/model versions for reproducibility.
The reviewer is a meta skill (skills/meta/reviewer.md) — advisory, never directly blocks progression.
- Self-review after every stage execution, before checkpointing.
- Load
review_focusitems from the pipeline manifest for the current stage. - Maximum two review rounds. After that, pass with warnings and move on.
- Findings categorized: critical (must fix), suggestion (should fix), nitpick (nice-to-have).
- Critical findings -> fix and re-review. Suggestions -> note and proceed.
- Check playbook
quality_rulesas constraints, not suggestions.
The checkpoint protocol meta skill (skills/meta/checkpoint-protocol.md) teaches the agent when to pause:
- Read
human_approval_defaultfrom the pipeline manifest per stage - Creative stages (
idea,script,scene_plan) typically require approval - Technical stages (
assets,edit,compose) typically auto-proceed - When approval is required: present artifact summary, review findings, and cost snapshot
- Wait for human to approve, request revision, or abort
Agents coordinate through canonical JSON artifacts, checkpoints, pipeline manifests, and the tool registry.
Primary files:
- Artifact schemas:
schemas/artifacts/ - Checkpoint schema:
schemas/checkpoints/checkpoint.schema.json - Pipeline manifest schema:
schemas/pipelines/pipeline_manifest.schema.json - Pipeline manifests:
pipeline_defs/ - Style playbooks:
styles/*.yaml(validated byschemas/styles/playbook.schema.json) - Tool contract:
tools/base_tool.py - Tool registry:
tools/tool_registry.py - Stage director skills:
skills/pipelines/<pipeline>/<stage>-director.md - Meta skills:
skills/meta/*.md
Checkpoint rules:
- Checkpoints live at
pipelines/<project_id>/checkpoint_<stage>.json. statusmay becompleted,failed,awaiting_human, orin_progress.completedandawaiting_humancheckpoints must include the canonical artifact.- Invalid checkpoints or invalid canonical artifacts are contract violations and should fail fast.
Pipeline manifest rules:
- Pipelines are declarative YAML manifests in
pipeline_defs/. - Stages declare:
skill(director skill path),produces,tools_available,review_focus,success_criteria,human_approval_default. - Adding a new pipeline requires a manifest + stage director skills.
Tool rules:
- Every production tool must inherit from
BaseTool. - Tool discovery flows through the registry, not ad hoc imports.
- Support-envelope reporting is the source of truth for capability, status, and resource requirements.
| Playbook | Best For |
|---|---|
clean-professional |
Corporate, educational, SaaS |
flat-motion-graphics |
Social media, TikTok, startups |
minimalist-diagram |
Technical deep-dives, architecture |
OpenMontage has three instruction layers:
tools/What exists, what is available, cost, runtime, fallback, related skills.skills/How OpenMontage wants those tools used in pipelines..agents/skills/Raw vendor or technology knowledge.
Reading order:
- registry / tool contract — discover what's available
- relevant pipeline or creative skill (Layer 2) — know HOW to use it in this context
- underlying vendor skill (Layer 3) — mandatory before calling any generation tool
Prefer skills over source code for tool usage. Skills exist precisely so you don't need implementation details in the common case. Layer 2 tells you what and when. Layer 3 tells you how. For authoring prompts, choosing parameters, or understanding usage patterns, you should be reading skills — not .py files.
Exception: debugging, audits, and verifying the governance contract. When a skill and a tool disagree, or when something behaves differently than the skill claims, reading the tool source is fair game — that's often the only way to catch a silent-availability bug or a stale doc string. An audit that refuses to look at the implementation will miss exactly the bugs that matter most. If you do read source to debug, consider whether the finding belongs in a skill update afterward so the next agent doesn't need to repeat the dive.
Layer 3 is not optional. Every generation tool (video, image, TTS, music) has an agent_skills field listing its Layer 3 skills. These skills contain provider-specific prompt engineering, parameter tuning, and quality techniques. Read them before writing prompts. The difference between a generic prompt and a skill-informed prompt is the difference between "usable" and "cinematic."
Example: Before calling kling_video, read its agent_skills → ai-video-gen → get Kling-specific prompt structure, camera direction syntax, and quality keywords that the model responds to best.
The .agents/skills/ directory is large. When you're not coming in through a tool's agent_skills pointer, use this table to find the right file by what you're trying to do:
| Category | Skills |
|---|---|
| Composition runtime | remotion, remotion-best-practices, synthetic-screen-recording (fake terminal/UI demos via Remotion TerminalScene) |
| Animation knowledge (generic) | gsap-core, gsap-timeline, gsap-plugins (SplitText / MorphSVG / DrawSVG / MotionPath / Flip / CustomEase), gsap-utils, gsap-react, gsap-performance, gsap-scrolltrigger, gsap-frameworks, framer-motion (Disney 12 principles), lottie-bodymovin (Lottie export) |
| Character animation | character-rigging, svg-character-animation, pose-library-design, canvas-procedural-animation, character-animation-qa |
| Image generation | bfl-api, flux-best-practices |
| Video generation | seedance-2-0 (preferred premium default — cinematic, trailer, multi-shot, synced audio, lip-sync), ai-video-gen, ltx2 |
| Audio | elevenlabs, music, sound-effects, acestep, text-to-speech, setup-api-key |
| Avatar / lip-sync | avatar-video, heygen, create-video, faceswap, video-translate, speech-to-text, agents |
| Capture | playwright-recording (browser flows), ffmpeg (post) |
| Visualization | beautiful-mermaid, d3-viz, manim-composer, manimce-best-practices, manimgl-best-practices |
| Media editing | video-edit, video-download, video-understand, video_toolkit, visual-style |
When in doubt, read the category's meta routing file first:
- Picking an animation runtime? →
skills/meta/animation-runtime-selector.mdroutes between Remotion primitives, GSAP plugins, framer-motion, Lottie, Manim, D3. - Picking a screen-recording mode (real capture vs synthetic terminal)? →
pipeline_defs/screen-demo.yaml+skills/pipelines/screen-demo/idea-director.md.
| Question | Where to look |
|---|---|
| What tools exist? | tools/tool_registry.py and registry.support_envelope() |
| What providers are available for a capability? | registry.capability_catalog() |
| What tools exist for a vendor? | registry.provider_catalog() |
| How does a tool actually work? | the tool's usage_location from the registry |
| How should this pipeline stage behave? | skills/pipelines/<pipeline>/... |
| What is the checkpoint/review policy? | skills/meta/ |
- Do not bypass the pipeline. Never write ad-hoc scripts to call tools directly. All production goes through pipeline stages with director skills. See Rule Zero.
- Do not call generation tools without reading their Layer 3 skill. Check the tool's
agent_skillsfield, read the referenced skill, then craft your prompts using that guidance. - Do not skip stage director skills. Before executing any pipeline stage, read its director skill. The skill contains the quality bar, the workflow, and the review criteria.
- Do not use deleted legacy names such as
tts_cloud,tts_engine, orvideo_gen. - Do not hardcode provider names, API key names, or setup URLs. Read them from the registry's
install_instructionsanddependenciesfields. - Do not begin asset generation before user approval on the production plan.
- Do not hide degraded paths. Record substitutions and blocked options explicitly.
- Do not present a single unavailable tool in isolation. Always show the full capability picture: "X of Y providers configured for this capability."
- Do not skip the Provider Menu at preflight. The user must see what they have AND what they could unlock.
- Do not change provider, model, or render path without telling the user first and getting approval when the change is material.