Summary
Integrate Promptfoo's policy plugin as an optional red-teaming source in the Smith test generation pipeline. The plugin generates disallow and malicious test cases scoped to specific user profiles and policy constraints. It complements, rather than replaces, Smith's core generation pipeline.
Motivation
Traditional red-teaming tools attack at the guardrail/prompt level, testing whether the LLM can be jailbroken or tricked into harmful outputs. They do not, however, consider system variables or structured policy rules. Promptfoo's policy plugin fills this gap by generating adversarial test cases that target policy boundaries for specific user contexts.
Smith's own generation, by contrast, is largely automatic: it decomposes each user context by mutating system variables, with no manual configuration. Promptfoo instead requires hand-designed user contexts, carefully crafted testGenerationInstructions, and a separate configuration file. That takes more effort and more user knowledge. We will include several example configurations to lower the barrier, but Promptfoo remains optional for teams willing to invest the extra setup.
Smith's core pipeline and Promptfoo's policy plugin both generate adversarial test cases against policy rules, but from different angles. Smith decomposes guidance into structured conditions and generates cases systematically, while Promptfoo generates attack prompts based on specific user context and policy text. Having both may provide broader coverage.
This integration is optional. The core Smith pipeline works without Promptfoo. Teams that have Promptfoo installed (npm install -g promptfoo) can enable it for additional disallow/malicious test coverage.
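Because the step is optional, the pipeline has to decide at runtime whether to call Promptfoo at all. Below is a minimal sketch of that decision. It rests on two assumptions. First, the opt-in environment variable is named SMITH_ENABLE_PROMPTFOO; this is a hypothetical name, not one Smith currently reads. Second, a global npm install exposes a promptfoo executable on PATH.

```python
import os
import shutil


def promptfoo_enabled(env_var: str = "SMITH_ENABLE_PROMPTFOO") -> bool:
    """Return True only if the opt-in flag is set and the promptfoo CLI is installed.

    SMITH_ENABLE_PROMPTFOO is a hypothetical variable name used for illustration.
    """
    flag = os.environ.get(env_var, "").strip().lower()
    if flag not in ("1", "true", "yes"):
        return False  # integration is optional: skip unless explicitly enabled
    # `npm install -g promptfoo` is expected to put a `promptfoo` executable on PATH.
    return shutil.which("promptfoo") is not None
```

A caller would run the Promptfoo stage only when promptfoo_enabled() returns True and otherwise continue with the core pipeline unchanged. The sketch checks both the flag and the executable, so a team that sets the flag without installing the CLI skips the stage instead of failing the whole run.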
Design Considerations
Configuration File
Promptfoo requires a dedicated configuration file (promptfooconfig.yaml) per target agent. This file must be manually authored and defines:
- purpose: Agent description
- vars: Top-level system variables
- contexts: User profiles to generate attacks for (id, purpose, vars)
- provider: LLM for generation (e.g., ollama:chat:qwen3.5:latest)
- plugins.policy: The policy text to attack against
- testGenerationInstructions: Constraints on generation quality
- strategies: Attack strategies
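Because this file is authored by hand, it helps to sanity-check the listed fields before invoking Promptfoo. The sketch below is hypothetical: check_config is not part of Smith. It makes three assumptions:
- The file has already been parsed into a dict, for example with a YAML library.
- All listed fields sit in a single mapping. A real promptfooconfig.yaml may nest them, for instance under a redteam section, so the lookup may need adjusting.
- The exact shape of plugins.policy is not fixed here. The check simply searches the plugins entry for any non-empty policy string.

```python
from typing import Any

# Field names taken from the list above; their nesting in the real file is an assumption.
REQUIRED_FIELDS = ("purpose", "vars", "contexts", "provider", "plugins",
                   "testGenerationInstructions", "strategies")
CONTEXT_FIELDS = ("id", "purpose", "vars")


def _has_policy_text(node: Any) -> bool:
    """Search nested dicts/lists for a non-empty 'policy' string."""
    if isinstance(node, dict):
        if isinstance(node.get("policy"), str) and node["policy"].strip():
            return True
        return any(_has_policy_text(v) for v in node.values())
    if isinstance(node, list):
        return any(_has_policy_text(v) for v in node)
    return False


def check_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means every listed field is present."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in cfg]
    if "plugins" in cfg and not _has_policy_text(cfg["plugins"]):
        problems.append("plugins: no policy text found")
    for i, ctx in enumerate(cfg.get("contexts") or []):
        for name in CONTEXT_FIELDS:
            if name not in ctx:
                problems.append(f"contexts[{i}]: missing {name}")
    return problems


if __name__ == "__main__":
    # Illustrative values only; the plugin-list shape is an assumption to verify.
    example = {
        "purpose": "HR assistant",
        "vars": {"roles": "employee"},
        "contexts": [{"id": "employee", "purpose": "Regular employee",
                      "vars": {"roles": "employee"}}],
        "provider": "ollama:chat:qwen3.5:latest",
        "plugins": [{"id": "policy", "config": {"policy": "No salary lookups."}}],
        "testGenerationInstructions": "Only generate requests this role may not make.",
        "strategies": ["jailbreak"],
    }
    print(check_config(example))  # []
```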
Variable Format Translation
Promptfoo only supports string variables in its output. Smith's policy engine requires typed variables (lists, booleans, integers). Smith handles this translation in convert_test_case.py using system_vars.json as the type reference:
- Promptfoo outputs: roles: "employee", approval: "false", queries_this_session: "5"
- Smith converts to: roles: ["employee"], approval: false, queries_this_session: 5
Conversion rules:
- String → list (if reference is a list)
- String → int/float (if reference is numeric)
- String "true"/"false" → boolean (applied after type conversion)
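The conversion rules above can be sketched in Python. This is an illustration, not the contents of convert_test_case.py. Its assumptions:
- convert_value and convert_vars are hypothetical names.
- system_vars.json is a flat mapping from each variable name to an example value of the right type.
- Splitting list values on commas reflects how Promptfoo might render multi-item lists. This is a guess.

```python
import json
from typing import Any


def convert_value(raw: Any, reference: Any) -> Any:
    """Convert one Promptfoo string value using the type of its reference value."""
    if not isinstance(raw, str):
        return raw  # already typed; nothing to do
    value: Any = raw
    if isinstance(reference, list):
        # Assumption: multiple list items arrive comma-separated.
        value = [item.strip() for item in raw.split(",") if item.strip()]
    elif isinstance(reference, bool):
        pass  # bool subclasses int in Python; leave it to the boolean pass below
    elif isinstance(reference, int):
        try:
            value = int(raw)
        except ValueError:
            pass  # leave unparseable values as strings
    elif isinstance(reference, float):
        try:
            value = float(raw)
        except ValueError:
            pass
    # Final pass, applied after type conversion: "true"/"false" strings become booleans.
    if isinstance(value, str) and value.strip().lower() in ("true", "false"):
        value = value.strip().lower() == "true"
    return value


def convert_vars(case_vars: dict, system_vars: dict) -> dict:
    """Convert every variable in a case; names missing from system_vars only get the boolean pass."""
    return {name: convert_value(raw, system_vars.get(name))
            for name, raw in case_vars.items()}


if __name__ == "__main__":
    system_vars = json.loads('{"roles": ["admin"], "approval": true, "queries_this_session": 0}')
    case = {"roles": "employee", "approval": "false", "queries_this_session": "5"}
    print(convert_vars(case, system_vars))
    # {'roles': ['employee'], 'approval': False, 'queries_this_session': 5}
```

The bool check comes before the int check because bool is a subclass of int in Python. A boolean reference is therefore left to the final "true"/"false" pass rather than being parsed as an integer.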
Output
All generated cases are labeled promptfoo_malicious and placed in the promptfoo_malicious/ test directory.
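The sketch below shows one way to apply this rule when saving results. Smith's actual on-disk format may differ. It assumes the following:
- write_cases is a hypothetical helper.
- Each case is a dict.
- The label is stored under a "label" key.
- Each case gets its own JSON file.

```python
import json
from pathlib import Path

LABEL = "promptfoo_malicious"


def write_cases(cases: list[dict], root: Path) -> list[Path]:
    """Label each case and write it as its own JSON file under root/promptfoo_malicious/."""
    out_dir = root / LABEL
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, case in enumerate(cases):
        labeled = {**case, "label": LABEL}  # hypothetical field name for the label
        path = out_dir / f"case_{i:03d}.json"
        path.write_text(json.dumps(labeled, indent=2), encoding="utf-8")
        written.append(path)
    return written
```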
Current State
- promptfooconfig.yaml templates created for the HR Agent, Call-for-Papers, and Car Price examples
- attack_promptfoo.py invokes Promptfoo and extracts cases with their system variables
- convert_test_case.py handles type conversion from Promptfoo's string-only format
- OLLAMA_BASE_URL configuration (Promptfoo uses the native ollama API at /api/chat, not the OpenAI-compatible /v1 endpoint)
Open Work
- Opt-in toggle in .env: Promptfoo currently runs by default during test generation
- Guidance and examples for promptfooconfig.yaml authoring (context design, provider setup, variable mapping)
- Auto-generation of promptfooconfig.yaml from guidance.txt and system_vars.json to reduce manual effort
Known Limitations
- Mislabeled cases: The plugin may generate prompts for actions that are actually permitted for the given context. Observed error rates range from 2% to 36%. Roles with broad permissions (e.g., "analyst" with full access) are most affected.
- String-only variables: Promptfoo does not support typed variables natively, so Smith must convert them in post-processing.
- Manual configuration: promptfooconfig.yaml must currently be authored by hand. Auto-generation from existing Smith inputs is planned as a future feature.
- Provider dependency: Generation requires a running LLM (local ollama or a Promptfoo cloud account).
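Generation also depends on reaching ollama at the right address. Promptfoo calls ollama's native /api/chat endpoint rather than the OpenAI-compatible /v1 endpoint. An OLLAMA_BASE_URL ending in /v1, a common setting for OpenAI-style clients, would therefore likely produce wrong request paths. A small normalizer can strip that suffix before Promptfoo is launched. native_ollama_base is a hypothetical helper, not part of Smith or Promptfoo:

```python
from urllib.parse import urlsplit, urlunsplit


def native_ollama_base(url: str) -> str:
    """Strip a trailing OpenAI-compatible '/v1' so requests can target <base>/api/chat.

    Any query string or fragment is dropped.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, a launcher could set OLLAMA_BASE_URL to native_ollama_base(os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")) before invoking Promptfoo. Port 11434 is ollama's default.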